COMMENT PAGE FOR:
(HTM) The highest quality codebase
keeda wrote 1 hour 43 min ago:
Hilarious! Kinda reinforces the idea that LLMs are like junior
engineers with infinite energy.
But just telling an AI it's a principal engineer does not make it a
principal engineer. Firstly, that is such a broad, vaguely defined
term, and secondly, typically that level of engineering involves
dealing with organizational and industry issues rather than just
technical ones.
And so absent a clear definition, it will settle on the lowest common
denominator of code quality, which would be test coverage -- likely
because that is the most common topic in its training data -- and
extrapolate from that.
The other thing is, of course, the RL'd sycophancy which compels it to
do something, anything, to obey the prompt. I wonder what would happen
if I tweaked the prompt just a little bit to say something like "Use your
best judgement and feel free to change nothing."
mgrat wrote 2 hours 7 min ago:
I come here asking with the greatest humility and from the perspective
of an SRE in a highly regulated industry. Where does AI have anything
more to offer than pumping out TS crud apps? Are any of the people
doing this accountable for the tech debt it creates?
credit_guy wrote 1 hour 53 min ago:
I see this sentiment quite often. The Economist chose the "word of
the year"; it is "slop". Everybody hates AI slop.
And lots of people who use AI coding assistants go through a phase of
pushing AI slop in prod. I know I did that. Some of it still bites me
to this day.
But here's the thing: AI coding assistants did not exist two years
ago. We are critical of them based on unfounded expectations. They
are tools, and they have limitations. They are far, very, very far,
from being perfect. They will not replace us for 20 years, at least.
But are they useful? Yes. Can you learn usage patterns that eliminate
as much AI slop as possible? I personally hope I did that;
I think quite a lot of people who use AI coding assistants have found
ways to tame the beast.
culi wrote 2 hours 45 min ago:
I checked the diffs of the `highest-quality` branch vs `main` and
immediately noticed an `as any` [1]. Not what I would expect from a
prompt like "you're a principal engineer".
(HTM) [1]: https://github.com/Gricha/macro-photo/compare/main...highest-q...
chr15m wrote 2 hours 50 min ago:
It behaved exactly like 99% of developers, introducing unnecessary
complexity.
whalesalad wrote 5 hours 0 min ago:
I would love to see an experiment done like this with an arena of
principal engineer agents. Give each of them a unique personality: this
one likes shiny new objects and is willing to deal with early adopter
pain, this one is a neckbeard who uses emacs as pid 1 and sends email
via usb thumbdrive, and the third is a pragmatic middle of the road
person who can help be the glue between them. All decisions need to
reach a quorum before continuing. Better yet: each agent is running on
a completely different model from a different provider. 3 can be a knob
you dial up to 5, 10, etc. Each of these agents can spawn sub-agents
to reach out to specialists like a CSS expert or a DBA.
I think prompt engineering could help a bit here: adding some context
on what a quality codebase is, removing everything that is not necessary,
and considering future maintainability (20k -> 84k lines is a smell). All of
these are smells that even a simple supervisor agent could have caught.
failuremode wrote 5 hours 28 min ago:
> We went from around 700 to a whooping 5369 tests
> Tons of tests got added, but some tests that mattered the most
(maestro e2e tests that validated the app still works) were forgotten.
I've seen many LLM proponents often cite the number of tests as a
positive signal.
This smells, to me, like people who tout lines of code.
When you are counting tests in the thousands I think it's a negative
signal.
You should be writing property-based tests rather than 'assert x=1',
'assert x=2', 'assert x=-1' and on and on.
If LLMs are incapable of acknowledging that then add it to the long
list of 'failure modes'.
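For illustration, a single property can stand in for a pile of those
hand-enumerated asserts. A minimal sketch using fast-check (Math.abs is
just a stand-in for whatever function is actually under test):
import fc from "fast-check";
// One property covers what would otherwise be dozens of single-value asserts:
// the result is never negative, and applying abs twice changes nothing.
fc.assert(
  fc.property(fc.integer(), (x) => {
    const y = Math.abs(x);
    return y >= 0 && Math.abs(y) === y;
  })
);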
layer8 wrote 6 hours 18 min ago:
This makes me wonder what the result would be of having an AI turn a
code base into literate-programming style, and have it iterate on that
to improve the "literacy".
thomassmith65 wrote 7 hours 3 min ago:
With a good programmer, if they do multiple passes of a refactor, each
pass makes the code more elegant, and the next pass easier to
understand and further improve.
Claude has a bias to add lines of code to a project, rather than make
it more concise. Consequently, each refactoring pass becomes more
difficult to untangle, and harder to improve.
Ideally, in this experiment, only the first few passes would result in
changes - mostly shrinking the project size - and from then on, Claude
would change nothing, just like a very good programmer.
This is the biggest problem with developing with Claude, by far.
Anthropic should laser focus on fixing it.
blobbers wrote 7 hours 44 min ago:
I'm curious if anyone has written a "Principal Engineer" agents.md or
CLAUDE.md style file that yields better results than the 'junior dev'
results people are seeing here.
I've worked on writing some as a data scientist, and I have gotten the
basic Claude output to be much better; it makes some saner decisions,
it validates and circles back to fix fits, etc.
Bombthecat wrote 7 hours 55 min ago:
Story of AI:
For instance - it created a hasMinimalEntropy function meant to "detect
obviously fake keys with low character variety". I don't know why.
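Presumably something along these lines (a guess at what such a utility
might look like, not the actual code from the repo):
// Hypothetical reconstruction: counts distinct characters and calls the
// key "fake" if there are too few of them.
const hasMinimalEntropy = (key: string): boolean => new Set(key).size >= 8;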
barbazoo wrote 8 hours 29 min ago:
> I can sort of respect that the dependency list is pretty small, but
at the cost of very unmaintainable 20k+ lines of utilities. I guess it
really wanted to avoid supply-chain attacks.
> Some of them are really unnecessary and could be replaced with off
the shelf solution
Lots of people would regard this as a good thing. Surely the LLM can't
guess which kind you are.
jcalvinowens wrote 8 hours 53 min ago:
This really mirrors my experience trying to get LLMs to clean up kernel
driver code, they seem utterly incapable of simplifying things.
lubesGordi wrote 8 hours 56 min ago:
So now you know. You can get claude to write you a ton of unit tests
and also improve your static typing situation. Now you can restrict
your prompt!
smallpipe wrote 9 hours 0 min ago:
The viewport of this website is quite infuriating. I have to scroll
horizontally to see the `cloc` output, but there's 3x the empty space
on either side.
ttul wrote 9 hours 3 min ago:
Have you tried writing into the AGENTS.md something like, "Always be on
the lookout for dead code, copy-pasta, and other opportunities to
optimize and trim the codebase in a sensible way."
In my experience, adding this kind of instruction to the context window
causes SOTA coding models to actually undertake that kind of
optimization while development carries on. You can also periodically
chuck your entire codebase into Gemini-3 (with its massive context
window) and ask it to write a refactoring plan; then, pass that
refactoring plan back into your day-to-day coding environment such as
Cursor or Codex and get it to take a few turns working away at the
plan.
As with human coders, if you let them run wild "improving" things
without specifically instructing them to also pay attention to bloat,
bloat is precisely what you will get.
nadis wrote 9 hours 6 min ago:
20K --> 84K lines of ts for a simple app is bananas. Much madness
indeed! But also super interesting, thanks for sharing the experiment.
jedberg wrote 9 hours 17 min ago:
You know how when you hear how many engineers are working on a
product, you think to yourself, "but I could do that with like
three people!"? Now you know why they have so many people. Because
they did this with their codebase, but with humans.
Or I should say, they kept hiring the humans who needed something to
do, and basically did what this AI did.
jesse__ wrote 9 hours 22 min ago:
> This app is around 4-5 screens. The version "pre improving quality"
was already pretty large. We are talking around 20k lines of TS
Fucking yikes dude. When's the last time it took you 4500 lines per
screen, 9000 including the JSON data in the repo????? This is already
absolute insanity.
I bet I could do this entire app in easily less than half, probably
less than a tenth, of that.
maerF0x0 wrote 9 hours 29 min ago:
I would love to see someone do a longitudinal study of the
incident/error rate of a canary container in prod that is managed by
Claude. Basically doing a control/experimental group to see who does
better, the humans or the AI.
minimaxir wrote 9 hours 36 min ago:
About a year ago I wrote a blog post (HN discussion: [1] )
experimenting if asking Claude to "write code better" repeatedly would
indeed cause it to write better code, determined by speed as better
code implies more efficient algorithms. I found that it did indeed work
(at n=5 iterations), but additionally providing a system prompt also
explicitly improved it.
Given what I've seen from Claude 4.5 Opus, I suspect the following
test would be interesting: attempt to have Claude Code +
Haiku/Sonnet/Opus implement and benchmark an algorithm with:
- no CLAUDE.md file
- a basic CLAUDE.md file
- an overly nuanced CLAUDE.md file
And then both test the algorithm speed and number of turns it takes to
hit that algorithm speed.
(HTM) [1]: https://news.ycombinator.com/item?id=42584400
thald wrote 9 hours 37 min ago:
Interesting experiment. Looking at this I immediately thought of a similar
experiment run by Google: AlphaEvolve. Throwing LLM compute at problems
might work if the problem is well defined and the result can be
objectively measured.
As for this experiment:
What does quality even mean? Most human devs will have different
opinions on it. If you would ask 200 different devs (Claude starts from
0 after each iteration) to do the same, I have doubts the code would
look much better.
I am also wondering what would happen if Claude had an option to
just walk away from the code if it's "good enough". For each problem
most human devs run a cost->benefit equation in their head; only worthy
ideas are realized. Claude does not do this: the cost of writing code is
very low on its side and the prompt does not allow any graceful exit :)
samuelknight wrote 9 hours 40 min ago:
This is an interesting experiment that we can summarize as "I gave a
smart model a bad objective", with the key result at the end
"...oh and the app still works, there's no new features, and just a few
new bugs."
Nobody thinks that doing 200 improvement passes on a functioning code
base is a good idea. The prompt tells the model that it is a principal
engineer, then contradicts that role with the imperative "We need to improve
the quality of this codebase". Determining when code needs to be
improved is a responsibility for the principal engineer but the prompt
doesn't tell the model that it can decide the code is good enough. I
think we would see a different behavior if the prompt was changed to
"Inspect the codebase, determine if we can do anything to improve code
quality, then immediately implement it." If the model is smart enough,
this will increasingly result in passes where the agent decides there
is nothing left to do.
In my experience with CC I get great results when I ask an open-ended
question about a large module and instruct it to come back to me with
suggestions. Claude generates 5-10 suggestions and ranks them by
impact. It's very low-effort from the developer's perspective and it
can generate some good ideas.
fauigerzigerk wrote 9 hours 41 min ago:
What would happen if you gave the same task to 200 human contractors?
I suspect SLOC growth wouldn't be quite as dramatic but things like
converting everything to Rust's error handling approach could easily
happen.
tracker1 wrote 9 hours 42 min ago:
On the Result responses... I've seen this a few times. I think it
works well in Rust or other languages that don't have the ability to
"throw" baked in. However, when you bolt it on to a language that
implicitly can throw, you're now doing twice the work, as you have to
handle both the explicit error result and the built-in exceptions.
I worked in a C# codebase with Result responses all over the place, and
it just really complicated every use case all around. Combined with
Promises (TS) it's worse still.
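A minimal TypeScript sketch of that double handling (the db handle and
all names here are made up, not from any real codebase):
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };
interface User { id: string; name: string }
// Hypothetical data source; anything behind it can still throw at runtime.
declare const db: { find(id: string): Promise<User | undefined> };
async function loadUser(id: string): Promise<Result<User, string>> {
  const row = await db.find(id); // network errors, driver bugs, etc. still throw
  return row ? { ok: true, value: row } : { ok: false, error: "not found" };
}
async function caller() {
  try {
    const res = await loadUser("42");
    if (!res.ok) console.error(res.error); // the explicit Result channel
    else console.log(res.value.name);
  } catch (e) {
    console.error("thrown anyway:", e); // the implicit throw channel never goes away
  }
}
Every call site ends up with two error paths: the Result branch you
opted into and the try/catch you never get to drop.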
mrsmrtss wrote 8 hours 21 min ago:
The Result pattern also works exceptionally well with C#, provided
you ensure that code returning a Result object never throws an
exception. Of course, there are still some exceptional things that
can throw, but this is essentially the same situation as dealing with
Rust panics.
tracker1 wrote 4 hours 29 min ago:
IMO, Rust panics should kill the application... C# errors
shouldn't. Also, in practice, in C# where I was dealing with
Result, there was just as much chance of seeing an actual thrown
error, so you always had to deal with both an explicit error result
AND thrown errors in practice... it was worse than just error
patterns with type specific catch blocks.
g947o wrote 9 hours 44 min ago:
When I ask coding agents to add tests, they often come up with
something like this:
const x = new NewClass();
assert.ok(x instanceof NewClass);
So I am not at all surprised about Claude adding 5x tests, most of
which are useless.
It's going to be fun to look back at this and see how much slop these
coding agents created.
orliesaurus wrote 9 hours 49 min ago:
Ok SRS question:
What's the best "Code Review" Skill/Agent/Prompt that I can use these
days? Curious to see even paid options if anyone knows?
GuB-42 wrote 9 hours 52 min ago:
It is something I noticed when talking to LLMs, if they don't get it
right the first time, they probably never will, and if you really
insist, the quality starts to degrade.
It is not unlike people, the difference being that if you ask someone
the same thing 200 times, he will probably tell you to go fuck
yourself or, if unable to, turn to malicious compliance. These AIs
will always be diligent. Or, a human may use the opportunity to educate
himself, but again, LLMs don't learn by doing; they have a distinct
training phase that involves ingesting pretty much everything humanity
has produced, and your little conversation will not have a significant
effect, if any.
grvdrm wrote 9 hours 32 min ago:
I use a new chat/etc every time that happens and try to improve my
prompt to get a better result. It sometimes works, and the multiple-chat
approach annoys me less than one laborious long chat.
VikingCoder wrote 10 hours 0 min ago:
You need to scroll the windows to see all the numbers. (Why??)
Havoc wrote 10 hours 2 min ago:
My current fav improvement strategy is
1) Run multiple code analysis tools over it and have the LLM aggregate
it with suggestions
2) ask the LLM to list potential improvements as an open-ended question
and pick by hand which I want
And usually repeat the process with a completely different model (ie
diff company trained it)
Any more and yeah they end up going in circles
keepamovin wrote 10 hours 4 min ago:
This is actually a great idea. It's like those "AI resampled this image
10,000 times" or "JPEG iteratively compressed this picture 1 million
times" experiments.
mvanbaak wrote 10 hours 4 min ago:
`--dangerously-skip-permissions` why?
minimaxir wrote 10 hours 1 min ago:
It's necessary to allow Claude Code to be fully autonomous, otherwise
it will stop and ask you to run commands.
mvanbaak wrote 9 hours 54 min ago:
and just letting it do whatever it thinks it should do, without
a human intervening, is a good plan?
ssl-3 wrote 9 hours 33 min ago:
Depending on the breadth (and value) of the sandbox: Sure? Why
not?
To extend what may seem like a [prima facie] insane, stupid, or
foolhardy idea: Why not send the output of /dev/urandom into
/bin/bash? Or even /proc/mem? It probably won't do anything
particularly interesting. It will probably just break things and
burn power.
And so? It's just a computer; its scope is limited.
news_hacker wrote 9 hours 37 min ago:
the "best practice" suggestion would be to do this in a sandboxed
container
minimaxir wrote 9 hours 44 min ago:
Discovering that is the entire intent of this experiment, yes.
mvanbaak wrote 9 hours 41 min ago:
fair point. will re-read the whole thing. I'm sorry for my
ignorance.
guluarte wrote 10 hours 9 min ago:
that's my experience with AI: most times it creates an overengineered
solution unless told to keep it simple
mbesto wrote 10 hours 11 min ago:
While there are justifiable comments here about how LLMs behave, I want
to point out something else:
There is no consensus on what constitutes a high quality codebase.
Said differently - even if you asked 200 humans to do this same
exercise, you would get 200 different outputs.
phildougherty wrote 10 hours 11 min ago:
Pasting this whole article into Claude Code: "improve my codebase
taking this article into account"
minimaxir wrote 10 hours 0 min ago:
You can just give Claude Code/any modern Agent a URL and it'll
retrieve it.
torginus wrote 10 hours 12 min ago:
I've heard a very apt criticism of the current batch of LLMs:
LLMs are incapable of reducing entropy in a code base
I've always had this nagging feeling, but I think this really captures
the essence of it succinctly.
surprisetalk wrote 10 hours 13 min ago:
This reflects my experience with human programmers. So many devs are
taught to add layers of complexity in pursuit of "best practices". I
think the LLM was trained to behave this way.
In my experience, Claude can actually clean up a repo rather nicely if
you ask it to (1) shrink source code size (LOC or total bytes), (2)
reduce dependencies, and (3) maintain integration tests.
6LLvveMx2koXfwn wrote 10 hours 13 min ago:
for all the bad code havoc was most certainly not 'wrecked', it may
have been 'wreaked' though . . .
elzbardico wrote 10 hours 13 min ago:
LLMs have this strong bias towards generating code, because writing
code is the default behavior from pre-training.
Removing code, renaming files, condensing, and other edits are mostly
post-training stuff, supervised learning behavior. You have armies of
developers across the world making 17 to 35 dollars an hour solving
tasks step by step which are then basically used to generate
prompt/responses pairs of desired behavior for a lot of common
development situations, adding desired output for things like tool
calling, which is needed for things like deleting code.
A typical human post-training dataset generation task would
involve a scenario like: given this Dockerfile for a python
application, when we try to run pytest it fails with exception foo not
found. The human will notice that package foo is not installed, change
the requirements.txt file and write this down, then he will try pip
install, and notice that the foo package requires a certain native
library to be installed. The final output of this will be a response
with the appropriate tool calls in a structured format.
Given that the amount of unsupervised learning is way bigger than the
amount spent on fine-tuning for most models, it is no surprise that,
given any ambiguous situation, the model will default to what it knows
best.
More post-training will usually improve this, but the quality of the
human generated dataset probably will be the upper bound of the output
quality, not to mention the risk of overfitting if the foundation model
labs embrace SFT too enthusiastically.
hackernewds wrote 8 hours 32 min ago:
> Writing code is the default behavior from pre-training
what does this even mean? could you expand on it
joaogui1 wrote 2 hours 7 min ago:
During pre-training the model is learning next-token prediction,
which is naturally additive. Even if you added DEL as a token it
would still be quite hard to change the data so that it can be used
in a next-token prediction task.
Hope that helps
bongodongobob wrote 7 hours 53 min ago:
He means that it is heavily biased to write code, not remove,
condense, refactor, etc. It wants to generate more stuff, not less.
elzbardico wrote 46 min ago:
Because there are not a lot of high quality examples of code
editing in the training corpora other than maybe version control
diffs.
Because editing/removing code requires that the model output
tokens for tools calls to be intercepted by the coding agent.
Responses like the example below are not emergent behavior, they
REQUIRE fine-tuning. Period.
I need to fix this null pointer issue in the auth module.
<|tool_call|>
{"id": "call_abc123", "type": "function", "function": {"name": "edit_file", "arguments": "{\"path\": \"src/auth.py\", \"start_line\": 12, \"end_line\": 14, \"replacement\": \"def authenticate(user):\\n    if user is None:\\n        return False\\n    return verify(user.token)\"}"}}
<|end_tool_call|>
snet0 wrote 6 hours 29 min ago:
I don't see why this would be the case.
bunderbunder wrote 4 hours 25 min ago:
It's because that's what most resembles the bulk of the
tasks it was being optimized for during pre-training.
etamponi wrote 10 hours 15 min ago:
Am I the only one that is surprised that the app still works?!
stavros wrote 10 hours 18 min ago:
Well, given it can't say "no, I think it's good enough now", you'll
just get madness, no?
minimaxir wrote 9 hours 45 min ago:
That's the point. Sometimes madness is interesting.
WhitneyLand wrote 10 hours 20 min ago:
It can be difficult to explain to management why, in certain scenarios,
AI can seem to work coding miracles, but this still doesn't mean
it's always going to speed up development 10x, especially for an
established code base.
Tangible examples like this seem like a useful way to show some of the
limitations.
bulletsvshumans wrote 10 hours 22 min ago:
I think the prompt is a major source of the issue. "We need to improve
the quality of this codebase" implicitly indicates that there is
something wrong with the codebase. I would be curious to see if it
would reach a point of convergence with a prompt that allowed for it.
Something like "Improve the quality of this codebase, or tell me that
it is already in an optimal state."
iambateman wrote 10 hours 27 min ago:
The point he's making - that LLMs aren't ready for broadly
unsupervised software development - is well made.
It still requires an exhausting amount of thought and energy to make
the LLM go in the direction I want, which is to say in a direction
which considers the code which is outside the current context window.
I suspect that we will not solve the context window problem for a long
time. But we will see a tremendous growth in "on demand tooling"
for things which do fit into a context window and for which we can let
the AI "do whatever it wants."
For me, my work product needs to conform to existing design standards
and I can't figure out how to get Claude to not just wire up its own
button styles.
But it's remarkable how - despite all of the nonsense - these tools
remain an irreplaceable part of my work life.
spaceywilly wrote 10 hours 1 min ago:
I feel like I've figured out a good workflow with AI coding tools
now. I use it in "Planning mode" to describe the feature or
whatever I am working on and break it down into phases. I iterate on
the planning doc until it matches what I want to build.
Then, I ask it to execute each phase from the doc one at a time. I
review all the code it writes or sometimes just write it myself. When
it is done it updates the plan with what was accomplished and what
needs to be done next.
This has worked for me because:
- it forces the planning part to happen before coding. A lot of
Claude's "wtf" moments can be caught in this phase before it
writes a ton of gobbledygook code that I then have to clean up
- the code is written in small chunks, usually one or two functions
at a time. It's small enough that I can review all the code and
understand it before I click accept. There's no blindly accepting junk
code.
- the only context is the planning doc. Claude captures everything it
needs there, and it's able to pick right up from a new chat and
keep working.
- it helps my distraction-prone brain make plans and keep track of
what I was doing. Even without Claude writing any code, this alone is
a huge productivity boost for me. It's like having a magic notebook
that keeps track of where I was in my projects so I can pick them up
again easily.
torginus wrote 10 hours 3 min ago:
Which is why I think agentic software development is not really worth
it today. It can solve well-defined problems and work through issues
by rote, but if you give it some task and have it work on it for a couple
of hours, you then have to come in and fix it up.
I think LLMs are still at the 'advanced autocomplete' stage, where
the most productive way to use them is to have a human in the loop.
In this, accuracy of following instructions, and short feedback time
is much more important than semi-decent behavior over long-horizon
tasks.
krupan wrote 10 hours 32 min ago:
Just the headline sounds like a YouTube brain rot video title:
"I spent 200 days in the woods"
"I Google translated this 200 times"
"I hit myself with this golf club 200 times"
Is this really what hacker news is for now?
havkom wrote 10 hours 28 min ago:
There are fundamental differences. Many people expect a positive
gradient of quality from AI overhaul of projects. For translating
back and forth, it is obvious from the outset that there is a
negative gradient of quality (the Chinese whispers game).
jmkni wrote 10 hours 30 min ago:
If you reverse the order this could be a very interesting Youtube
series
hazmazlaz wrote 10 hours 34 min ago:
Well of course it produced bad results... it was given a bad prompt.
Imagine how things would have turned out if you had given the same
instructions to a skilled but naive contractor who contractually
couldn't say no and couldn't question you. Probably pretty similar.
mainmailman wrote 10 hours 31 min ago:
Yeah I don't see the utility in doing this hundreds of times back to
back. A few iterations can tell us some things about how Claude
optimizes code, but an open ended prompt to endlessly "improve" the
code sounds like a bad boss making huge demands. I don't blame the AI
for adding BS down the line.
simonw wrote 10 hours 35 min ago:
The prompt was:
Ultrathink. You're a principal engineer. Do not ask me any
questions. We need to improve the quality of this codebase.
Implement improvements to codebase quality.
I'm a little disappointed that Claude didn't eventually decide to start
removing all of the cruft it had added to improve the quality that way
instead.
Gricha wrote 9 hours 33 min ago:
Yeah, the best it did on some iterations was to claim that the codebase
was already in a good state and not produce changes - but that
was 1 in many.
gm678 wrote 10 hours 37 min ago:
"Core Functional Utilities: Identity function - returns its input
unchanged." is one of my favorites from `lib/functional.ts`.
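Presumably something along these lines (a guess at the shape, not the
actual source):
/** Identity function - returns its input unchanged. */
export const identity = <T>(value: T): T => value;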
bikeshaving wrote 10 hours 39 min ago:
[1] The logger library which Claude created is actually pretty simple,
highly approachable code, with utilities for logging the timings of
async code and the ability to emit automatic performance warnings.
I have been using LogTape ( [2] ) for JavaScript logging, and the
inherited, category-focused logging with different sinks has been
pretty great.
(HTM) [1]: https://github.com/Gricha/macro-photo/blob/highest-quality/lib...
(HTM) [2]: https://logtape.org
elzbardico wrote 10 hours 39 min ago:
Funniest part:
> ..oh and the app still works, there's no new features, and just a few
new bugs.
Hammershaft wrote 10 hours 44 min ago:
Impressive that the app still works! Did not expect that.
dcchuck wrote 10 hours 51 min ago:
I spent some time last night "over iterating" on a plan to do some
refactoring in a large codebase.
I created the original plan with a very specific ask - create an
abstraction to remove some tight coupling. Small problem that had a big
surface area. The planning/brainstorming was great and I like the plan
we came up with.
I then tried to use a prompt like OP's to improve it (as I said, large
surface area so I wanted to review it) - "Please review PLAN_DOC.md -
is it a comprehensive plan for this project?". I'd run it -> get
feedback -> give it back to Claude to improve the plan.
I (naively perhaps) expected this process to converge to a "perfect
plan". At this point I think of it more like a probability tree where
there's a chance of improving the plan, but a non-zero chance of
getting off the rails. And once you go off the rails, you only veer
further and further from the truth.
There are certainly problems where "throwing compute" at it and
continuing to iterate with an LLM will work great. I would expect those
to have firm success criteria. Providing definitions of quality would
significantly improve the output here as well (or decrease the
probability of going off the rails I suppose). Otherwise Claude will
confuse quality like we see here.
Shout out OP for sharing their work and moving us forward.
Gricha wrote 9 hours 42 min ago:
I think I end up doing that with plans inadvertently too. Oftentimes
I'll iterate on a plan too many times, and only recognize that it's
too far gone and needs a restart with more direction after sinking
15 minutes into it.
elzbardico wrote 10 hours 33 min ago:
Small errors compound over time.
pawelduda wrote 10 hours 51 min ago:
Did it create 200 CODE_QUALITY_IMPROVEMENTS.md files by chance?
postalcoder wrote 10 hours 51 min ago:
One of my favorite personal evals for llms is testing its stability as
a reviewer.
The basic gist of it is to give the llm some code to review and have it
assign a grade multiple times. How much variance is there in the grade?
Then, prompt the same llm to be a "critical" reviewer with the same
code multiple times. How much does that average critical grade change?
A low variance of grades across many generations and a low delta
between "review this code" and "review this code with a critical eye"
is a major positive signal for quality.
I've found that gpt-5.1 produces remarkably stable evaluations whereas
Claude is all over the place. Furthermore, Claude will completely [and
comically] change the tenor of its evaluation when asked to be critical
whereas gpt-5.1 is directionally the same while tightening the screws.
You could also interpret these results to be a proxy for
obsequiousness.
Edit: One major part of the eval I left out is "can an LLM converge on
an 'A'?" Let's say the LLM gives the code a 6/10 (or B-). When you
implement its suggestions and then provide the improved code in a new
context, does the grade go up? Furthermore, can it eventually give
itself an A, and consistently?
It's honestly impressive how good, stable, and convergent gpt-5.1 is.
Claude is not great. I have yet to test it on Gemini 3.
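A rough sketch of the basic loop described above (gradeOnce is a
hypothetical wrapper around whatever model is being tested, returning a
numeric grade such as 1-10):
declare function gradeOnce(code: string, critical: boolean): Promise<number>;
async function reviewerStability(code: string, runs = 10) {
  const plain = await Promise.all(Array.from({ length: runs }, () => gradeOnce(code, false)));
  const crit = await Promise.all(Array.from({ length: runs }, () => gradeOnce(code, true)));
  const mean = (xs: number[]) => xs.reduce((a, b) => a + b, 0) / xs.length;
  const variance = (xs: number[]) => mean(xs.map((x) => (x - mean(xs)) ** 2));
  return {
    gradeVariance: variance(plain), // low variance = stable reviewer
    criticalDelta: mean(plain) - mean(crit), // small delta = "be critical" doesn't flip the verdict
  };
}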
lemming wrote 8 hours 38 min ago:
I agree, I mostly use Claude for writing code, but I always get GPT5
to review it. Like you, I find it astonishingly consistent and
useful, especially compared to Claude. I like to reset my context
frequently, so I'll often paste the problems from GPT into Claude,
then get it to review those fixes (going around that loop a few
times), then reset the context and get it to do a new full review.
It's very reassuring how consistent the results are.
OsrsNeedsf2P wrote 9 hours 23 min ago:
How is this different than testing the temperature?
itishappy wrote 4 hours 7 min ago:
How does temperature explain the variance in response to the
inclusion of the word "critical"?
smt88 wrote 8 hours 32 min ago:
It isn't, and it reflects how deeply LLMs are misunderstood, even
by technical people
adastra22 wrote 9 hours 55 min ago:
You mean literally assign a grade, like B+? This is unlikely to work
based on how token prediction & temperature works. You're going to
get a probability distribution in the end that is reflective of the
model runtime parameters, not the intelligence of the model.
guluarte wrote 10 hours 7 min ago:
my experience reviewing PRs is that sometimes it says a PR is perfect
with some nitpicks, and other times that the same PR is trash and
needs a lot of work
xnorswap wrote 10 hours 57 min ago:
Claude is really good at specific analysis, but really terrible at
open-ended problems.
"Hey claude, I get this error message: ", and it'll often find the root
cause quicker than I could.
"Hey claude, anything I could do to improve Y?", and it'll struggle
beyond the basics that a linter might suggest.
It suggested enthusiastically a library for and it was all
"Recommended" about it, but when I pointed out that the library had
been considered and rejected because , it understood and wrote up why
that library suffered from that issue and why it was therefore
unsuitable.
There's a significant blind-spot in current LLMs related to blue-sky
thinking and creative problem solving. It can do structured problems
very well, and it can transform unstructured data very well, but it
can't deal with unstructured problems very well.
That may well change, so I don't want to embed that thought too deeply
into my own priors, because the LLM space seems to evolve rapidly. I
wouldn't want to find myself blind to the progress because I write it
off from a class of problems.
But right now, the best way to help an LLM is have a deep understanding
of the problem domain yourself, and just leverage it to do the
grunt-work that you'd find boring.
mkw5053 wrote 35 min ago:
I've had reasonable success having it ultrathink of every possible
X (exhaustively) and their trade-offs and then give me a ranked list
and rationale of its top recommendations. I almost always choose the
top, but just reading the list and then giving it next steps has
worked really well for me.
awesome_dude wrote 3 hours 32 min ago:
My experience has been with Claude that having it "review" my code
has produced some helpful feedback and refactoring suggestions, but
also, it falls short in others
ljm wrote 4 hours 2 min ago:
I am basically rawdogging Claude these days, I don't use MCPs or
anything else, I just lay down all of the requirements and the
suggestions and the hints, and let it go to work.
When I see my colleagues use an LLM they are treating it like a mind
reader and their prompts are, frankly, dogshit.
It shows that articulating a problem is an important skill.
theshrike79 wrote 6 hours 20 min ago:
Codex is better for the latter style. It takes its time, mulls about
and investigates and sometimes finds a nugget of gold.
Claude is for getting shit done, it's not at its best at long
research tasks.
dolftax wrote 7 hours 4 min ago:
The structured vs open-ended distinction here applies to code review
too. When you ask an LLM to "find issues in this code", it'll happily
find something to say, even if the code is fine. And when there are
actual security vulnerabilities, it often gets distracted by style
nitpicks and misses the real issues.
Static analysis has the opposite problem - very structured,
deterministic, but limited to predefined patterns and overwhelms you
in false positives.
The sweet spot seems to be to give structure to what the LLM should
look for, rather than letting it roam free on an open-ended "review
this" prompt.
We built Autofix Bot[1] around this idea. [1] (disclosure: founder)
(HTM) [1]: https://autofix.bot
ericmcer wrote 7 hours 10 min ago:
Exactly, if you visualize software as a bunch of separate "states" (UI
state, app state, DB state) then our job is to mutate states and
synchronize those mutations across the system. LLMs are good at
mutating a specific state in a specific way. They are trash at
designing what data shape a state should be, and they are bad at
figuring out how/why to propagate mutations across a system.
order-matters wrote 8 hours 9 min ago:
TBH I think its ability to structure unstructured data is what makes
it a powerhouse tool, and there is so much juice to squeeze there that
we can make process improvements for years even if it doesn't get any
better at general intelligence.
If I had a pdf printout of a table, the workflow I used to have to
use to get that back into a table data structure to use for
automation was hard (annoying): dedicated OCR tools with limitations
on inputs, multiple models in that tool for the different ways the
paper the table was on might be formatted. It took hours for a new
input format.
Now I can take a photo of something with my phone and get a data
table in like 30 seconds.
People seem so desperate to outsource their thinking to these models
and operate at the limits of their capability, but I have been
having a blast using it to cut through so much tedium: things that
weren't unsolved problems but required enough specialized tooling and
custom config to be left alone unless you really had to.
This fits into what you're saying about using it to do the grunt work I
find boring, I suppose, but it feels a little bit more than that - like
it has opened a lot of doors to spaces that had grunt work that wasn't
worth doing for the end result previously, but now it is.
d-lisp wrote 8 hours 20 min ago:
I remember a problem I had while quick-testing notcurses. I
tried ChatGPT, which produced a lot of weird but kinda believable
statements about the fact that I had to include wchar and define a
specific preprocessor macro, AND I had to place the includes for
notcurses, other includes and macros in a specific order.
My sentiment was "that's obviously a weird non-intended hack" but I
wanted to test quickly, and well ... it worked. Later, reading the
man-pages I acknowledged the fact that I needed to declare specific
flags for gcc in place of the GPT-advised solution.
I think these kinds of value-based judgements are hard to emulate for
LLMs; it's hard for them to identify a single source as the most
authoritative in a sea of less authoritative (but numerous)
sources.
andai wrote 8 hours 43 min ago:
The current paradigm is we sorta-kinda got AGI by putting dodgy AI in
a loop:
until works { try again }
The stuff is getting so cheap and so fast... a sufficient increment
in quantity can produce a phase change in quality.
ludicrousdispla wrote 9 hours 2 min ago:
>> "Hey claude, I get this error message: ", and it'll often find the
root cause quicker than I could.
Back in the day, we would just do this with a search engine.
cultofmetatron wrote 9 hours 31 min ago:
> There's a significant blind-spot in current LLMs related to
blue-sky thinking and creative problem solving.
thats called job security!
mbesto wrote 10 hours 15 min ago:
> There's a significant blind-spot in current LLMs related to
blue-sky thinking and creative problem solving. It can do structured
problems very well, and it can transform unstructured data very well,
but it can't deal with unstructured problems very well.
While this is true in my experience, the opposite is not true. LLMs
are very good at helping me go through a structured process of
thinking about architectural and structural design and then helping
build a corresponding specification.
More specifically the "idea honing" part of this proposed process
works REALLY well: [1] This:
Each question should build on my previous answers, and our end goal
is to have a detailed specification I can hand off to a developer.
Let's do this iteratively and dig into every relevant detail.
Remember, only one question at a time.
(HTM) [1]: https://harper.blog/2025/02/16/my-llm-codegen-workflow-atm/
skydhash wrote 9 hours 34 min ago:
I've checked the linked page and there's nothing about even
learning the domain or learning the tech platform you're going to
use. It's all blind faith, just a small step above copying stuff
from GitHub or StackOverflow and pushing it to prod.
mbesto wrote 4 hours 25 min ago:
You completely missed the point of my comment...
giancarlostoro wrote 10 hours 26 min ago:
> "Hey claude, I get this error message: ", and it'll often find the
root cause quicker than I could.
This is true. As for "Open Ended", I use Beads with Claude Code: I ask
it to identify things based on criteria (even if it's open ended), then
I ask it to make tasks, then when it's done I ask it to research and
ask clarifying questions for those tasks. This works really well.
asmor wrote 10 hours 27 min ago:
This is it. It doesn't replace the higher level knowledge part very
well.
I asked Claude to fix a pet peeve of mine, spawning a second process
inside an existing Wine session (pretty hard if you use umu, since it
runs in a user namespace). I asked Claude to write me a python server
to spawn another process to pass through a file handler "in Proton",
and it proceeded a long loop of trying to find a way to launch into
an existing wine session from Linux with tons of environment
variables that didn't exist.
Then I specified "server to run in Wine using Windows Python" and it
got more things right. Except it tried to use named pipes for IPC.
Which, surprise surprise, doesn't work to talk to the Linux piece.
Only after I specified "local TCP socket" it started to go right. Had
I written all those technical constraints and made the design
decisions in the first message it'd have been a one-hit success.
cyral wrote 10 hours 34 min ago:
Using the plan mode in cursor (or asking claude to first come up with
a plan) makes it pretty good at generic "how can I improve" prompts.
It can spend more effort exploring the codebase and thinking before
implementing.
pdntspa wrote 10 hours 36 min ago:
That's why you treat it like a junior dev. You do the fun stuff of
supervising the product, overseeing design and implementation,
breaking up the work, and reviewing the outputs. It does the boring
stuff of actually writing the code.
I am phenomenally productive this way, I am happier at my job, and
its quality of work is extremely high as long as I occasionally have
it stop and self-review its progress against the style principles
articulated in its AGENTS.md file. (As it tends to forget a lot of
rules like DRY)
xnx wrote 3 hours 12 min ago:
> rules like DRY
Principles like DRY
order-matters wrote 8 hours 4 min ago:
I wonder if DRY is still a principle worth holding onto in the AI
coding era. I mean it probably is, but this feels like enough of a
shift in coding design that re-evaluating principles designed for
human-only coding might be worth the effort
tiku wrote 8 hours 45 min ago:
I enjoy finding the problem and then telling Claude to fix it.
Specifying the function and the problem. Then going to get a coffee
from the breakroom to see it finished when I return. The junior dev
has questions when I did that. Claude just fixes it.
AStrangeMorrow wrote 9 hours 52 min ago:
Yeah at this point I basically have to dictate all implementation
details: do this, but do it this specific way, handle xyz edge
cases by doing that, plug the thing in here using that API.
Basically that expands 10 lines into 100-200 lines of code.
However if I just say "I have this goal, implement a solution",
chances are that unless it is a very common task, it will come up
with a subpar/incomplete implementation.
What's funny to me is that complexity has inverted for some
tasks: it can ace a 1000-line ML model for a general task I give
it, yet will completely fail to come up with a proper solution for
a 2D geometric problem that mostly has high school level maths that
can be solved in 100 lines
rootnod3 wrote 10 hours 8 min ago:
Cool cool cool. So if you use LLMs as junior devs, let me ask you
how future awesome senior devs like you will come around? From WHAT
job experience? From what coding struggle?
platevoltage wrote 7 hours 24 min ago:
There's that long term thinking that the tech industry, and
really every other publicly traded company is known for.
pdntspa wrote 9 hours 37 min ago:
My last job there was effectively a gun held to the back of my
head, ordering me to use this stuff. And this started about a
year ago, when the tooling for agentic dev was absolutely
atrocious, because we had a CTO who had the biggest most raging
boner for anything that offered even a whiff of "AI".
Unfortunately the bar is being raised on us. If you can't hang
with the new order you are out of a job. I promise I was one of
the holdouts who resisted this the most. It's probably why I got
laid off last spring.
Thankfully, as of this last summer, agentic dev started to really
get good, and my opinion made a complete 180. I used the off time
to knock out a personal project in a month or two's worth of
time, that would have taken me a year+ the old way. I leveraged
that experience to get me where I am now.
rootnod3 wrote 9 hours 15 min ago:
Ok, now assume you start relying on it and let's assume Cloudflare
has another outage. You just go and clock out for the day
saying "can't work, agent is down"?
I don't think we'll be out of jobs. Maybe temporarily. But
those jobs come back. The energy and money drain that LLMs are
is just not sustainable.
I mean, it's cool that you got the project knocked out in a
month or two, but if you'd sit down now without an LLM and try
to measure the quality of that codebase, would you be 100%
content? Speed is not always a good metric. Sure, 1 -2 months
for a project is nice, but isn't especially a personal project
more about the fun of doing the project and learning something
from it and sharpening your skills?
pdntspa wrote 8 hours 24 min ago:
When the POS system goes down at a restaurant they'll revert
to pen and paper. Can't imagine its much different in that
case.
fluidcruft wrote 9 hours 51 min ago:
How do you get junior devs if your concept of the LLM is that
it's "a principal engineer" that "do[es] not ask [you] any
questions"?
Also, I'm pretty sure junior devs can use directing a LLM to
learn from mistakes faster. Let them play. Soon enough they're
going to be better than all of us anyway. The same way widespread
access to strong chess computers raised the bar at chess clubs.
rootnod3 wrote 9 hours 19 min ago:
I don't think the chess analogy grabs here. In chess, you play
_against_ the chess computer. Take the same approach and let
the chess computer play FOR the player and see how far he gets.
fluidcruft wrote 9 hours 4 min ago:
Maybe. I don't think adversarial vs not is as important as
gaining experience. Ultimately both are problem solving tasks
and learning instincts about which approaches work best in
certain situations.
I'm probably a pretty shitty developer by HN standards but I
generally have to build a prototype to fully understand and
explore the problem and iterate designs, and LLMs have been pretty
good for me as trainers for learning things I'm not familiar
with. I do have a certain skill set, but the non-domain stuff
can be really slow and tedious work. I can recognize "good
enough" and "clean" and I think the next generation can use
that model very well to become native with how to succeed
with these tools.
Let me put it this way: people don't have to be hired by the
best companies to gain experience using best practices
anymore.
bpt3 wrote 10 hours 2 min ago:
Why is that a developer's problem? If anything, they are
incentivized to avoid creating future competition in the job
market.
rootnod3 wrote 9 hours 14 min ago:
It's not a problem for the senior dev directly, but maybe down
the road. And it definitely is a problem for the company once
said senior dev leaves or retires.
Seriously, long term thinking went out the window a long time
ago, didn't it?
bpt3 wrote 5 hours 15 min ago:
No, long term thinking didn't go out the window.
It is definitely a problem for the company. How is it a
problem for the senior dev at any point?
What incentive do they have to aid the company at the expense
of their own *long term* career prospects?
eightysixfour wrote 10 hours 4 min ago:
What would you like individual contributors to do about it,
exactly? Refuse to use it, even though this person said they're
happier and more fulfilled at work?
I'm asking because I legitimately have not figured out an answer
to this problem.
mjr00 wrote 10 hours 18 min ago:
> That's why you treat it like a junior dev. You do the fun stuff
of supervising the product, overseeing design and implementation,
breaking up the work, and reviewing the outputs. It does the boring
stuff of actually writing the code.
I am so tired of this analogy. Have the people who say this never
worked with a junior dev before? If you treat your junior devs as
brainless code monkeys who only exist to type out your brilliant
senior developer designs and architectures instead of, you know,
human beings capable of solving problems, 1) you're wasting your
time, because a less experienced dev is still capable of solving
problems independently, 2) the juniors working under you will hate
it because they get no autonomy, and 3) the juniors working under
you will stay junior because they have no opportunity to
learn--which means you've failed at one of your most important
tasks as a senior developer, which is mentorship.
pdntspa wrote 9 hours 26 min ago:
I have mentored and worked with a junior dev. And the only way to
get her to do anything useful and productive was to spell things
out. Otherwise she got wrapped around the axle trying to figure
out the complex things and was constantly asking for my help with
basic design-level tasks. Doing the grunt work is how you learn
the higher-level stuff.
When I was a junior, that's how it was for me. The senior gave me
something that was structured and architected and asked me to
handle smaller tasks that were beneath them.
Giving juniors full autonomy is a great way to end up with an
unmaintainable mess that is a nightmare to work with without
substancial refactoring. I know this because I have made a career
out of fixing exactly this mistake.
mjr00 wrote 9 hours 23 min ago:
I have never worked with junior devs as incompetent as you
describe, having worked at AWS, Splunk/Cisco, among others. At
AWS even interns essentially got assigned a full project for
their term and were just told to go build it. Does your company
just have an absurdly low hiring bar for juniors?
> Giving juniors full autonomy is a great way to end up with an
unmaintainable mess that is a nightmare to work with without
substancial refactoring.
Nobody is suggesting they get full autonomy to cowboy code and
push unreviewed changes to prod. Everything they build should
be getting reviewed by their peers and seniors. But they need
opportunities to explore and make mistakes and get feedback.
pdntspa wrote 9 hours 18 min ago:
> AWS, Splunk/Cisco
It's an entirely different world in small businesses that
aren't primarily tech.
alfalfasprout wrote 10 hours 18 min ago:
I really hope you don't actually treat junior devs this way...
FeteCommuniste wrote 10 hours 31 min ago:
Maybe I'm weird but I enjoy "actually writing the code."
theshrike79 wrote 6 hours 16 min ago:
You really get enjoyment writing a full CRUD HTTP API five times,
one for each endpoint?
I don't :) Before I had IDE templates and Intellisense. Now I can
just get any agentic AI to do it for me in 60 seconds and I can
get to the actual work.
skydhash wrote 2 hours 22 min ago:
What do you need a full CRUD HTTP API for? Just loading the data
straight from the database? Usually I've already implemented
that before, and I just copy-paste the implementation and do
some Vim magic. And in frameworks like Rails or Laravel, it may
be less than 10 lines of code. More involved business logic?
Then I'm spending more time getting a good spec for those than
implementing the spec.
vitro wrote 9 hours 40 min ago:
I sometimes think of it as a sculptor analogy.
Some famous sculptors had an atelier full of students that helped
them with mundane tasks, like carving out a basic shape from a
block of stone.
When the basic shape was done, the master came and did the rest.
You may want to have the physical exercise of doing the work
yourself, but maybe someone sometimes likes to do the fine work
and leave the crude one to the AI.
breuleux wrote 10 hours 1 min ago:
In my case, it really depends what. I enjoy designing systems and
domain-specific languages or writing libraries that work the way
I think they should work.
On the other hand, if e.g. I need a web interface to do
something, the only way I can enjoy myself is by designing my own
web framework, which is pretty time-consuming, and then I still
need to figure out how to make collapsible sections in CSS and
blerghhh. Claude can do that in a few seconds. It's a delightful
moment of "oh, thank god, I don't have to do this crap anymore."
There are many coding tasks that are just tedium, including 99%
of frontend development and over half of backend development. I
think it's fine to throw that stuff to AI. It still leaves a lot
of fun on the table.
loloquwowndueo wrote 10 hours 21 min ago:
"I want my AI to do laundry and dishes so I can code, not for
my AI to code so I can do laundry and dishes"
moffkalast wrote 8 hours 9 min ago:
Well it would be funnier if dishwashers, washing machines and
dryers didn't automate that ages ago. It's literally one of the
first things robots started doing for us.
thewebguyd wrote 9 hours 28 min ago:
This sums up my feelings almost exactly.
I don't want LLMs, AI, and eventually Robots to take over the
fun stuff. I want them to do the mundane, physical tasks like
laundry and dishes, leave me to the fun creative stuff.
But as we progress right now, the hype machine is pushing AI to
take over art, photography, video, coding, etc. All the stuff I
would rather be doing. Where's my house cleaning robot?
zelphirkalt wrote 7 hours 47 min ago:
I would like to go even further and say: Those things, art,
photography, video, coding ... They are forms of craft, human
expression, creativity. They are part of what makes life
interesting. So we are in the process of eliminating the
interesting and creative parts, in the name of profit and
productivity maxing (if any!). Maybe we can create the 100th
online platform for the same thing soon 10x faster! Wow!
Of course this is a bit too black&white. There can still be a
creative human being introducing nuance and differences,
trying to get the automated tools to do things different in
the details or some aspects. Question is, losing all those
creative jobs (in absolute numbers of people doing them),
what will we as society, or we as humanity become? What's the
ETA on UBI, so that we can reap the benefits of what we
automated away, instead of filling the pockets of a few?
minimaxir wrote 10 hours 8 min ago:
Claude is very good at unfun-but-necessary coding tasks such as
writing docstrings and type hints, which is a prominent
instance of "laundry and dishes" for a dev.
mrguyorama wrote 9 hours 11 min ago:
>writing docstrings and type hints
Disagree. Claude makes the same garbage worthless comments as
a Freshman CS student. Things like:
// Frobbing the bazz
res = util.frob(bazz);
Or
// If bif is True here then blorg
if (bif){
blorg;
}
Like wow, so insightful
And it will ceaselessly try to auto complete your comments
with utter nonsense that is mostly grammatically correct.
The most success I have had is using claude to help with
Spring Boot annotations and config processing (Because
documentation is just not direct enough IMO) and to rubber
duck debug with, where claude just barely edges out the
rubber duck.
minimaxir wrote 9 hours 8 min ago:
I intentionally said docstrings instead of comments.
Comments by default can be verbose with agents, but a line in
the AGENTS.md does indeed wrangle modern agents into only
commenting on high-signal code blocks that are not
tautological.
loloquwowndueo wrote 9 hours 37 min ago:
"Sorry, the autogenerated API documentation was wrong
because the AI hallucinated the docstring"
theshrike79 wrote 6 hours 9 min ago:
You can't read?
Please don't say you commit AI-generated stuff without
checking it first?
re-thc wrote 10 hours 19 min ago:
Soon you'll realize you're the "AI". We've lost control.
nyadesu wrote 10 hours 24 min ago:
In my case, I enjoy writing code too, but it's helpful to have an
assistant I can ask to handle small tasks so I can focus on a
specific part that requires attention to detail
FeteCommuniste wrote 10 hours 10 min ago:
Yeah, I sometimes use AI for questions like "is it possible to
do [x] using library [y] and if so, how?" and have received
mostly solid answers.
georgemcbay wrote 9 hours 45 min ago:
> Yeah, I sometimes use AI for questions like "is it possible
to do [x] using library [y] and if so, how?" and have
received mostly solid answers.
In my experience most LLMs are going to answer this with some
form of "Absolutely!" and then propose a
square-peg-into-a-round-hole way to do it that is likely
suboptimal compared to a different library that is far better
suited to your problem, if you didn't guess the right-fit
library to begin with.
The sycophancy problem is still very real even when the topic
is entirely technical.
Gemini is (in my experience) the least likely to lead you
astray in these situations, but it's still a significant
problem even there.
jessoteric wrote 6 hours 43 min ago:
IME this has been significantly reduced in newer models
like 4.5 Opus and to a lesser extent Sonnet, but agree it's
still sort of bad -- mainly because the question you're
posing is bad.
if you ask a human this the answer can also often be "yes
[if we torture the library]", because software development
is magic and magic is the realm of imagination.
much better prompt: "is this library designed to solve this
problem" or "how can we solve this problem? i am
considering using this library to do so, is that
realistic?"
stouset wrote 10 hours 0 min ago:
Or "can you prototype doing A via approaches X, Y, and Z,
and show me what each looks like?"
I love to prototype various approaches. Sometimes I just want
to see which one feels like the most natural fit. The LLM can
do this in a tenth of the time I can, and I just need to get
a general idea of how each approach would feel in practice.
skydhash wrote 9 hours 43 min ago:
> Sometimes I just want to see which one feels like the
most natural fit.
This sentence alone is a huge red flag in my books. Either
you know the problem domain and can argue about which
solution is better and why. Or you don't and what you're
doing is experimenting to learn the domain.
There's a reason the field is called Software Engineering
and not Software Art. Words like "feels" do not belong.
It would be like saying which bridge design feels like the
most natural fit for the load, or which material feels like
the most natural fit for a brake system.
doug_durham wrote 6 hours 8 min ago:
Do you develop software? Software is unlike any physical
engineering field. The complexity of any project beyond
the most trivial is beyond human ability to work with.
You have to switch from analytic tools to more
probabilistic tools. That's where "feels", "smells", or
"looks" come in. Software testing is not a solved
problem, unlike bridge testing.
skydhash wrote 5 hours 53 min ago:
So much FOSS software is made and maintained by a
single person. Much more is developed by very small
teams. Probabilistic tools aren't needed anywhere.
fluidcruft wrote 9 hours 10 min ago:
For example sometimes you're faced with choosing between
high-quality libraries to adopt and it's not particularly
clear whether you picked the wrong one until after you've
tried integrating them. I've found it can be pretty
helpful to let the LLM try them all and see where the
issues ultimately are.
skydhash wrote 7 hours 32 min ago:
> sometimes you're faced with choosing between
high-quality libraries to adopt and it's not
particularly clear whether you picked the wrong one
until after you've tried integrating them.
Maybe I'm lucky, but I've never encountered this
situation. It has been mostly about what tradeoffs I'm
willing to make. Libraries are more lines of code added
to the project, thus they are liabilities. Including
one is always a bad decision, so I only do so because
the alternative is worse. Having to choose between two
is more like between Scylla and Charybdis (known
tradeoffs) than deciding to go left or right in a maze
(mystery outcome).
fluidcruft wrote 7 hours 5 min ago:
It probably depends on what you're working on. For
the most part relying on a high-quality
library/module that already implements a solution is
less code to maintain. Any problems with the shared
code can be fixed upstream with more eyeballs and
more coverage than anything I build locally. I prefer
to keep my eyeballs on things most related to my
domain and not maintain stuff that's both ultimately
not terribly important and replaceable (if push comes
to shove).
Generally, you are correct that having multiple
libraries to choose among is concerning, but it
really depends. Mostly it's stylistic choices and it
can be hard to tell how it integrates before trying.
mjr00 wrote 9 hours 32 min ago:
> There's a reason the field is called Software
Engineering and not Software Art. Words like "feels" do
not belong.
Software development is nowhere near advanced enough for
this to be true. Even basic questions like "should this
project be built in Go, Python, or Rust?" or "should this
project be modeled using OOP and domain-driven design,
event-sourcing, or purely functional programming?" are
decided largely by the personal preferences of whoever
the first developer is.
skydhash wrote 7 hours 42 min ago:
Such questions may be decided by personal preferences,
but their impact can easily be demonstrated. Such
impacts are what F. Brooks calls accidental complexity
and what we generally call technical debt. It's just that,
unlike other engineering fields, there are not a lot of
physical constraints and the decision space has many
more dimensions.
mjr00 wrote 7 hours 24 min ago:
> Such questions may be decided by personal
preferences, but their impact can easily be
demonstrated.
I really don't think this is true. What was the
demonstrated impact of writing Terraform in Go rather
than Rust? Would writing Terraform in Rust have
resulted in a better product? Would rewriting it now
result in a better product? Even among engineers with
15 years experience you're going to get differing
answers on this.
skydhash wrote 7 hours 7 min ago:
The impact is that now, if you want to modify the
project in some way, you will need to learn Go.
It's like all the codebases in COBOL. Maybe COBOL
at that time was the best language for the product,
but now, it's not that easy to find someone with
the knowledge to maintain the system. As soon as
you make a choice, you accept that further down the
line, there will be some X cost to keep going in
that direction and some Y cost to revert. As a
technical lead, more often you need to ensure that
X and/or Y don't grow to be enormous.
mjr00 wrote 6 hours 47 min ago:
> The impact is that now, if you want to modify
the project in some way, you will need to learn
Go.
That's tautologically true, yes, but your claim
was
> Either you know the problem domain and can
argue about which solution is better and why. Or
you don't and what you're doing is experimenting to
learn the domain.
So, assuming the domain of infrastructure-as-code
is mostly known now -- which is a fair statement --
which is a better choice, Go or Rust, and why?
Remember, this is objective fact, not art, so no
personal preferences are allowed.
KronisLV wrote 3 hours 1 min ago:
> So, assuming the domain of
infrastructure-as-code is mostly known now
which is a fair statement -- which is a better
choice, Go or Rust, and why? Remember, this is
objective fact, not art, so no personal
preferences are allowed.
I think it's possible to engage with
questions like these head on and try to find an
answer.
The problem is that if you want the answer to
be close to accurate, you might need both a lot
of input data about the situation (including
who'd be working with and maintaining the
software, what are their skills and weaknesses;
alongside the business concerns that impact the
timeline, the scale at which you're working
with and a 1000 other things), as well as the
output of concrete suggestions might be a
flowchart so big it'd make people question
their sanity.
It's not impossible, just impractical with a
high likelihood of being wrong due to bad or
insufficient data or interpretation.
But to humor the question: as an example, if
you have a small to mid size team with run of
the mill devs that have some traditional OOP
experience and have a small to mid
infrastructure size and complexity, but also
have relatively strict deadlines, limited
budget and only average requirements in regards
to long term maintainability and correctness
(nobody will die if the software doesn't work
correctly every single time), then Go will be
closer to an optimal choice.
I know that because I built an environment
management solution in Go, trying to do that in
Rust in the same set of circumstances
wouldn't have been successful, objectively
speaking. I just straight up wouldn't have
iterated fast enough to ship. Of course, I can
only give such a concrete answer for that very
specific set of example circumstances after the
fact. But even initially those factors pushed
me towards Go.
If you pull any number of levers in a different
direction (higher correctness requirements,
higher performance requirements, different team
composition), then all of those can influence
the outcome towards Rust. Obviously every
detail about what a specific system must do
also influences that.
skydhash wrote 5 hours 44 min ago:
Neither. Because the solution for IaC is not Go
or Rust, just like the solution for composing
music is not a piano or a violin.
A solution may be Terraform, another is
Ansible, ... To implement that solution, you
need a programming language, but by then
you're solving accidental complexity, not the
essential one attached to the domain. You may
be solving implementation speed, hiring costs,
code safety, ... but you're not solving IaC.
nottorp wrote 10 hours 4 min ago:
Just be careful if functionality varies between library y
version 2 and library y version 3, or if there is a similarly
named library y2 that isn't the same.
You may get possibilities, but not for what you asked for.
pdntspa wrote 9 hours 43 min ago:
If you run to the point where you can execute each idea and
examine its outputs, problems like that surface pretty
quickly
nottorp wrote 9 hours 32 min ago:
Of course, by that time i could have read the docs for
library y the version I'm using...
pdntspa wrote 9 hours 20 min ago:
There are many roads to Rome...
pdntspa wrote 10 hours 29 min ago:
Me writing code is me spending 3/4 of my time wading through
documentation and google searches. It's absolutely hell on my
ADD. My ability to memorize is absolutely garbage. Throughout my
career I've worked in like 10 different languages, and in any
given project I'm usually working in at least 3 or 4. There's a
lot of "now what is a map operation in this stupid fucking
language called again?!"
Claude writing code gets the same output if not better in about
1/10 of the time.
That's where you realize that the writing code bits are just one
small part of the overall picture. One that I realize I could do
without.
skydhash wrote 9 hours 39 min ago:
I would say notetaking would be a much bigger help than Claude
at this point. There are a lot of methods to organize information
that I believe would help you, better than a hallucination
machine.
neoromantique wrote 9 hours 34 min ago:
Notetaking with ADHD is another sort of hell to be honest.
I absolutely can attest to what parent is saying, I have been
developing software in Python for nearly a decade now and I
still routinely look up the /basics/.
LLMs have been a complete gamechanger for me, being able to
reduce the friction of "ok let me google what I need in a
very roundabout way until my memory spits it out" to a fast and
often inline LLM lookup.
theshrike79 wrote 6 hours 12 min ago:
This is the thing. I _know_ what the correct solution looks
like.
But figuring out what is the correct way in this particular
language is the issue.
Now I can get the assistant to do it, look at it and go
"yep, that's how you iterate over an array of strings".
skydhash wrote 7 hours 49 min ago:
Looking up documentation is normal. If not, we wouldn't
have the manual pages in Unix and such an emphasis on
documentation in ecosystems like Lisp, Go, Python, Perl,...
We even have cheatsheets and syntax reference books
because it's just so easy to forget the /basics/.
I said notetaking, but it's more about building your own
index. In $WORK projects, I mostly use the browser
bookmarks, the ticket system, the PR description and
commits to contextually note things. In personal projects,
I have an org-mode file (or a basic text file) and a lot of
TODO comments.
neoromantique wrote 5 hours 16 min ago:
It is very hard to explain the extent of it to a person
who did not experience it, really.
I have over a decade of experience, I do this stuff
daily, I don't think I can write a 10 line bash/python/js
script without looking up the docs at least a couple
times.
I understand exactly what I need to write, but the exact form
eludes my brain, so this Levenshtein-distance-on-drugs
machine that can parse my rambling + surrounding context
into valid syntax for what I need right at that time is
invaluable and I would even go as far as saying life
changing.
I understand and hold high level concepts alright, I know
where stuff is in my codebase, I understand how it all
works down to very low levels, but the minutiae of
development is very hard due to how my memory works (and
has always worked).
skydhash wrote 2 hours 46 min ago:
What I'm saying is that this is normal. Unless you've worked
every day with the same language and a very small set of
functions, you're bound to forget signatures and syntax.
What I'm advocating is a faster retrieval of the
correct information.
neoromantique wrote 50 min ago:
>Unless you've worked every day with the same language
...I did.
pdntspa wrote 5 hours 49 min ago:
And all that takes rote mechanical work. Which can quickly
lead to fractured focus and now suddenly I'm pulled out
of my flow.
Or I can farm that stuff to an LLM, stay in my flow, and
iterate at a speed that feels good.
tayo42 wrote 10 hours 18 min ago:
How do you end up with 3 to 4 languages in one project?
theshrike79 wrote 6 hours 10 min ago:
Go for the backend, something javascripty for the front end.
You're already at two. Depending if you count HTML, CSS or
SQL as "languages", you're up to a half dozen pretty quick.
jessoteric wrote 6 hours 41 min ago:
i find it's pretty rare to have a project that only consists
of one or two languages, over a certain complexity/feature
threshold
zelphirkalt wrote 7 hours 57 min ago:
3 or 4 can very easily accumulate. For example: HTML, CSS as
must know, plus some JS/TS (actually that's 2 langs!) for
sprinkles of interactivity, backend in any proper backend
language. Oh wait, there is a fifth language, SQL, because we
need to access the database. Ah and those few shell scripts
we need? Someone's gotta write those too. They may not always
be full programming languages, but languages they are, and
one needs to know them.
merely-unlikely wrote 9 hours 38 min ago:
Recently I've been experimenting with using multiple
languages in some projects where certain components have a
far better ecosystem in one language but the majority of the
project is easier to write in a different one.
For example, I often find Python has very mature and
comprehensive packages for a specific need I have, but it is
a poor language for the larger project (I also just hate
writing Python). So I'll often put the component behind a
http server and communicate that way. Or in other cases I've
used Rust for working with WASAPI and win32 which has some
good crates for it, but the ecosystem is a lot less mature
elsewhere.
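A minimal sketch of that "put the component behind an http
server" approach, using only the Python standard library;
`summarize` is a made-up stand-in for whatever mature package
the component actually wraps:
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def summarize(text: str) -> str:
    # placeholder for the real library call being wrapped
    return text[:100]

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        # read the JSON request body and run it through the wrapped library
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps({"result": summarize(payload.get("text", ""))}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), Handler).serve_forever()
The main project then just POSTs JSON to 127.0.0.1:8080 and
never has to care that the component happens to be Python.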
I used to prefer reinventing the wheel in the primary project
language, but I wasted so much time doing that. The tradeoff
is the project structure gets a lot more complicated, but
it's also a lot faster to iterate.
Plus your usual html/css/js on the frontend and something
else on the backend, plus SQL.
pdntspa wrote 9 hours 48 min ago:
Oh my sweet summer child...
tomgp wrote 10 hours 6 min ago:
HTML, CSS, Javascript?
saulpw wrote 10 hours 8 min ago:
Typescript on the frontend, Python on the backend, SQL for
the database, bash for CI. This isn't even counting HTML/CSS
or the YAML config.
tayo42 wrote 9 hours 34 min ago:
I wouldn't call html, yaml or css languages.
Same for sql, do you really context switch between sql and
other code that frequently?
Everyone should stop using bash, especially if you have a
scripting language you can use already.
wosat wrote 7 hours 16 min ago:
Sorry for being pedantic, but what does the "L" stand for
in HTML, YAML, SQL?
They may not be "programming languages" or, in the case
of SQL, a "general purpose programming language", but
they are indeed languages.
pdntspa wrote 9 hours 23 min ago:
Dude have you even written any hardcore SQL? plpgSQL is
very much a turing-complete language
n4r9 wrote 10 hours 26 min ago:
May be a domain issue? If you're largely coding within a JS
framework (which most software devs are tbf) then that makes
total sense. If you're working in something like fintech or
games, perhaps less so.
pdntspa wrote 10 hours 21 min ago:
My last job was a mix of Ruby, Python, Bash, SQL, and
Javascript (and CSS and HTML). One or two jobs before that it
was all those plus a smattering of C. A few jobs before that
it was C# and Perl.
n4r9 wrote 10 hours 32 min ago:
I think we have different opinions on what's fun and what's boring!
Nemi wrote 9 hours 19 min ago:
You've really hit the crux of the problem and why so many people
have differing opinions about AI coding. I also find coding more
fun with AI. The reason is that my main goal is to solve a
problem, or someone else's problem, in a way that is satisfying.
I don't much care about the code itself anymore. I care about the
thing that it does when it's done.
Having said that, I used to be deep into coding, and back then I am
quite sure that I would have hated AI coding for me. I think for me it
comes down to this: when I was learning about coding and stretching
my personal knowledge in the area, the coding part was the fun
part because I was learning. Now that I am past that part I
really just want to solve problems, and coding is the means to
that end. AI is now freeing because where I would have been
reluctant to start a project, I am more likely to give it a go.
I think it is similar to when I used to play games a lot. When I
would play a game where you would discover new items regularly, I
would go at it hard and heavy up until the point where I
determined there was either no new items to be found or it was
just "more of the same". When I got to that point it was like a
switch would flip and I would lose interest in the game almost
immediately.
altmanaltman wrote 2 hours 11 min ago:
A few counterpoints:
1. If you don't care about code and only care about the "thing
that it does when it's done", how do you solve problems in a
way that is satisfying? Because you are not really solving any
problem but just using the AI to do it. Is prompting more
satisfying than actually solving?
2. You claim you're done "learning about coding and stretching
my personal knowledge in the area" but don't you think that's
super dangerous? Like how can you just be done with learning
when tech is constantly changing and new things come up
everyday. In that sense, don't you think AI use is actually
making you learn less and you're just justifying it with the
whole "I love solving problems, not code" thing?
3. If you don't care about the code, do the people who hire you
for it do? And if they do, then how can you claim you don't
care about the code when you'll have to go through a review
process and at least check the code meaning you have to care
about the code itself, right?
danielmarkbruce wrote 59 min ago:
are you really solving the problem, or is the compiler doing
it?
altmanaltman wrote 25 min ago:
is the compiler really solving the problem or the
electricity flowing through the machine?
ukuina wrote 3 min ago:
Is it the electricity, or is it quantum entanglement with
Roko's Basilisk?
keeda wrote 1 hour 4 min ago:
Note I'm not saying one is better than the other, but my
takes:
1. The problem solving is in figuring out what to prompt,
which includes correctly defining the problem, identifying a
potential solution, designing an architecture, decomposing it
into smaller tasks, and so on.
Giving it a generic prompt like "build a fitness tracker"
will result in a fully working product but it will be bland
as it would be the average of everything in its training
data, and won't provide any new value. Instead, you probably
want to build something that nobody else has, because that's
where the value is. This will require you to get pretty deep
into the problem domain, even if the code itself is
abstracted away from you.
Personally, once the shape of the solution and the code is
crystallized in my head typing it out is a chore. I'd rather
get it out ASAP, get the dopamine hit from seeing it work,
and move on to the next task. These days I spend most of my
time exploring the problem domain rather than writing code.
2. Learning still exists but at a different level; in fact it
will be the only thing we will eventually be doing. E.g. I'm
doing stuff today that I had negligible prior background in
when I began. Without AI, I would probably require an
advanced course to just get up to speed. But now I'm learning
by doing while solving new problems, which is a brand new way
of learning! Only I'm learning the problem domain rather than
the intricacies of code.
3. Statistically speaking, the people who hire us don't
really care about the code, they just want business results.
(See: the difficulty of funding tech debt cleanup projects!)
Personally, I still care about the code and review
everything, whether written by me or the AI. But I can see
how even that is rapidly becoming optional.
I will say this: AI is rapidly revolutionizing our field and
we need to adapt just as quickly.
skydhash wrote 1 min ago:
> The problem solving is in figuring out what to prompt,
which includes correctly defining the problem, identifying
a potential solution, designing an architecture,
decomposing it into smaller tasks, and so on
Coding is just a formal specification, one that is suited
to be automatically executed by a dumb machine. The nice
trick is that the basic semantic units from a programming
language are versatile enough to give you very powerful
abstractions that can fit nicely with the solution you are
designing.
> Personally, once the shape of the solution and the code
is crystallized in my head typing it out is a chore
I truly believe that everyone who says that typing is a
chore once they've got the shape of a solution gets
frustrated by the amount of bad assumptions they've made.
That ranges from not having a good design in place to not
learning the tools they're using and fighting it during the
implementation (Like using React in an imperative manner).
You may have something as extensive as a network protocol
RFC, and still get hit by conflicts between the spec and
what works.
altmanaltman wrote 20 min ago:
Honestly, I fundamentally disagree with this. Figuring out
"what to prompt" is not problem-solving in a true sense
imo. And if you're really going too deep into the problem
domain, what is the point of having the code abstracted?
My comment was based on you saying you don't care about the
code and only what it does. But now you're saying you care
about the code and review everything so I'm not sure what
to make out of it. And again, I fundamentally disagree that
reviewing code will become optional or rather should become
optional. But that's my personal take.
pdntspa wrote 1 hour 23 min ago:
Why can't both things be true? You can care about the code
even if you don't write it. You can continue learning things
by reading said code. And you can very rigidly enforce code
quality guidelines and require the AI adhere to them.
altmanaltman wrote 26 min ago:
I mean if you're reading it and "rigidly" enforcing code
quality guidelines, then you do care about the code, right?
But the parent comment said they don't care about the code
but what it does. Both of them cannot be true at the same
time, since in your example, you do care about the code
enough to read it and refactor it based on guidelines and
not just "what the code" does.
libraryofbabel wrote 5 hours 59 min ago:
I like this framing; I think it captures some of the key
differences between engineers who are instinctively
enthusiastic about AI and those who are not.
Many engineers walk a path where they start out very focussed
on programming details, language choice, and elegant or clever
solutions. But if you're in the game long enough, and
especially if you're working in medium-to-large engineering
orgs on big customer-facing projects, you usually kind of move
on from it. Early in my career I learned half a dozen
programming languages and prided myself on various arcane arts
like metaprogramming tricks. But after a while you learn that
one person's clever solution is another person's
maintainability nightmare, and maybe being as boring and
predictable and direct as possible in the code (if slightly
more verbose) would have been better. I've maintained some
systems written by very brilliant programmers who were just
being too clever by half.
You also come to realize that coding skills and language choice
don't matter as much as you thought, and the big issues in
engineering are 1) are you solving the right problem to begin
with 2) people/communication/team dynamics 3) systems
architecture, in that order of importance.
And also, programming just gets a little repetitive after a
while. Like you say, after a decade or so, it feels a bit like
"more of the same." That goes especially for most of the
programming most of us are doing most of the time in our day
jobs. We don't write a lot of fancy algorithms, maybe once in a
blue moon and even then you're usually better off with a
library. We do CRUD apps and cookie-cutter React pages and so
on and so on.
If AI coding agents fall into your lap once you've reached that
particular variation of a mature stage in your engineering
career, you probably welcome them as a huge time saver and a
means to solve problems you care about faster. After a decade,
I still love engineering, but there aren't many coding tasks I
particularly relish diving into. I can usually vaguely picture
the shape of the solution in my head out the gate, and actually
sitting down and doing it feels rather a bore and just a lot of
typing and details. Which is why it's so nice when I can kick
off a Claude session to do it instead, and review the results
to see if they match what I had in mind.
Don't get me wrong. I still love programming if there's just
the right kind of compelling puzzle to solve (rarer and rarer
these days), and I still pride myself on being able to do it
well. Come the holidays I will be working through Advent of
Code with no AI assistance whatsoever, just me and vim. But
when January rolls around and the day job returns I'll be
having Claude do all the heavy lifting once again.
skydhash wrote 2 hours 30 min ago:
I'm guessing, but I'm pretty sure you're dealing with big
balls of mud which have dampened your love of coding, where
implementing something is more about solving accidental
complexity and dealing with technical debt than actually
doing the job.
libraryofbabel wrote 1 hour 46 min ago:
I've seen some balls of mud, sure, but I don't think that's
the essence of it. It's more like:
1) When I already have a rough picture of the solution to
some programming task in my head up front, I do not
particularly look forward to actually going and doing it.
I've done enough programming that many things feel like a
variation on something I've done before. Sometimes the task
is its own reward because there is a sufficiently hard and
novel puzzle to solve. Mostly it is not and it's just a
matter of putting in the time. Having Claude do most of the
work is perfect in those cases. I don't think this is
particularly anything to do with working on a ball of mud:
it applies to most kinds of work on clean well-architected
projects as well.
2) I have a restless mind and I just don't find doing
something that interesting anymore once I have more or less
mastered it. I'd prefer to be learning some new field
(currently, LLMs) rather than spending a lot of time doing
something I already know how to do. This is a matter of
temperament: there is nothing wrong with being content in
doing a job you've mastered. It's just not me.
skydhash wrote 19 min ago:
> 1) When I already have a rough picture of the solution
to some programming task in my head up front, I do not
particularly look forward to actually going and doing it.
Every time I think I have a rough picture of some
solution, there's always something in the implementation
that proves me wrong. Then it's reading docs and figuring out
whatever gotchas I've stepped into. Or where I erred in
understanding the specifications. If something is that
repetitive, I refactor and try to make it simple.
> I have a restless mind and I just don't find doing
something that interesting anymore once I have more or
less mastered it.
If I've mastered something (And I don't believe I've done
so for pretty much anything), the next step is always
about eliminating the tedium of interacting with that
thing. Like a code generator for some framework or adding
special commands to your editor for faster interaction
with a project.
ben_w wrote 6 hours 12 min ago:
> > I think we have different opinions on what's fun and what's
boring!
> You've really hit the crux of the problem and why so many
people have differing opinions about AI coding.
Part of it perhaps, but there's also a huge variation in model
output. I've been getting some surprisingly bad generations
from ChatGPT recently, though I'm not sure if that's ChatGPT
getting worse or me getting used to a much higher quality of
code from Claude Code which seems to test itself before saying
"done". I have no idea if my opinion will flip again now 5.2 is
out.
And some people are bad communicators, an important skill for
LLMs, though few will recognise it because everyone knows what
they themselves meant by whatever words they use.
And some people are bad planners, likewise an important skill
for breaking apart big tasks that LLMs can't do into small ones
they can do.
danielmarkbruce wrote 37 min ago:
This isn't just in coding. My goodness the stuff I see people
write into an LLM and then say "see! It's stupid!". Some
people are naturally good at prompting and some people just
are not. The differences in output are dramatic.
breuleux wrote 7 hours 18 min ago:
I think it ultimately comes down to whether you care more about
the what, or more about the how. A lot of coders love the
craft: making code that is elegant, terse, extensible,
maintainable, efficient and/or provably correct, and so on.
These are the kind of people who write programming languages,
database engines, web frameworks, operating systems, or small
but nifty utilities. They don't want to simply solve a problem,
they want to solve a problem in the "best" possible way
(sometimes at the expense of the problem itself).
It's typically been productive to care about the how, because
it leads to better maintainability and a better ability to
adapt or pivot to new problems. I suppose that's getting less
true by the minute, though.
doug_durham wrote 6 hours 15 min ago:
Crafting code can be self-indulgent since most common
patterns have been implemented multiple times in multiple
languages. A lot of time the craft oriented developer will
reject an existing implementation because it doesn't match
their sensibilities. There is absolutely a role for craft,
however the amount of craft truly needed in modern
development is not as large as people would like. There are
lots of well crafted libraries and frameworks that can be
adopted if you are willing to accommodate their world view.
breuleux wrote 5 hours 33 min ago:
As someone who does that a lot... I agree. Self-indulgent
is the word. It just feels great when the implementation is
a perfect fit for your brain, but sometimes that's just not
a good use of your time.
Sometimes, you strike gold, so there's that.
sfn42 wrote 4 hours 11 min ago:
I kind of struggle with this. I basically hate everyone
else's code, and by that I mean I hate most people's code.
A lot of people write awesome code but most people write
what I'd call trash code.
And I do think there's more to it than preference. Like
there's actual bugs in the code, it's confusing and
because it's confusing there's more bugs. It's solving a
simple problem but doing so in an unnecessarily
convoluted way. I can solve the same problem in a much
simpler way. But because everything is like this I can't
just fix it, there's layers and layers of this
convolution that can't just be fixed and of course
there's no proper decoupling etc so a refactor is kind of
all or nothing. If you start it's like pulling on a
thread and everything just unravels.
This is going to sound pompous and terrible but honestly
sometimes I feel like I'm too much better than other
developers. I have a hard time collaborating because the
only thing I really want to do with other people's code
is delete it and rewrite it. I can't fix it because it
isn't fixable, it's just trash. I wish they would have
talked to me before writing it, I could have helped then.
Obviously in order to function in a professional
environment i have to suppress this stuff and just let
the code be ass but it really irks me. Especially if I
need to build on something someone else made - itsalmost
always ass, I don't want to build on a crooked
foundation. I want to fix the foundation so the rest of
the building can be good too. But there's no time and
it's exhausting fixing everyone else's messes all the
time.
pdntspa wrote 1 hour 15 min ago:
I feel this too. And the very worst code always seems to
come from the people who otherwise seem the
smartest. I've worked for a couple of people
that are either ACM alum and/or have their own
wikipedia page, multiple patents to their name and
leaders in business, and beyond anyone else that I have
ever worked with, their code has been the worst.
Which is part of what I find so motivating with AI. It
is much better at making sense of that muck, and with
some guidance it can churn out code very quickly with a
high degree of readability.
danielmarkbruce wrote 48 min ago:
did you ever consider their code was good and it's
you that is the problem?
gmueckl wrote 2 hours 53 min ago:
I can guarantee you that if you were to write a
completely new program and continued to work on it for
more than 5 years, you'd feel the same things about
your own code eventually. It's just unavoidable at some
point. The only thing left then is degrees of badness.
And nothing is more humbling than realizing that the
only person that got you there is yourself.
KronisLV wrote 3 hours 19 min ago:
I've linked this before, but I feel like this might
resonate with you:
(HTM) [1]: https://www.stilldrinking.org/programming-suck...
sfn42 wrote 2 hours 56 min ago:
I enjoyed that but honestly it kind of doesn't really
resonate. Because it's like "This stuff is really
complicated and nobody knows how anything works etc
and that's why everything is shit".
I'm talking about simple stuff that people just can't
do right. Not complex stuff. Like imagine some
perfect little example code on the react docs or
whatever, good code. Exemplary code. Trivial code
that does a simple little thing. Now imagine some
idiot wrote code to do exactly the same thing but
made it 8 times longer and incredibly convoluted for
absolutely no reason and that's basically what most
"developers" do. Everyone's a bunch of stupid
amateurs who can't do simple stuff right, that's my
problem. It's not understandable, it's not
justifiable, it's not trading off quality for speed.
It's stupidity, ignorance and laziness.
That's why we have coding interviews that are
basically "write fizzbuzz while we watch" and when I
solve their trivial task easily everyone acts like
I'm Jesus because most of my peers can't fucking
code. Like literally I have colleagues with years of
experience who are barely at a first year CS level.
They don't know the basics of the language they've
been working with for years. They're amateurs.
KronisLV wrote 2 hours 45 min ago:
Then it's quite possible that you're working in
an environment that naturally leads to people like
that getting hired. If that's something you see
repeatedly, then the environment isn't a good fit
for you and you aren't a good fit for it. So
you'd be better served by finding a place where
the standards are as high as you want, from the
very first moment in the hiring process.
For example, Oxide Computers has a really
interesting approach [1] Obviously that's easier
said than done but there are quite a few orgs out
there like that. If everyone around you doesn't
care about something or can't do it, it's
probably a systemic problem with the environment.
(HTM) [1]: https://oxide.computer/careers
agumonkey wrote 7 hours 32 min ago:
it's true that 'code' doesn't mean much, but the ability to
manage different layers and states to produce logic modules
was the challenge
getting things solved entirely feels very very numbing to me
even when gemini or chatgpt solves it well, and even beyond
what i'd imagine.. i feel a sense of loss
pdntspa wrote 9 hours 8 min ago:
You are hitting the nail on the head. We are not being hired to
write code. We are being hired to solve problems. Code is
simply the medium.
agumonkey wrote 7 hours 30 min ago:
but do you solve the problem if you just slap a prompt and
iterate while the LLM gathers diffs ?
pdntspa wrote 5 hours 52 min ago:
If the client is happy, the code is well-formed, and it
solves their problem in a cost-effective manner, what is
not to like?
agumonkey wrote 5 hours 17 min ago:
cause the 'dev' didn't solve anything
ultimately i wonder how long people will need devs at all
if you can all prompt your wishes
some will be kept to fix the occasional hallucination and
that's it
ben_w wrote 6 hours 15 min ago:
Depends what the problem is.
Sometimes you can, sometimes you have to break the problem
apart and get the LLM to do each bit separately, sometimes
the LLM goes funny and you need to solve it yourself.
Customers don't want you wasting money doing by hand what
can be automated, nor do they want you ripping them off by
blindly handing over unchecked LLM output when it can't be
automated.
agumonkey wrote 5 hours 15 min ago:
there are other ways: being scammed by lazy devs using AI
to produce what devs normally do and not saving any money
for the customer. i mentioned it in another thread, i
heard first hand people say "i will never report how much
time savings i get from gemini, at best i'll say 1 day a
month"
eclipxe wrote 6 hours 37 min ago:
Yes?
wahnfrieden wrote 8 hours 37 min ago:
I believe wage work is a significant factor in all this.
Most are not paid for results, they're paid for time at desk
and regular responsibilities such as making commits,
delivering status updates, code reviews, etc. - the daily
activities of work are monitored more closely than the
output. Most ESOPs grant so little equity that working
harder could never observably drive an increase in its value.
Getting a project done faster just means another project to
begin sooner.
Naturally workers will begin to prefer the motions of the
work they find satisfying more than the result it has for the
business's bottom line, from which they're alienated.
order-matters wrote 5 hours 42 min ago:
I think it's related. The nature of the wage work likely
also self-selects for people who simply enjoy coding and
being removed from the bigger picture problems they are
solving.
I'm on the side of only enjoying coding to solve problems and i
skipped software engineering and coding for work explicitly
because i did not want to participate in that dynamic of
being removed from the problems. instead i went into
business analytics, and now that AI is gaining traction I
am able to do more of what I love - improving processes and
automation - without ever really needing to "pay dues"
doing grunt work I never cared to be skilled at in the
first place unless it was necessary.
Sammi wrote 5 hours 56 min ago:
> Naturally workers will begin to prefer the motions of the
work they find satisfying more than the result it has for
the business's bottom line, from which they're alienated.
Wow. I've read a lot of hacker news this past decade, but
I've never seen this articulated so well before. You really
lifted the veil for me here. I see this everywhere, people
thinking the work is the point, but I haven't been able to
crystallize my thoughts about it like you did just now.
thenewwazoo wrote 3 hours 2 min ago:
Marx had a lot of good ideas, though you wouldn't know it
by listening to capitalist-controlled institutions.
(HTM) [1]: https://en.wikipedia.org/wiki/Marx%27s_theory_of...
embedding-shape wrote 9 hours 29 min ago:
Some people are into designing software, others like to put the
design into implementation, others like cleaning up
implementations yet others like making functional software
faster.
There is enough work for all of us to be handsomely paid while
having fun doing it :) Just find what you like, and work with
others who like other stuff, and you'll get through even the
worst of problems.
For me the fun comes not from the action of typing stuff with my
sausage fingers and seeing characters end up on the screen, but
basically everything before that and after that. So if I can make
"translate what's in my head into source on disk something can
run" faster, that's a win in my book, but not if the quality
degrades too much, so tight control over it still not having to
use my fingers to actually type.
mkehrt wrote 8 hours 30 min ago:
I've found that good AI-based tab completion is the sweet
spot for me. I am still writing code, but I don't have to type
all of it if it's obvious.
OkayPhysicist wrote 5 hours 17 min ago:
This has been my approach, as well. I've got a neovim setup
where I can 1) open up a new buffer, ask a question, and then
copy/paste from it and 2) prompt the remainder of the line,
function, or class. (the latter two are commands I run,
rather than keybinds).
AStrangeMorrow wrote 9 hours 33 min ago:
I really enjoy writing some of the code. But some is a pain.
Never have fun when the HQ team asks for API changes for the 5th
time this month. Or for that matter writing the 2000 lines of
input and output data validation in the first place. Or
refactoring that ugly dictionary passed all over the place to be
a proper class/dataclass. Handling config changes. Lots of that
piping job.
Some tasks I do enjoy coding. Once in the flow it can be quite
relaxing.
But mostly I enjoy the problem solving part: coming up with the
right algorithm, a nice architecture, the proper set of metrics
to analyze, etc.
moffkalast wrote 9 hours 56 min ago:
He's a real straight shooter with upper management written all
over him.
SoftTalker wrote 8 hours 40 min ago:
Ummm, yeah... I'm gonna have to go ahead and sort of disagree
with you there.
wpasc wrote 9 hours 43 min ago:
but what would you say... you do here?
fudged71 wrote 10 hours 36 min ago:
This tells me that we need to build 1000 more linters of all kinds
xnorswap wrote 10 hours 24 min ago:
Unironically I agree.
One under-discussed lever that senior / principal engineers can
pull is the ability to write linters & analyzers that will stop
junior engineers ( or LLMs ) from doing something stupid that's
specific to your domain.
Let's say you don't want people to make async calls while owning a
particular global resource, it only takes a few minutes to write an
analyzer that will prevent anyone from doing so.
Avoid hours of back-and-forth over code review by encoding your
preferences and taste into your build pipeline and stop it at
source.
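To make that concrete, here's a minimal sketch in Python of such a
check -- the "no await while holding a lock" rule from above. The
`global_lock` name is made up for illustration, and a real version
would plug into your existing linter or CI rather than run standalone:
import ast, sys

class AwaitUnderLock(ast.NodeVisitor):
    # Flags `await` expressions that occur inside `with global_lock:` blocks.
    def __init__(self):
        self.lock_depth = 0
        self.violations = []

    def visit_AsyncWith(self, node):
        holds_lock = any(isinstance(item.context_expr, ast.Name)
                         and item.context_expr.id == "global_lock"
                         for item in node.items)
        self.lock_depth += holds_lock
        self.generic_visit(node)
        self.lock_depth -= holds_lock

    visit_With = visit_AsyncWith  # same handling for a plain `with`

    def visit_Await(self, node):
        if self.lock_depth:
            self.violations.append(node.lineno)
        self.generic_visit(node)

for path in sys.argv[1:]:
    checker = AwaitUnderLock()
    checker.visit(ast.parse(open(path).read(), path))
    for line in checker.violations:
        print(f"{path}:{line}: await while holding global_lock")
Fail the build if it prints anything, and that particular review
argument never has to happen again.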
jmalicki wrote 10 hours 14 min ago:
And for more complex linters I find that it can be easy to get
the LLM to write most of it itself!!!
james_marks wrote 10 hours 47 min ago:
This is a key part of the AI love/hate flame war.
Very easy to write it off when it spins out on the open-ended
problems, without seeing just how effective it can be once you zoom
in.
Of course, zooming in that far gives back some of the promised gains.
Edit: typo
hombre_fatal wrote 10 hours 35 min ago:
Go one level up:
claude2() {
  claude "$(claude "Generate a prompt and TODO list that works towards this goal: $*" -p)"
}
$ claude2 pls give ranked ideas for make code better
thewebguyd wrote 10 hours 37 min ago:
> without seeing just how effective it can be once you zoom in.
The love/hate flame war continues because the LLM companies aren't
selling you on this. The hype is all about "this tech will enable
non-experts to do things they couldn't do before" not "this tech
will help already existing experts with their specific niche,"
hence the disconnect between the sales hype and reality.
If OpenAI, Anthropic, Google, etc. were all honest and tempered
their own hype and misleading marketing, I doubt there would even
be a flame war. The marketing hype is "this will replace employees"
without the required fine print of "this tool still needs to be
operated by an expert in the field and not your average non
technical manager."
hombre_fatal wrote 10 hours 26 min ago:
The amount of GUIs I've vibe-coded works against your claim.
As we speak, my macOS menubar has an iStat Menus replacement, a
Wispr Flow replacement (global hotkey for speech-to-text), and a
logs visualizer for the `blocky` dns filtering program -- all of
which I built without reading code aside from where I was
curious.
It was so vibe-coded that there was no reason to use SwiftUI nor
set them up in Xcode -- just AppKit Swift files compiled into
macOS apps when I nix rebuild.
The only effort it required was the energy to QA the LLM's
progress and tell it where to improve, maybe click and drag a
screenshot into claude code chat if I'm feeling excessive.
Where do my 20 years of software dev experience fit into this
except beyond imparting my aesthetic preferences?
In fact, insisting that you write code yourself is becoming a
liability in an interesting way: you're going to make trade-offs
for DX that the LLM doesn't have to make, like when you use
Python or Electron when the LLM can bypass those abstractions
that only exist for human brains.
bopbopbop7 wrote 10 hours 9 min ago:
You making a couple of small GUIs that could have been made
with a drag and drop editor 10 years ago doesn't work against
his claim as much as you think. You're just telling on yourself
and your "20 years" of supposed dev experience.
hombre_fatal wrote 9 hours 58 min ago:
Dragging UI components into a WYSIWYG editor is <1% of
building an app.
Else Visual Basic and Dreamweaver would have killed software
engineering in the 90s.
Also, I didn't make them. A clanker did. I can see this topic
brings out the claws. Honestly I used to have the same
reaction, and in a large way I still hate it.
bopbopbop7 wrote 9 hours 47 min ago:
It's not bringing out claws, it's just causing certain
developers to out themselves.
hombre_fatal wrote 8 hours 57 min ago:
Outs me as what, exactly?
I'm not sure you're interacting with a single claim I've
made so far.
onethought wrote 10 hours 16 min ago:
Love that you are disagreeing with parent by saying you built
software all on your own, and you only had 20 years software
experience.
Isn't that the point they are making?
hombre_fatal wrote 10 hours 15 min ago:
Maybe I didn't make it clear, but I didn't build the software
in my comment. A clanker did.
Vibe-coding is a claude code <-> QA loop on the end result
that anyone can do (the non-experts in his claim).
An example of a cycle looks like "now add an Options tab that
let's me customize the global hotkey" where I'm only an
end-user.
Once again, where do my 20 years of software experience come
up in a process where I don't even read code?
thewebguyd wrote 9 hours 32 min ago:
> An example of a cycle looks like "now add an Options tab
that let's me customize the global hotkey" where I'm only
an end-user
Which is a prompt that someone with experience would write.
Your average, non-technical person isn't going to prompt
something like that, they are going to say "make it so I
can change the settings" or something else super vague and
struggle. We all know how difficult it is to define
software requirements.
Just because an LLM wrote the actual code doesn't mean your
prompts weren't more effective because of your experience
and expertise in building software.
Sit someone down in front of an LLM with zero development
or UI experience at all and they will get very different
results. Chances are they won't even specify "macOS menu
bar app" in the prompt and the LLM will end up trying to
make them a webapp.
Your vibe coding experience just proves my initial point,
that these tools are useful for those who already have
experience and can lean on that to craft effective prompts.
Someone non-technical isn't going to make effective use of
an LLM to make software.
hombre_fatal wrote 8 hours 38 min ago:
Counter point: [1] Your original claim:
> The hype is all about "this tech will enable
non-experts to do things they couldn't do before"
Are you saying that a prompt like "make a macOS weather
app for me" and "make an options menu that lets me set my
location" are only something an expert can do?
I need to know what you think their expertise is in.
(HTM) [1]: https://news.ycombinator.com/item?id=46234943
ModernMech wrote 9 hours 16 min ago:
Here's how I look at it as a roboticist:
The LLM prompt space is an ND space where you can start
at any point, and then the LLM carves a path through the
space for so many tokens using the instructions you
provided, until it stops and asks for another direction.
This frames LLM prompt coding as a sort of navigation
task.
The problem is difficult because at every decision point,
there's an infinite number of things you could say that
could lead to better or worse results in the future.
Think of a robot going down the sidewalk. It controls
itself autonomously, but it stops at every intersection
and asks "where to next boss?" You can tell it either to
cross the street, or drive directly into traffic, or do
any number of other things that could cause it to get
closer to its destination, further away, or even to
obliterate itself.
In the concrete world, it's easy to direct this robot,
and to direct it such that it avoids bad outcomes, and to
see that it's achieving good outcomes -- it's physically
getting closer to the destination.
But when prompting in an abstract sense, it's hard to see
where the robot is going unless you're an expert in that
abstract field. As an expert, you know the right way to
go is across the street. As a novice, you might tell the
LLM to just drive into traffic, and it will happily
oblige.
The other problem is feedback. When you direct the
physical robot to drive into traffic, you witness its
demise, its fate is catastrophic, and if you didn't
realize it before, you'd see the danger then. The robot
also becomes incapacitated, and it can't report falsely
about its continued progress.
But in the abstract case, the LLM isn't obliterated, it
continues to report on progress that isn't real, and as a
non-expert, you can't tell it's been flattened into a
pancake. The whole output chain is now completely and
thoroughly off the rails, but you can't see the
smoldering ruins of your navigation instructions because
it's told you "Exactly, you're absolutely right!"
onethought wrote 10 hours 11 min ago:
But "anyone" didn't do it... you, an expert in software
development, did it.
I would hazard a guess that your knowledge lead to better
prompts, better approach... heck even understanding how to
build a status bar menu on Mac OS is slightly expert
knowledge.
You are illustrating the GP's point, not negating it.
hombre_fatal wrote 10 hours 4 min ago:
> I would hazard a guess that your knowledge lead to
better prompts, better approach... heck even
understanding how to build a status bar menu on Mac OS is
slightly expert knowledge.
You're imagining that I'm giving Claude technical advice,
but that is the point I'm trying to make: I am not.
This is what "vibe-coding" tries to specify.
I am only giving Claude UX feedback from using the app it
makes. "Add a dropdown that lets me change the girth".
Now, I do have a natural taste for UX as a software user,
and through that I can drive Claude to make a pretty good
app. But my software engineering skills are not
utilized... except for that one time I told Claude to use
an AGDT because I fancy them.
ModernMech wrote 9 hours 32 min ago:
My mother wouldn't be able to do what you did. She
wouldn't even know where to start despite using LLMs
all the time. Half of my CS students wouldn't know
where to start either. None of my freshman would. My
grad students can do this but not all of them.
Your 20 years is assisting you in ways you don't know;
you're so experienced you don't know what it means to
be inexperienced anymore. Now, it's true you probably
don't need 20 years to do what you did, but you need
some experience. Its not that the task you posed to the
LLM is trivial for everyone due to the LLM, its that
its trivial for you because you have 20 years
experience. For people with experience, the LLM makes
moderate tasks trivial, hard tasks moderate, and
impossible tasks technically doable.
For example, my MS students can vibe code a UI, but
they can't vibe code a complete bytecode compiler. They
can use AI to assist them, but it's not a trivial task
at all, they will have to spend a lot of time on it,
and if they don't have the background knowledge they
will end up mired.
hombre_fatal wrote 8 hours 40 min ago:
The person at the top of the thread only made a claim
about "non-experts".
Your mom wouldn't vibe-code software that she wants
not because she's not a software engineer, but
because she doesn't engage with software as a user at
the level where she cares to do that.
Consider these two vibe-coded examples of waybar apps
in r/omarchy where the OP admits he has zero software
experience:
- Weather app: [1]
- Activity monitor app: [2]
That is a direct refutation of OP's claim. LLM enabled a
non-expert to build something they couldn't before.
Unless you too think there exists a necessary
expertise in coming up with these prompts:
- "I want a menubar app that shows me the current
weather"
- "Now make it show weather in my current location"
- "Color the temperatures based on hot vs cold"
- "It's broken please find out why"
Is "menubar" too much expertise for you? I just asked
claude "what is that bar at the top of my screen with
all the icons" and it told me that it's macOS'
menubar.
(HTM) [1]: https://www.reddit.com/r/waybar/comments/1p6...
(HTM) [2]: https://www.reddit.com/r/omarchy/comments/1p...
ModernMech wrote 6 hours 57 min ago:
I didn't make clear I was responding to your
question:
"Where do my 20 years of software dev experience
fit into this except beyond imparting my aesthetic
preferences?"
Anyway, I think you kind of unintentionally proved
my point. These two examples are pretty trivial as
far as software goes, and it enabled someone with a
little technical experience to implement them where
before they couldn't have.
They work well because:
a) the full implementations for these apps don't
even fill up the AI context window. It's easy to
keep the LLM on task.
b) it's a tutorial-style app that people often
write as "babby's first UI widget", so there are
thousands of examples of exactly this kind of thing
online; therefore the LLM has little trouble
summoning the correct code in its entirety.
But still, someone with zero technical experience
is going to be immediately thwarted by the prompts
you provided.
Take the first one "I want a menubar app that shows
me the current weather". [1] ChatGPT response:
"Nice â here's a ready-to-run macOS menubar app
you can drop into Xcode..."
She's already out of her depth by word 11. You
expect your mom to use Xcode? Mine certainly can't.
Even I have trouble with Xcode and I use it for
work. Almost every single word in that response
would need to be explained to her, it might as well
be a foreign language.
Now, the LLM could help explain it to her, and
that's what's great about them. But by the time she
knows enough to actually find the original response
actionable, she would have gained... knowledge and
experience enough to operate it just to the level
of writing that particular weather app. Though
having done that, it's still unreasonable to now
believe she could then use the LLM to write a
bytecode compiler, because other people who have a
Ph.D. in CS can. The LLM doesn't level the playing
field, it's still lopsided toward the Ph.D.s /
senior devs with 20 years exp.
(HTM) [1]: https://chatgpt.com/share/693b20ac-dcec-80...
bopbopbop7 wrote 8 hours 5 min ago:
Your best examples of non-experts are two Linux
power users?
kccqzy wrote 10 hours 50 min ago:
Not at all my experience. I've often tried things like telling
Claude this SIMD code I wrote performed poorly and I needed some
ideas to make it go faster. Claude usually does a good job rewriting
the SIMD to use different and faster operations.
mainmailman wrote 10 hours 33 min ago:
I'm not a C++ programmer, but wouldn't your example be a fairly
structured problem? You wanted to improve performance of a specific
part of your code base.
zahlman wrote 10 hours 35 min ago:
That sounds like a pretty "structured" problem to me.
kccqzy wrote 9 hours 53 min ago:
Performance optimization isn't structured at all. I find it
amazing that without access to profilers or anything Claude is
able to respond to "anything I can do to improve the speed"
with acceptable results.
chrneu wrote 10 hours 33 min ago:
That's one of the problems with AI: as it can accomplish more
tasks, people will overestimate its ability.
What the person you replied to had Claude do is relatively simple
and structured, but to that person what Claude did is
"automagic".
People already vastly overestimate AI's capabilities. This
contributes to that.
plufz wrote 10 hours 50 min ago:
I think slash commands are great to help Claude with this. I have
many like /code:dry /code:clean-code etc that has a semi long prompt
and references to longer docs to review code from a specific
perspective. I think it at least improves Claude a bit in this area.
Like processes or templates for thinking in broader ways. But yes I
agree it struggles a lot in this area.
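For illustration, one of mine looks roughly like this. I'm assuming
the Claude Code convention of markdown prompt files under
.claude/commands/ (the exact layout and namespacing may differ by
version), and the referenced doc path is made up:
    File: .claude/commands/code/dry.md  (shows up as something like /code:dry)
    Review the code I point you at strictly from a DRY perspective.
    - Read docs/style/dry-guidelines.md first and follow it.
    - Flag duplicated logic and suggest a shared helper, but don't
      rewrite anything unless the duplication is exact and non-trivial.
    - If nothing meaningful is duplicated, say so and stop.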
airstrike wrote 10 hours 41 min ago:
Somewhat tangential but interestingly I'd hate for Claude to make
any changes with the intent of sticking to "DRY" or "Clean Code".
Neither of those are things I follow, and either way design is
better informed by the specific problems that need to be solved
rather than by such general, prescriptive principles.
plufz wrote 7 hours 29 min ago:
I agree, so obviously I direct it with more info and point it to
code that I believe needs more of specific principles. But
generally I would like Claude to produce more DRY code, it is
great at reimplementing the same thing in five places instead of
making a shared utility module.
airstrike wrote 6 hours 52 min ago:
I see, and I definitely agree with that last statement. It
tends to rewrite stuff. I feel like it should pay me back
10,000 tokens each time it increases the API surface
SketchySeaBeast wrote 9 hours 50 min ago:
I'm not sure how to interpret someone saying they don't follow
DRY. Do you mean taking it to the zealous extreme, or do you
abhor helper functions? Is this a "No True Scotsman" thing?
airstrike wrote 7 hours 59 min ago:
I just think DRY is overblown. I let code grow. When parts
of it become obvious to abstract, I refactor them into
something self-contained. I learned this from an ice wizard.
When I was younger, writing Python rather than Rust, I used to
go out of my way to make everything DRY, DRY, DRY everywhere
from the outset. Class-based views in Django come to mind.
Today, I just write code, and after it's working I go back and
clean things up where applicable. Not because I'm "following a
principle", but because it's what makes sense in that specific
instance.
Pannoniae wrote 9 hours 5 min ago:
Not GP but I can strongly relate to it. Most of the programming
I do is related to me making a game.
I follow WET principles (write everything twice at least)
because the abstraction penalty is huge, both in terms of
performance and design: a bad abstraction causes all subsequent
content to be made much slower, which I can't afford as a small
developer.
Same with most other "clean code" principles. My codebase is
~70K LoC right now, and I can keep most of it in my head. I
used to try to make more functional, more isolated and
encapsulated code, but it was hard to work with and most
importantly, hard to modify. I replaced most of it with global
variables, shit works so much better.
I do use partial classes pretty heavily though - helps LLMs not
go batshit insane from context overload whenever they try to
read "the entire file".
Models sometimes try to institute these clean code practices
but it almost always just makes things worse.
SketchySeaBeast wrote 8 hours 16 min ago:
OK, I can follow WET before you DRY, to me that's just a
non-zealous version of Don't Repeat Yourself.
I think, if you're writing code where you know the entire
code base, a lot of the clean principles seem less important,
but once you get someone who doesn't, and that can be you
coming back to the project in three months, suddenly they
have value.
maddmann wrote 11 hours 2 min ago:
lol 5000 tests. Agentic code tools have a significant bias to add
versus remove/condense. This leads to a lot of bloat and orphaned code.
Definitely something that still needs to be solved for by agentic
tools.
nosianu wrote 10 hours 17 min ago:
> Agentic code tools have a significant bias to add versus
remove/condense.
Your point stands uncontested by me, but I just wanted to mention
that humans have that bias too.
Random link (has the Nature study link): [1]
(HTM) [1]: https://blog.benchsci.com/this-newly-proven-human-bias-cause...
(HTM) [2]: https://en.wikipedia.org/wiki/Additive_bias
maddmann wrote 7 hours 51 min ago:
Great point, interesting how agents somehow pick up the same bias.
oofbey wrote 10 hours 32 min ago:
Oh I've had agents remove tests plenty of times. Or cripple the
tests so they pass but are useless - more common and harder to prompt
against.
maddmann wrote 7 hours 48 min ago:
Ah true, that also can happen - in aggregate I think models will
tend to expand codebases rather than contract them. Though this is anecdotal
and probably is something AI labs and coding agent companies are
looking at now.
oofbey wrote 5 hours 16 min ago:
It's the same bias for action that makes them code up a change
when you genuinely are just asking a question about something.
They really want to write code.
f311a wrote 11 hours 6 min ago:
I like to ask LLMs to find problems or improvements in 1-2 files. They
are pretty good at finding bugs, but for general code improvements,
50-60% of the edits are trash. They add completely unnecessary stuff. If you
ask them to improve pretty well-written code, they rarely say it's
good enough already.
For example, in a functional-style codebase, they will try to rewrite
everything to a class. I have to adjust the prompt to list things that
I'm not interested in. And some inexperienced people are trying to
write better code by learning from such changes of LLMs...
ryandrake wrote 9 hours 57 min ago:
I asked Claude the other day to look at one of my hobby projects that
has a client/server architecture and a bespoke network protocol, and
brainstorm ideas for converting it over to HTTP, JSON-RPC, or
something else standards-based. I specifically told it to "go wild"
and really explore the space. It thought for a while and provided a
decent number of suggestions (several I was unaware of) with
"verdicts". Ultimately, though, it concluded that none of them were
ideal, and that the custom wire protocol was fine and appropriate for
the project. I was kind of shocked at this conclusion: I expected it
to behave like that eager intern persona we all have come to
expect--ready to rip up the code and "do things."
pawelduda wrote 10 hours 45 min ago:
If you just ask it to find problems, it will do its best to find them
- like running a while loop with no return condition. That's why I
put some breaker in the prompt, which in this case would be "don't
make any improvements if the positive impact is marginal". I've
mostly seen it do nothing and just summarize why, followed by some
suggestions in case I still want to force the issue
f311a wrote 10 hours 37 min ago:
I guess "marginal impact" for them is a pretty random metric, which
will be different on each run. Will try it next time.
Another problem is that they try to add handling of different cases
that are never present in my data. I have to mention that there is
no need to update handling to be more generalized. For example, my
code handles PNG files, and they add JPG handling that never
happens.
websiteapi wrote 11 hours 9 min ago:
You gotta be strategic about it. So, for example, for tests, tell it to
use equivalence testing and to prove it, e.g. create a graph of
permutations of arguments and their equivalences from the underlying
code, and then use that to generate the tests.
Telling it to do better without any feedback obviously is going to go
nowhere fast.
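Rough sketch of what I mean (the function under test and the
equivalence classes here are made up):
    // Made-up function under test: inputs in the same weight band with
    // the same express flag should be priced identically.
    function priceShipping(weightKg: number, express: boolean): number {
      const base = weightKg <= 1 ? 5 : weightKg <= 10 ? 9 : 20;
      return express ? base * 2 : base;
    }

    // Each class lists argument tuples expected to be interchangeable.
    const equivalenceClasses: Array<Array<[number, boolean]>> = [
      [[0.2, false], [0.9, false], [1, false]], // light, standard
      [[2, false], [5, false], [10, false]],    // medium, standard
      [[2, true], [10, true]],                  // medium, express
      [[11, false], [50, false]],               // heavy, standard
    ];

    // One check per class: every member must agree with the first member.
    for (const cls of equivalenceClasses) {
      const expected = priceShipping(...cls[0]);
      for (const args of cls) {
        const actual = priceShipping(...args);
        if (actual !== expected) {
          throw new Error(`class broken at ${JSON.stringify(args)}: ${actual} != ${expected}`);
        }
      }
    }
    console.log("all equivalence classes hold");
The point is less this particular example and more that the model now
has a concrete, checkable structure to generate tests against, instead
of just "write more tests".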
m101 wrote 11 hours 15 min ago:
This is a great example of there being no intelligence under the hood.
Terretta wrote 10 hours 21 min ago:
Just as enterprise software is proof positive of no intelligence
under the hood.
I don't mean the code producers, I mean the enterprise itself is not
intelligent yet it (the enterprise) is described as developing the
software. And it behaves exactly like this, right down to deeply
enjoying inflicting bad development/software metrics (aka BD/SM) on
itself, inevitably resulting in:
(HTM) [1]: https://github.com/EnterpriseQualityCoding/FizzBuzzEnterpris...
xixixao wrote 11 hours 5 min ago:
Would a human perform very differently? A human who must obey orders
(like maybe they are paid to follow the prompt). With some "magnitude
of work" enforced at each step.
I'm not sure there's much to learn here, besides it's kinda fun,
since no real human was forced to suffer through this exercise on the
implementor side.
Yeask wrote 3 hours 8 min ago:
A human trained with 0.00000001% of the money OpenAI uses to train
models will perform better.
A human with no training will perform worse.
nosianu wrote 10 hours 25 min ago:
> Would a human perform very differently?
How useful is the comparison with the worst human results? Which
are often due to process rather than the people involved.
You can improve processes and teach the humans. The junior will
become a senior, in time. If the processes and the company are bad,
what's the point of using such a context to compare human and AI
outputs? The context is too random and unpredictable. Even if you
find out AI or some humans are better in such a bad context, what
of it? The priority would be to improve the process first for best
gains.
thatwasunusual wrote 10 hours 31 min ago:
No (human) developer would _add_ tests. ^/s
Capricorn2481 wrote 10 hours 52 min ago:
> Would a human perform very differently?
Yes.
wongarsu wrote 10 hours 55 min ago:
> A human who must obey orders (like maybe they are paid to follow
the prompt). With some "magnitude of work" enforced at each step
Which describes a lot of outsourced development. And we all know
how well that works
theshrike79 wrote 6 hours 0 min ago:
Using outsourced coders is a skill like any other. There are
cultural things you need to consider etc.
It's not hard, just different.
kderbyma wrote 2 days ago:
Yeah. I noticed Claude suffers when it reaches context overload - it's
too opinionated, so it shortens its own context with decisions I would
not ever make, yet I see it telling itself that the shortcuts are a
good idea because the project is complex... then it gets into a loop
where it second-guesses its own decisions and forgets the context and
then continues to spiral uncontrollably into deeper and deeper failures
- often missing the obvious glitch and instead looking into imaginary
land for answers - constantly diverting the solution from patching to
completely rewriting...
I think it suffers from performance anxiety...
----
The only solution I have found is to - rewrite the prompt from scratch,
change the context myself, and then clear any "history or memories" and
then try again.
I have even gone so far as to open nested folders in separate windows
to "lock in" scope better.
As soon as I see the agent say "Wait, that doesn't make sense, let me
review the code again" it's cooked.
rtp4me wrote 10 hours 51 min ago:
For me, too many compactions throughout the day eventually lead to a
decline in Claude's thinking ability. And, during that time, I have
given it so much context to help drive the coding interaction. Thus,
restarting Claude requires me to remember the small bits of "nuggets"
we discovered during the last session so I find myself repeating the
same things every day (my server IP is: xxx, my client IP is: yyy,
the code should live in directory: a/b/c). Using the resume feature
with Claude simply brings back the same decline in thinking that led
me to stop it in the first place. I am sure there is a better way to
remember these nuggets between sessions but I have not found it yet.
snarf21 wrote 10 hours 58 min ago:
That has been my greatest stumbling block with these AI agents:
context. I was trying to have one help vibe code a puzzle game and
most of the time when I added a new rule it broke 5 existing rules. It
also never approached the rules engine with a context of building a
reusable abstraction, just Hammer meet Nail.
flowerthoughts wrote 11 hours 3 min ago:
There's no -c on the command line, so I'm guessing this is starting
fresh every iteration, unless claude(1) has changed the default
lately.
embedding-shape wrote 11 hours 4 min ago:
> Yeah. I noticed Claud suffers when it reaches context overload
All LLMs degrade in quality as soon as you go beyond one user message
and one assistant response. If you're looking for accuracy and
highest possible quality, you need to constantly redo the
conversations from scratch, never go beyond one user message.
If the LLM gets it wrong in its first response, instead of saying
"No, what I meant was...", you need to edit your original message and
re-generate, otherwise the conversation becomes "poisoned" almost
immediately, and every token generated after that will suffer.
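Rough sketch of the difference, with a made-up complete() standing in
for whatever chat client you use:
    type Msg = { role: "user" | "assistant"; content: string };

    // Stub standing in for a real chat-completion call.
    async function complete(messages: Msg[]): Promise<string> {
      return `response to: ${messages[messages.length - 1].content}`;
    }

    async function demo() {
      // First attempt.
      let messages: Msg[] = [{ role: "user", content: "Summarize this log file: ..." }];
      console.log(await complete(messages));

      // Don't do this: appending a correction keeps the bad answer in context.
      // messages.push({ role: "assistant", content: badAnswer });
      // messages.push({ role: "user", content: "No, what I meant was only the errors" });

      // Instead: rewrite the original message and regenerate from scratch.
      messages = [{ role: "user", content: "Summarize only the ERROR lines of this log file: ..." }];
      console.log(await complete(messages));
    }
    demo();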
torginus wrote 10 hours 2 min ago:
Yeah, I used to write some fiction for myself with LLMs as a
recreational pastime. It's funny to see how, as the story gets
longer, LLMs progressively either get dumber, start repeating
themselves, or become unhinged.
someguyiguess wrote 11 hours 9 min ago:
There's definitely a certain point I reach when using Claude Code
where I have to make the specifications so specific that it becomes
more work than just writing the code myself
SV_BubbleTime wrote 11 hours 15 min ago:
I'm keeping Claude's tasks small and focused, then if I can I
clear between.
It's REAL FUCKING TEMPTING to say "hey Claude, go do this thing
that would take me hours and you seconds" because he will happily
oblige, and it'll kinda work. But one way or another you are going
to put those hours in.
It's like programming... is proof of work.
thevillagechief wrote 11 hours 9 min ago:
Yes, this is exactly true. You will put in those hours.
whatshisface wrote 10 hours 40 min ago:
In this vein, one of the biggest time-savers has turned out to be
its ability to make me realize I don't want to do something.
SV_BubbleTime wrote 7 hours 55 min ago:
I get that. But I think the AI-deriders are a bit nuts
sometimes because while I'm not running around crying about
AGI... it's really damn nice to change the arguments of a
function and have it just go everywhere and adjust every
invocation of that function to work properly. Something that
might take me 10-30 minutes is now seconds and it's not
outside of its reliability spectrum.
Vibe coding though, super deceptive!
written-beyond wrote 3 days ago:
> I like Rust's result-handling system, I don't think it works very
well if you try to bring it to the entire ecosystem that already is
standardized on error throwing.
I disagree, it's very useful even in languages that have
exception-throwing conventions. It's good enough to be the return type of
the Promise.allSettled API.
The problem is that when I don't have a result type, I end up approximating
it anyway in other ways. For a quick project I'd stick with
exceptions, but depending on my codebase I usually use the Go-style (ok,
err) tuple (it's usually clunkier in TS though) or a Rust-style Result
enum with Ok/Err variants.
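A minimal sketch of the Rust-style variant in TS (the names are mine,
not from any particular library):
    type Result<T, E> =
      | { ok: true; value: T }
      | { ok: false; error: E };

    const ok = <T>(value: T): Result<T, never> => ({ ok: true, value });
    const err = <E>(error: E): Result<never, E> => ({ ok: false, error });

    // Wrap a throwing API at the boundary so callers deal in values.
    function parseJson(text: string): Result<unknown, Error> {
      try {
        return ok(JSON.parse(text));
      } catch (e) {
        return err(e instanceof Error ? e : new Error(String(e)));
      }
    }

    const parsed = parseJson('{"a": 1}');
    if (parsed.ok) {
      console.log(parsed.value); // narrowed to the success branch
    } else {
      console.error(parsed.error.message);
    }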
turboponyy wrote 10 hours 48 min ago:
I have the same disagreement. TypeScript with its structural and
pseudo-dependent typing, somewhat-functionally disposed language
primitives (e.g. first-class functions as values, currying) and
standard library interfaces (filter, reduce, flatMap et al), and
ecosystem makes propagating information using values extremely
ergonomic.
Embracing a functional style in TypeScript is probably the most
productive I've felt in any mainstream programming language. It's a
shame that the language was defiled with try/catch, classes and other
unnecessary cruft, so third-party libraries are still an annoying
boundary you have to worry about, but oh well.
The language is so well-suited for this that you can even model side
effects as values, do away with try/catch, if/else and mutation a la
Haskell, if you want[1]
(HTM) [1]: https://effect.website/
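To give a flavor, here's a homemade sketch of the idea (not Effect's
actual API; all names are made up):
    // A side effect modeled as a plain value: a thunk that, when run,
    // yields a success or failure value instead of throwing.
    type Outcome<A, E> = { ok: true; value: A } | { ok: false; error: E };
    type Effect<A, E> = () => Promise<Outcome<A, E>>;

    // Describing the effect does not run it; it's just data you can
    // pass around, compose, retry, etc.
    const fetchUser =
      (id: string): Effect<{ name: string }, Error> =>
      async () => {
        try {
          const res = await fetch(`/api/users/${id}`);
          if (!res.ok) return { ok: false, error: new Error(`HTTP ${res.status}`) };
          return { ok: true, value: (await res.json()) as { name: string } };
        } catch (e) {
          return { ok: false, error: e instanceof Error ? e : new Error(String(e)) };
        }
      };

    // Running the effect is a separate, explicit step.
    async function main() {
      const result = await fetchUser("42")();
      if (result.ok) console.log(result.value.name);
      else console.error(result.error.message);
    }
    main();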
(DIR) <- back to front page