[HN Gopher] How I program with LLMs
___________________________________________________________________
How I program with LLMs
Author : stpn
Score : 834 points
Date : 2025-01-07 00:07 UTC (1 days ago)
(HTM) web link (crawshaw.io)
(TXT) w3m dump (crawshaw.io)
| mlepath wrote:
| The first rule of programming with LLMs is don't use them for
| anything you don't know how to do. If you can look at the
| solution and immediately know what's wrong with it, they are a
| time saver otherwise...
|
| I find chat for search is really helpful (as the article states)
| qianli_cs wrote:
| Exactly, you have to (vaguely) know what you're looking for and
| have some basic ideas of what algorithms would work. AI is good
| at helping with syntax stuff but not really good at thinking.
| itsgrimetime wrote:
| IMO this is a bad take. I use LLMs for things I don't know how
| to do myself all the time. Now, I wouldn't use one to write
| some new crypto functions because the risk associated with
| getting it wrong is huge, but if I need to write something like
| a wrapper around some cloud provider SDK that I'm unfamiliar
| with, it gets me 90% of the way there. It also is way more
| likely to know at least _some_ of the best practices where I'll
| likely know none. Even for more complex things getting some
| working hello world examples from an LLM gives me way more
| threads to pull on and research than web searching ever has.
| Retr0id wrote:
| > if I need to write something like a wrapper around some
| cloud provider SDK that I'm unfamiliar with
|
| But "writing a wrapper" is (presumably) a process you're
| familiar with, you can tell if it's going off the rails.
| joemazerino wrote:
| Writing a wrapper is easier to verify because of the
| context of the API or SDK you're wrapping. Seems wrong?
| Check the docs. Doesn't work? Curl it yourself.
| Barrin92 wrote:
| >It also is way more likely to know at least _some_ of the
| best practices
|
| What's way more likely to know the best practices is the
| documentation. A few months ago there was a post that made
| the rounds about how the Arc browser introduced a really
| severe security flaw by misconfiguring their Firebase ACLs
| despite the fact that the correct way to configure them is
| outlined in the docs.
|
| This to me is the sort of thing (although maybe not
| necessarily in this case) out of LLM programming. 90% isn't
| good enough, it's the same as Stackoverflow pasting. If
| you're a serious engineer and you are unsure about something,
| it is your task to go to the reference material, or you're at
| some point introducing bugs like this.
|
| In our profession it's not just crypto libraries, one
| misconfigured line in a yaml file can mean causing millions
| of dollars of damage or leaking people's most private
| information. That can't be tackled with a black box chatbot
| that may or may not be accurate.
| zmmmmm wrote:
| > write something like a wrapper around some cloud provider
| SDK that I'm unfamiliar with
|
| you're equating "unfamliar" with "don't know how to do" but I
| will claim you _do_ know how to do it, you would just be slow
| because you have to reference documentation and learn which
| functions do what.
| photon_collider wrote:
| "Trust but verify" is still useful especially when you ask LLMs
| to do stuff you don't know. I've used LLMs to help me get
| started on tasks where I wasn't even sure of what a solution
| was. I would then inspect the code and review any relevant
| documentation to see if the proposed solution would work. This
| has been time consuming but I've learned a lot regardless.
| IanCal wrote:
| That seems like a wild restriction.
|
| You can give them more latitude for things you know how to
| _check_.
|
| I didn't know how to setup the right gnarly typescript generic
| type to solve my problem but I could easily verify it's
| correct.
| fastball wrote:
| If you don't understand what the generic is doing, there
| might be edge-cases you don't appreciate. I think Typescript
| types are fairly non-essential so it doesn't really matter,
| but for more important business logic it definitely can make
| a difference.
| IanCal wrote:
| I understand what it's doing, and could easily set out the
| cases I needed.
| fastball wrote:
| If you understand what it is doing, you could do it
| yourself, surely?
| IanCal wrote:
| Have you never understood the solution to a puzzle much
| more easily than solving it yourself? I feel there's
| literally a huge branch of mathematics dedicated to the
| difference between _finding_ and _validating_ a solution.
|
| More specifically, I didn't know how to solve it, though
| obviously could have spent much more time and learned.
| There were only a small number of possible cases, but I
| needed certain ones to work and others not to. I was
| easily able to create the examples but not find the
| solution. With looping through claude I could solve it in
| a few minutes. I then got an explanation, could read the
| right relevant docs and feel satisfied that not only did
| everything pass the automated checks but my own
| reasoning.
| kccqzy wrote:
| If you merely know how to check, would you also know how to
| _fix_ it after you find that it 's wrong?
|
| If you are lucky to have the LLM fix it for you, great. If
| you don't know how to fix it yourself and the LLM doesn't
| either, you've just wasted a lot of time.
| IanCal wrote:
| It did fix it, I iterated passing in the type and linter
| errors until it passed all the requirements I had.
|
| > If you merely know how to check, would you also know how
| to fix it after you find that it's wrong?
|
| Probably? I'm capable of reading documentation, learning
| and asking others.
|
| > If you don't know how to fix it yourself and the LLM
| doesn't either, you've just wasted a lot of time.
|
| You may be surprised by how little time, but regardless it
| would have taken more time to hit that point without the
| tool.
|
| Also sometimes things don't work out, that's OK. As long as
| overall it improves work, that's all we need.
| kamaal wrote:
| >>f you can look at the solution and immediately know what's
| wrong with it, they are a time saver otherwise...
|
| Indeed getting good at writing code using LLMs demands being
| very good at reading code.
|
| To that extent its more like blitz chess than autocomplete. You
| need to think and verify in trees as it goes.
| billmcneale wrote:
| That's the wrong approach.
|
| I use chat for things I don't know how to do all the time. I
| might not know how to do it, but I sure know how to test that
| what I'm being told is correct. And as long as it's not, I
| iterate with the chat bot.
| WhiteNoiz3 wrote:
| A better way to phrase it might be don't use it for something
| that you aren't able to verify or validate.
| sdesol wrote:
| I agree with this. I keep harping on this, but we are sold
| automation instead of a power tool. If you have domain
| knowledge in the problem that you are solving, then LLMs
| can become an extremely valuable aid.
| bityard wrote:
| I feel like that's a good option ONLY if the code you are
| writing will never be deployed to an environment where
| security is a concern. Many security bugs in code are
| notoriously difficult to spot and even frequently slip
| through reviews from humans who are actively looking for
| exactly those kinds of bugs.
|
| I suppose we could ask the question: Are LLMs better at
| writing secure code than humans? I'll admit I don't know the
| answer to that, but given what we know so far, I seriously
| doubt it.
| zmmmmm wrote:
| I think it's just a broader definition of "know how to do".
| If you can write a test for it then I'm going to argue you
| know "how" to do it in a bigger picture sense. As in, you
| understand the requirements and inherent underlying technical
| challenges behind what you are asking to be done.
|
| The issue is, there are always subtle aspects to problems
| that most developers only know by instinct. Like, "how is it
| doing the unicode conversion here" or "what about the case
| when the buffer is exactly the same size as the message, is
| there room for the terminating character?". You need the
| instincts for these to properly construct tests and review
| the code it did. If you do have those instincts, I argue you
| _could_ write the code, it 's just a lot of effort. But if
| you _don 't_, I will argue you can't test it either and can't
| use LLMs to produce (at least) professional level code.
| j45 wrote:
| You can ask the LLM to teach it to you step by step, and then
| you can validate it by doing it as well as you go, still
| quicker than learning it and not knowing how to debug it.
|
| Learning how something works is critical or it's far worse than
| technical debt.
| lelandfe wrote:
| Yes, I have a friend learning their first programming
| language with much assistance from ChatGPT and it's actually
| going really well.
| j45 wrote:
| Awesome, I wish more people knew about this compared to
| trying to do magic Harry Potter single prompt to do
| everything.
| turnsout wrote:
| I completely agree. In graphics programming, I love having it
| do things that are annoying but easy to verify (like setting up
| frame buffers in WebGL). I also ask it do more ambitious things
| like implementing an algorithm in shader code, and it will
| sometimes give a result that is mostly correct but subtly
| wrong. I only have been able to catch those subtle errors
| because I know what to look for.
| tnvmadhav wrote:
| I'd like to rephrase as, "don't deploy LLM generated code if
| you don't know how it works (or what it does)"
|
| This means, it's okay to use LLM to try something new that
| you're on the fence about. Learn it and then once you've
| learned that concept or the idea, you can go ahead to use same
| code if it's good enough.
| JKCalhoun wrote:
| "don't deploy LLM generated code if you don't know how it
| works (or what it does)"
|
| (Which goes for StackOverflow, etc.)
| switchbak wrote:
| I've seen a whole flurry of reverts due to exactly this.
| I've also dabbled in trusting it a little too much, and had
| the expected pain.
|
| I'm still learning where it's usable and where I'm over-
| reaching. At present I'm at about break-even on time spent,
| which bodes well for the next few years as they iron out
| some of the more obvious issues.
| staticautomatic wrote:
| My experience is the opposite. I find them most valuable for
| helping me do things that would be extremely hard or impossible
| for me to figure out. To wit, I just used one to decode a
| pagination cursor format and write a function that takes a
| datetime and generates a valid cursor. Ain't nobody got time
| for that.
| ignoramous wrote:
| > _... don 't use them for anything you don't know how to do
| ... I find chat for search is really helpful (as the article
| states)_
|
| Not really. I often use Chat to understand codebases. Instead
| trying to navigate mature, large-ish FOSS projects (like say,
| the _Android Run Time_ ) by looking at it file by file, method
| by method, field by field (all to laborious), I just ask ...
| _Copilot_. It is way, way faster than I and are mostly
| _directionally_ correct with its answers.
| logicchains wrote:
| Don't use them for anything you don't know how to test. If you
| can write unit tests you understand and it passes them all (or
| visually inspect/test a GUI it generated), you know it's doing
| well.
| SkyBelow wrote:
| How you use the LLM matters.
|
| Having an LLM do something for you that you don't know how to
| do is asking for trouble. An expert likely can off load a few
| things they aren't all that important, but any junior is going
| to dig themselves into a significant hole with this technique.
|
| But asking an LLM to help you learn how to do something is
| often an option. Can't one just learn it using other resources?
| Of course. LLMs shouldn't be a must have. If at any point you
| have to depend upon the LLM, that is a red flag. It should be a
| possible tool, used when it saves time, but swapped for other
| options when they make sense.
|
| For an example, I had a library I was new to and asked copilot
| how to do some specific task. It gave me the options. I used
| this output to go to google and find the matching documentation
| and gave it a read. I then when back to copilot and wrote up my
| understanding of what the documentation said and checked to see
| if copilot had anything to add.
|
| Could I have just read the entire documentation? That is an
| option, but one that costs more time to give deeper expertise.
| Sometimes that is the option to go with, but in this case
| having a more shallow knowledge to get a proof of concept
| thrown together fit my situation better.
|
| Anyone just copying an AI's output and putting it in a PR
| without understanding what it does? That's asking for trouble
| and it will come back to bite them.
| justatdotin wrote:
| lots of colleauges using copilot or whatever for autocomplete - I
| just find that annoying.
|
| or writing tests - that's ... not so helpful. worst is when a
| lazy dev takes the generated tests and leaves it at that: usually
| just a few placeholders that test the happy path but ignore
| obvious corner cases. (I suppose for API tests that comes down to
| adding test case parameters)
|
| but chatting about a large codebase, I've been amazed at how
| helpful it can be.
|
| what software patterns can you see in this repo? how does the
| implementation compare to others in the organisation? what common
| features of the pattern are missing?
|
| also, like a linter on steroids, chat can help explore how my
| project might be refactored to better match the organisation's
| coding style.
| roskilli wrote:
| If you don't mind me asking: which popular LLM(s) have you been
| using for this and how are you providing the code base into the
| context window?
| fragmede wrote:
| Not OP but Aider provides a repo map to the LLM as context,
| which consists of the directory tree, filenames, and
| important symbols in each file. It can use the popular LLMs
| as well as Ollama.
|
| https://aider.chat/docs/repomap.html
|
| Aider hosts a leaderboard that rates LLMs on performance,
| including a section on refactoring.
|
| https://aider.chat/docs/leaderboards/refactor.html
| Zambyte wrote:
| AI generated images _can_ be good, and even reasonable to
| use for branding. Slapping an image right at the top of the
| page that says "Abstract Synxex Tree" with a meaningless
| graph and an absolutely expressionless and useless humanoid
| robot is a great way to immediately lose my interest in
| anything they have to say though. The homepage would be
| more interesting as a wall of text.
| klibertp wrote:
| Agreed, mostly, but this is not a homepage. On the
| homepage, there's a video demo and a wall of text
| (https://aider.chat/). Still, that Synxex Tree should
| disappear :)
| wrs wrote:
| I've been working with Cursor's agent mode a lot this week and am
| seeing where we need a new kind of tool. Because it sees the
| whole codebase, the agent will quickly get into a state where
| it's changed several files to implement some layering or refactor
| something. This requires a response from the developer that's
| sort of like a code review, in that you need to see changes and
| make comments across multiple files, but unlike a code review,
| it's not finished code. It probably doesn't compile, big chunks
| of it are not quite what you want, it's not structured into
| coherent changesets...it's kind of like you gave the intern the
| problem and they submitted a bit of a mess. It would be a
| terrible PR, but it's a useful intermediate state to take another
| step from.
|
| It feels like the IDE needs a new mode to deal with this state,
| and that SCM needs to be involved somehow too. Somehow help the
| developer guide this somewhat flaky stream of edits and sculpt it
| into a good changeset.
| fragmede wrote:
| Aider commits to git with each command, making it easy to back
| out changes, and also squash them into discrete chunks later
| (and reorder them with interactive rebase).
| golergka wrote:
| Automatically runs linter and tests on every edit and
| forwards failures back to LLM as well.
| Aeolun wrote:
| I think the full agent mode context is actually often hard to
| see, but there's a list somewhere. The list of files in your
| chat dialog is not the full context (it adds open files too). I
| find that if I reduce the context size Cursor gives me much
| better results.
| User23 wrote:
| LLMs are, at their core, search tools. Training is indexing and
| prompting is querying that index. The granularity being at the
| n-gram rather than the document level is a huge deal though.
|
| Properly using them requires understanding that. And just like we
| understand every query won't find what we want, neither will
| every prompt. Iterative refinement is virtually required for
| nontrivial cases. Automating that process, like eg cursor agent,
| is very promising.
| IanCal wrote:
| Half of the problems are people treating them as searchers when
| they aren't. They're absolutely not ngram indexes of existing
| data, either.
| mvdtnz wrote:
| I'm losing track of the number of different things the Hacker
| News commenters claim LLMs are "at their core".
| bitwize wrote:
| LLMs are, at their core, _fucking Dissociated Press_. That 's
| what makes them fun and interesting, and that's the problem
| with using them for real production work.
| sulam wrote:
| Isn't this answer obvious/facile but also true? They're next
| token predictors.
| sdesol wrote:
| > LLMs are, at their core, search tools.
|
| This is the wrong take. Search tools are deterministic unless
| you purposely inject random weights into the ranking. With
| search tools, the same search query will always yield the same
| search result, provided they are designed too and/or the
| underlying data has not changed.
|
| With LLMs, I can ask the exact same question and get a
| different response, even if the data has not changed.
| Scene_Cast2 wrote:
| The randomness comes from sampling. With local LLMs, you can
| fix the random seed, or even disable sampling all together -
| both will get you determinism.
|
| I agree that LLMs are not search tools, but for very
| different reasons.
| klabb3 wrote:
| Semantics. It may be able to get deterministic but it's
| _unstable_ wrt unrelated changes in the training data, no?
| If I add a page about sausages to a search index, the
| results for "ski jacket" will be unaffected. In a practical
| sense, LLMs are non-deterministic. I mean, ChatGPT even has
| a "regenerate" button to expose this "turbulence" as a
| feature.
| User23 wrote:
| Hence n-grams rather than documents.
|
| Also what's with using "semantics" as a dismissal when
| the technology we're talking about is the most
| semantically relevant search ever made.
| sdesol wrote:
| Thanks for the info on local LLMs. Based on my chats with
| multiple LLMs, the biggest issue appears to be hardware.
|
| Non-deterministic hardware: All LLMs mentioned that modern
| computing hardware, such as GPUs or TPUs, can introduce
| non-determinism due to factors like parallel processing,
| caching, or numerical instability. This can make it
| challenging to achieve determinism, even with fixed random
| seeds or deterministic algorithms.
|
| You can find the summary of my chats https://beta.gitsense.
| com/?chat=1c3e69f9-7b8b-48a3-8b99-bb1b.... If you scroll to
| the top and click on the "Conversation" link in the first
| message, you can read the individual responses.
| jcranmer wrote:
| > LLMs are, at their core, search tools.
|
| Fundamentally, _no they 're not_. That is why you have cases
| like the Air Canada chatbot that told a user about a refund
| opportunity that didn't exist, or the lawyer in Mata v Avianca
| who cited a case that didn't exist. If you ask an LLM to search
| for something that doesn't exist, there's a decent chance it
| will hallucinate something into existence for you.
|
| What LLMs are good at is effectively turning fuzzy search terms
| into non-fuzzy terms; they're also pretty good at taking some
| text and recasting into an extremely formulaic paradigm. In
| other words, turning unstructured text into something
| structured. The problem they have is that they don't have
| enough understanding of the world to do something useful that
| with structured representation that needs to be accurate.
| notjoemama wrote:
| Our company has a no AI use policy. The assumption is zero trust.
| We simply can't know whether a model or its framework could or
| would send proprietary code outside the network. So it's best to
| assume all LLMs/AI is or will send code or fragments of code.
| While I applaud the incredible work by their creators, I'm not
| sure how a responsible enterprise class company could rely on
| "trust us bro" EULAs or repo readmes.
| codebje wrote:
| The same way responsible enterprise class companies rely on
| "trust us bro" EULAs for financial systems, customer databases,
| payroll, and all the other systems it would be very expensive
| and error prone to build custom for every business.
| ryanobjc wrote:
| Pretty much this.
|
| OpenAI poisoned the well badly with their "we train off your
| chats" nonsense.
|
| If you are using any API service, or any enterprise ChatGPT
| plan, your tokens are not being logged and recycled into new
| training data.
|
| As for why trust them? Like the parent said: EULAs. Large
| companies trust EULAs and terms of service for every single
| SAAS product they use, and they use tons and tons of them.
|
| OpenAI in a clumsy attempt to create a regulatory moat by
| doing sketchy shit and waving wild "AI will kill us all"
| nonsense has created a situation where the usefullness of
| these transforming generative solutions are automatically
| rejected by many.
| pama wrote:
| Your company could locally host LLMs; you wont get chatGPT or
| Claude quality, but you can get something that would have been
| SOTA a year ago. You can vet the public inference codebases
| (they are only of moderate complexity), and you control your
| own firewalls.
| CubsFan1060 wrote:
| You can run Claude on both AWS and Google Cloud. I'm fairly
| certain they don't share data, but would need to verify to be
| sure.
| evilduck wrote:
| You can also run Llama 405B and the latest (huge) DeepSeek
| on your own hardware and get LLMs that trade blows with
| Claude and ChatGPT, while being fully isolated and offline
| if needed.
| krembo wrote:
| With Amazon Bedrock you can get an isolated serverless
| Claude or llama with a few clicks
| evilduck wrote:
| True, but if your org is super paranoid about data
| exfiltration you're probably not sending it to AWS
| either.
| Kostchei wrote:
| You can get standalone/isolated versions of chatGPT, if your
| org is large enough, in partnership with OpenAI. And others.
| They run on the same infra but in accounts you set up, cost
| the same, but you have visibility on the compute, and control
| of data exfil - ie is there is none.
| j45 wrote:
| Local LLMs for code aren't that out of the question to run.
|
| Even for not code generation, but even smaller models only for
| programming to weigh on different design approaches, etc.
| attentive wrote:
| So, you're asking how enterprise class companies are using
| github for repos and gmail for all the enterprise mail? What's
| next, zoom/teams for meetings?
| lazybreather wrote:
| Palo Alto networks provides security product "AI access
| security" which claims to solve the problem you mentioned -
| access control, data protection etc. I don't personally use it
| neither does my org. Giving here just in case it is useful for
| someone.
| BBosco wrote:
| The vast majority of fortune 500's have legal frameworks up for
| dealing with internal AI use already because the reality is
| employees are going to use it regardless of internal policy.
| Assuming every employee will act in good faith just because a
| blanket AI ban is in place is extremely optimistic at best, and
| isn't a good substitute for actual understanding.
| sulam wrote:
| Internal policies at these companies are rarely subject to a
| level of faith that you're implying. Instead external access
| to systems is logged, internal systems are often sandboxed or
| otherwise constrained in how you interact with them, and
| anything that looks like exfiltration sets off enough alarms
| to have your manager talking to you that same day, if not
| that same hour.
| Pyxl101 wrote:
| Just curious, how does your company host its email? Documents?
| Files?
| janalsncm wrote:
| You can run pretty decent models on your laptop these days.
| Works in airplane mode.
|
| https://ollama.com/
| golergka wrote:
| What's the realistic attack scenario? Will Sam Altman steal
| your company's code? Or will next version of GPT learn on your
| secret sauce algorithms and then your competitors will get them
| when they generate code for their tasks and your company loses
| its competitive advantage?
|
| I'm actually sure that there are companies for which these
| scenarios are very real. But I don't think there's a lot of
| them. Most of the code our industry works on has very little
| value outside of context of particular product and company.
| cudgy wrote:
| So why bother securing anything at all if not willing to
| secure the raisons d'etre? Doesn't that suggest that these
| companies are trivial entities?
| golergka wrote:
| There are plenty of very realistic attack scenarios, that's
| why we secure stuff.
| Aeolun wrote:
| I mean, we host our code on Github. What are they going to do
| with Copilot code snippets?
| mbesto wrote:
| > proprietary code outside the network
|
| Thought exercise: what would seriously happen if you did let
| some of your proprietary code outside your network? Oddly
| enough, 75% of the people writing code on HN probably have
| their companies code stored in GitHub. So there already is an
| inherent trust factor with GH/MSFT.
|
| As another anecdote - Twitch's source code got leaked a few
| years back. Did Twitch lose business because of it?
| aulin wrote:
| > Thought exercise: what would seriously happen if you did
| let some of your proprietary code outside your network
|
| Lawsuits? Lawful terminations? Financial damages?
| mbesto wrote:
| Huh? No, i'm saying, what potential damage does an
| organization have? Not the individual who may leak data
| outside your network.
| aulin wrote:
| Those are risks both for the individual and for the
| company when there are contracts in place with third
| parties involving code sharing.
|
| Other risks include leaking industrial secrets that may
| significantly damage company business or benefit
| competitors.
| klibertp wrote:
| Please acknowledge that your situation is pretty unique.
| Just take a look at the comments: how many people say, or
| outright presume, that _their company 's_ code is already
| on GitHub? I'd wager that your org _doesn 't_ keep code
| at a 3rd party provider, right? Then, you're in a
| minority.
|
| I don't mean to dismiss your concerns - in your
| situation, they are probably warranted - I just wanted to
| say that they are unique and not necessarily shared by
| people who don't share your circumstances.
| aulin wrote:
| This subthread started with someone from a no AI policy
| company, people are dismissing it with snarky comments,
| along the line of your code is not as important as you
| believe. I'm just trying to show a different picture, we
| work in a pretty vast field and people commenting here
| don't necessarily represent a valid sample.
| klibertp wrote:
| > people are dismissing it with snarky comments, along
| the line of your code is not as important as you believe.
|
| That says more about those people than about your/OP's
| code :)
|
| Personally, I had a few collisions with regulation and
| compliance over the years, so I can appreciate the
| completely different mindset you need when working with
| them. On the other hand, at my current position, not only
| do we have everything on Github, but there were also
| instances where I was tasked with mirroring everything to
| bitbucket! (For code escrow... i.e., if we go out of
| business, our customer will get access to the mirrored
| code.)
|
| > people commenting here don't necessarily represent a
| valid sample.
|
| Right. I should have said that you're in the minority
| _here_. I 'm not sure what's the ratio of dumb CRUD apps
| to "serious business" kind of development in the wild. I
| know there are whole programming subfields where your
| kinds of concerns are typical. They might just be
| underrepresented here.
| aulin wrote:
| Yes I've had plenty of experiences with orgs that self
| host everything, I don't think it's a minority it's just
| a different cluster than the one most represented here.
|
| Still I believe hosting is somewhat different, if
| anything because it's something established, known
| players, trusted practices. AI is new, contracts are
| still getting refined, players are still making their
| name, companies are moving fast and I doubt data
| protection is their priority.
|
| I may be wrong but I think it's reasonable for IT
| departments to be at least prudent towards these
| frameworks. Search is ok, chat is okish, crawling whole
| projects for autocompletion I'd be more careful.
| mbesto wrote:
| > Yes I've had plenty of experiences with orgs that self
| host everything, I don't think it's a minority it's just
| a different cluster than the one most represented here.
|
| I've done 800+ tech diligence projects and have first
| hand knowledge of every single one's use of VCS. At least
| 95% of the codebases are stored on a cloud hosted VCS.
| It's absolutely a minority to host your own VCS.
| mbesto wrote:
| > I doubt data protection is their priority.
|
| So you're basing your whole argument on nothing other
| than "I just don't feel like they do that".
|
| Does this look unserious to you?
| https://trust.openai.com/
| mbesto wrote:
| First, I didn't dismiss their "no AI policy" nor did I
| use snarky comments. I was asking a legitimate question -
| which is - most orgs have their code stored on another
| server out of their control, so what's the legitimate
| business issue if your code gets leaked? I still haven't
| gotten an answer.
| switchbak wrote:
| The other consideration: your company's code probably just
| isn't that good.
|
| I think many people over-value this giant pile of text.
| That's not to say IP theft doesn't exist, but I think the
| actual risk is often overblown. Most of an organization's
| value is in the team's collective knowledge and teamwork
| ability, not in the source code.
| lm28469 wrote:
| > I'm not sure how a responsible enterprise class company could
| rely on "trust us bro" EULAs or repo readmes.
|
| Isn't that what we do with operating systems, internet
| providers, &c. ?
| aulin wrote:
| How is that related? we're talking of continuously sending
| proprietary code and related IP to a third party, seems a
| pretty valid concern to me.
|
| I, for one, work every day with plenty of proprietary vendor
| code under very restrictive NDAs. I don't think they would be
| very happy knowing I let AIs crawl our whole code base and
| send it to remote language models just to have fancy
| autocompletion.
| lm28469 wrote:
| Do you read every single line of code of every single
| dependency you have ? I don't see how llms are more of a
| threat than a random compromised npm package or something
| from a OS package manager. Chances are you're already
| relying on tons and tons of "trust me bro" and "it's
| opensource bro don't worry, just read the code if you feel
| like it"
| aulin wrote:
| One thing is consciously sharing IP with third parties
| violating contracts, another is falling victim of
| malicious code in the toolchain.
|
| Npm concern though suggests we likely work in very
| different industries so that may explain the different
| perspective.
| bongodongobob wrote:
| Ok, the LLM crawls your code. Then what? What is the
| exfiltration scenario?
| ryanobjc wrote:
| "Continuously sending proprietary code and related IP to a
| third party"
|
| Isn't this... github?
|
| Companies and people are doing this all day every day. LLM
| APIs are really no different. Only when you magic it up as
| "the AI is doing thinking" ... but in reality text ->
| tokens -> math -> tokens -> text. It's a transformation of
| numbers into other numbers.
|
| The EULAs and ToS say they don't log or retain information
| from API requests. This is really no different than Google
| Drive, Atlassian Cloud, Github, and any number of online
| services that people store valuable IP and proprietary
| business and code in.
| tsukikage wrote:
| You can get models that run offline. The other risk is
| copyright/licensing exposure; e.g. the AI regurgitates a
| recognisably large chunk of GPL code, and suddenly you have a
| legal landmine in your project waiting to be discovered.
| There's no sane way for a reviewer to spot this situation in
| general.
|
| You can ask a human to not do that, and there are various risks
| to them personally if they do so regardless. I'd like to see
| the AI providers take on some similar risks instead of
| disclaiming them in their EULAs before I trust them the way I
| might a human.
| cudgy wrote:
| Does your company develop software overseas where legal action
| is difficult? Or where their ip could be nationalized or
| secretly stolen? Where network communications are monitored and
| saved?
| k__ wrote:
| Seems like only working on open source code has its benefits.
| bangaladore wrote:
| The killer feature about LLMs with programming in my opinion is
| autocomplete (the simple copilot feature). I can probably be 2-3x
| more productive as I'm not typing (or thinking much). It does a
| fairly good job pulling in nearby context to help it. And that's
| even without a language server.
|
| Using it to generate blocks of code in a chat like manner in my
| opinion just never works well enough in the domains I use it on.
| I'll try to get it to generate something and then realize when I
| get some functional result I could've done it faster and more
| effectively.
|
| Funny enough, other commenters here hate autocomplete but love
| chat.
| m3kw9 wrote:
| The autocomplete is mostly a nusance and maybe low percentage
| of the time it does right.
| tptacek wrote:
| Yeah, I don't like it either. I think it speaks to the
| mindset difference Crawshaw is talking about here. When I'm
| writing code, I don't want things getting in my way. I have a
| plan. I'm actually pretty Zen about all the typing. It's part
| of my flow-state. But when I'm exploring code in a dialog
| with a chatbot, I'm happy for the help.
| switchbak wrote:
| I think we're going to be considered dinosaurs pretty soon.
| Much like how it's getting harder to buy a manual
| transmission, programming 'the old way' will probably just
| fade away over time.
| LVB wrote:
| The biggest nuisance aspect for me is when it is trying to do
| things that the LSP can do 100% correctly. Almost surely it
| is my tooling setup and the LLM is squashing LSP stuff.
| Seeing Copilot (or even Cursor) suggesting methods or
| parameters that don't exist is really annoying. Just stand
| down and let the LSP answer those basic questions, TYVM.
| throwup238 wrote:
| Cursor ostensibly has a config setting to run a "shadow"
| workspace [1], aka a headless copy of the window you're
| working in to get feedback from linters and LSPs but
| they've been iterating so fast I'm not sure it's still
| working (or ever did much, really).
|
| It really feels like we're at the ARPANET stage where
| there's so much obvious hanging fruit, it's just going to
| take companies a while to perfect it.
|
| [1] https://www.cursor.com/blog/shadow-workspace
| ahoka wrote:
| The industry standard was 40% accepted the last time I
| checked. Correct could be a bit lower, so maybe 1/3?
|
| It's like having to delete the auto-closed parenthesis more
| often than not.
| jghn wrote:
| I thought so too. Until I worked with a client who doesn't
| allow the use of LLM tools, and I had to turn my Copilot off.
| That's when I realized how much I'd grown to rely on it
| despite the headaches.
| LeftHandPath wrote:
| I've never used it, simply because I hate autocomplete in
| emails.
|
| Gmail autocomplete saves me _maybe_ 2-5s per email: the
| recipients name, a comma, and a sign off. Maybe a quarter or
| half sentence here or there, but never exactly what I would've
| typed.
|
| In code bases, I've never seen the appeal. It's only reliably
| good at stuff that I can easily find on Google. The savings are
| inconsequential at best, and negative at worst when it
| introduces hard-to-pinpoint bugs.
|
| LLMS are incredible technology, but when applied to code, they
| act more like non-deterministic macros.
| switchbak wrote:
| "negative at worst when it introduces hard-to-pinpoint bugs"
| - this is actually very true. I've had it recreate patterns
| _partially_, and paste in the wrong thing in a place that was
| very hard to discern.
|
| It probably saved me 40 mins, then proceeded to waste 2 hours
| of me hunting for that issue. I'm probably at the break-even
| on the whole. The ultimate promise is very compelling, but my
| current use isn't particularly amazing. I do use a niche
| language though, so I'm outside the global optima.
| LeftHandPath wrote:
| Exactly! I expect that some are able to put it to good use.
| I am not one of those people.
|
| My experiences with ChatGPT and Gemini have included lots
| of confident but wrong answers, eg "What castle was built
| at the highest altitude". Thats what gives me pause.
|
| Gemini spits out a great 2D A* implementation no problem.
| That is _awesome_. Actually, contrary to my original
| comment, I probably will use AI for that sort of thing
| going forward.
|
| Despite that, I don't want it in my IDE. Maybe I'm just a
| bit of a Luddite.
| imhoguy wrote:
| Both autocomplete and chat are half-way UX solutions. Really
| what I need is some kind of mix of in-place chat with
| completion.
|
| For context, very often I have to put some comment before the
| line for completion to set an expectation context.
|
| Instead editor should allow me to influence completion with
| some kind of in-place suggestion input available under keyboard
| shortcut. Then I could type what I want into such input and
| when I hit Enter or Tab the completion proposal appears. Even
| better if it would let me undo/modify such input, and have
| shortcuts like "show me different option", "go back to
| previous".
| switchbak wrote:
| I had to turn autocomplete off. I value it when I want it, but
| otherwise it's such a distraction that it both slows me down,
| and actively irritates me.
|
| Perhaps I'm just an old man telling the LLM to get off my lawn,
| but I find it does bad things to my ability to concentrate on
| hard things.
|
| Having a good sense of when it would be useful, and invoking it
| on demand seems to be a decent enough middle ground for me.
| Much of it boils down to UX - if it could be present but not
| actively distracting, I'd probably be ok with it.
| jimmydoe wrote:
| Anyone has good recommendation of LocalLLM for autocompletion
|
| Most editors I use supports online LLM but it's too slow
| sometimes for me.
| ec109685 wrote:
| Unless your network is poor, I'd imagine (but definitely could
| be wrong in your case!), the bottleneck is the LLM speed, not
| the latency to the data center its running in.
| th4t1sW13rd wrote:
| https://www.continue.dev/
| jimmydoe wrote:
| Thank you!
| wdutch wrote:
| I no longer work in tech, but I still write simple applications
| to make my work life easier.
|
| I frequently use what OP refers to as chat-driven programming,
| and I find it incredibly useful. My process starts by explaining
| a minimum viable product to the chat, which then generates the
| code for me. Sometimes, the code requires a bit of manual
| tweaking, but it's usually a solid starting point. From there, I
| describe each new feature I want to add--often pasting in
| specific functions for the chat to modify or expand.
|
| This approach significantly boosts what I can get done in one
| coding session. I can take an idea and turn it into something
| functional on the same day. It allows me to quickly test all my
| ideas, and if one doesn't help as expected, I haven't wasted much
| time or effort.
|
| The biggest downside, however, is the rapid accumulation of
| technical debt. The code can get messy quickly. There's often a
| lot of redundancy and after a few iterations it can be quite
| daunting to modify.
| j45 wrote:
| Is there a model you prefer to use?
| KTibow wrote:
| Not wdutch but Claude Sonnet is one of the best models out
| there for programming, o1 is sometimes better but costs more
| chii wrote:
| > The code can get messy quickly. There's often a lot of
| redundancy and after a few iterations it can be quite daunting
| to modify.
|
| i forsee in the future an LLM that has sufficient context
| length for (automatic) refactoring and tech debt removal, by
| pasting large portions of these existing code in.
| scarface_74 wrote:
| Even without LLMs, at least with statically type languages
| like C#, ReSharper can do solution wide refactoring that are
| guaranteed correct as long as you don't use reflection.
|
| https://www.jetbrains.com/help/resharper/Refactorings__Index.
| ..
|
| I don't see any reason it couldn't do more aggressive
| refactors with LLMs and either correct itself or don't do the
| refactor if it fails static code checking. Visual Studio can
| already do real time type checking for compile time errors
| Aeolun wrote:
| Cursor has recently added something like this 'Bug Finder'.
| It told me that finding bugs on my entire codebase would cost
| me $21 or so, so I never actually tried, but it sounds cool.
| prettyblocks wrote:
| I have a similar approach, but the mess can be contained by
| asking for optimizations and refactors very frequently and only
| asking for very granular features.
| trash_cat wrote:
| > The biggest downside, however, is the rapid accumulation of
| technical debt. The code can get messy quickly. There's often a
| lot of redundancy and after a few iterations it can be quite
| daunting to modify.
|
| What stops you from using o1 or sonnet to refactor everything?
| It sounds like a typical LLM task.
| SkyBelow wrote:
| >The biggest downside, however, is the rapid accumulation of
| technical debt.
|
| Is that really related to the LLM?
|
| Even in pre-LLM times, anytime I've scrapped together some code
| to solve some small immediate problem it grows tech debt at an
| amazing rate. Getting a feel for when a piece of code is going
| to be around long enough that it needs to be refactored,
| cleaned up, documented, etc. is a skill I developed over time.
| Even now it isn't a prefect guess, as there is an ongoing tug
| of war between wasting time today refactoring something I might
| not touch again with wasting time tomorrow having to pick up
| something I didn't clean up.
| nemothekid wrote:
| I think "Chat driven programming" is the most common type of the
| most hyped LLM-based programming I see on twitter that I just
| can't relate to. I've incorporated LLMs mainly as auto-complete
| and search; asking ChatGPT to write a quick script or to scaffold
| some code for which the documentation is too esoteric to parse.
|
| But having the LLM do things for me, I frequently run into issues
| where it feels like I'm wasting my time with an intern. " _Chat-
| based LLMs do best with exam-style questions_ " really speaks to
| me, however I find that constructing my prompts in such a way
| where the LLM does what I want uses just as much brainpower as
| just programming the thing my self.
|
| I do find ChatGPT (o1 especially) really good at optimizing
| existing code.
| throwup238 wrote:
| _> "Chat-based LLMs do best with exam-style questions" really
| speaks to me, however I find that constructing my prompts in
| such a way where the LLM does what I want uses just as much
| brainpower as just programming the thing my self._
|
| It speaks to me too because my mechanical writing style (as
| opposed to creative prose) could best be described as what I
| learned in high school AP English/Literature and the rest of
| the California education system. For whatever reason that
| writing style dominated the training data and LLMs just happens
| to be easy to use because I came out of the same education
| system as many of the people working at OpenAI/Anthropic.
|
| I've had to stop using several generic turns of phrase like "in
| conclusion" because it made my writing look too much like
| ChatGPT.
| AlotOfReading wrote:
| It's interesting that you find it useful for optimization. I've
| found that they're barely capable of anything more than shallow
| optimization in my stuff without significant direction.
|
| What I find useful is that I can keep thinking at one
| abstraction level without hopping back and forth between
| algorithm and codegen. The chat is also a written artifact I
| can use the faster language parts of my brain on instead of the
| slower abstract thought parts.
| tptacek wrote:
| There's an art to cost-effectively coaxing useful answers
| (useful drafts of code) from an LLM, and there's an art to
| noticing the most productive questions to put to that process.
| It's a totally different way of programming than having an LLM
| looking over your shoulder while you direct, function by
| function, type by type, the code you're designing.
|
| If you feel like you're wasting your time, my bet is that
| you're either picking problems where there isn't enough value
| to negotiate with the LLM, or your expectations are too high.
| Crawshaw mentions this in his post: a lot of the value of this
| chat-driven style is that it very quickly gets you unstuck on a
| problem. Once you get to that point, you take over! You don't
| convince the LLM to build the final version you actually commit
| to your branch.
|
| Generating unit test cases --- in particular, generating unit
| test cases that reconcile against unsophisticated, brute-force,
| easily-validated reference implementations of algorithms ---
| are a perfect example of where that cost/benefit can come out
| nicely.
| sibeliuss wrote:
| My technique is to feed it a series of intro questions that
| prepare it for the final task. Chat the thing into a proper
| comfort level, and then from there, with the context at hand,
| ask to help solve the real problem. Def feels like a new kind
| of programming model because its still very programming-esque.
| Aeolun wrote:
| I've found that everything just works (more or less) since
| switching to Cursor. Agent based composer mode is magical. Just
| give it a few files for context, and ask it to do what you
| want.
| _boffin_ wrote:
| Does anyone know of any good chat based ui builders. No. Not
| build a chat app.
|
| Does webflow have something?
|
| My problem is being able to describe what I want in the style I
| want.
| replwoacause wrote:
| https://lovable.dev
|
| https://bolt.new
|
| https://v0.dev
|
| Never used them myself but have seen them mentioned on Reddit
| and Twitter.
| singpolyma3 wrote:
| It seems like everything I see about success using LLMs for this
| kind of work is for greenfield. What about three weeks later when
| the job changes to maintenance and interation on something that's
| already working? Are people applying LLMs to that space?
| kylebenzle wrote:
| Yes, it's just harder the larger the pre-existing code base.
| throwup238 wrote:
| My codebase is relatively greenfield (started working on it
| early last year) but it's up to ~50k lines in a mixed C++/Rust
| codebase with a binding layer whose API predates every LLM's
| training sets. Even when I started ChatGPT/Claude weren't very
| useful but now the project requires a completely different
| strategy when working with LLMs (it's a QT AI desktop app so
| I'm dogfooding a lot). I've also used them in a larger codebase
| (~500k lines) and that also requires a different approach from
| the former. It feels a lot like the transition from managing 2
| to 20 to 200 to 2000 people. It's a different ballgame with
| each step change. A very well encapsulated code base of ~500k
| lines is manageable for small changes but not for refactoring,
| exploration, etc, at least until useful context sizes increase
| another order of magnitude (I keep trying Gemini's 2M but it's
| been a disappointment).
|
| I have a _lot_ of documentation aimed at the AI in `docs
| /notes/` (some of it written by an LLM but proofread before
| committing) and I instruct Cursor/Windsurf/Aider via their
| respective rules/config files to look at the documentation
| before doing anything. At some scale that initial context
| becomes just a directory listing & short description of
| everything in the notes folder, which eventually breaks down
| due to context size limits, either because I exceed the maximum
| length of the rules or the agent requires pulling in too much
| context for the change.
|
| I've found that there's actually an uncanny valley between
| greenfield projects where the model is free to make whatever
| assumptions it wants and brownfield projects where it's
| possible to provide enough context from the existing codebase
| to get both API accuracy (hallucinations) and general patterns
| through few-shot examples. This became very obvious once I had
| enough examples of that binding layer. Even though I could
| include all of the documentation for the library, it didn't
| work consistently until I had a variety of production examples
| to point it to.
|
| Right now, I probably spend as much time writing each prompt as
| I do massaging the notes folder and rules every time I notice
| the model doing something wrong.
| zkry wrote:
| Logically this makes sense: every model has a context size and
| complexity capacity where it will no longer be able to function
| properly. Any usage of said model will accelerate the approach
| to this limit. Once the limit is reached, the LLM is no longer
| as helpful as it was.
|
| I work on full blown legacy apps and needless to say I don't
| even bother with LLMs when working on these most of the time.
| Mashimo wrote:
| I used AI code completion from GitHub copilot on a 20 year old
| project. You still have to create new classes, new test,
| refactor etc.
| valenterry wrote:
| Yeah, it sucks. LLMs are not great with a big context yet. I
| hope that is being worked on. I need the LLM to read my whole
| project AND optimally all related slack conversations, the wiki
| and related libraries.
| glouwbug wrote:
| Then what will you do?
| valenterry wrote:
| I can for example tell it to refactor things. It would have
| to write files of course. E.g. "Add retries with
| exponential backoffs to all calls to service X"
| e12e wrote:
| Interesting. I wonder what the equivalent of sketch.dev would
| look like if it targeted Smalltalk and was embedded in a
| Smalltalk image (preferably with a local LLM running in
| smalltalk)?
|
| I'd love to be able to tell my (hypothetical smalltalk) tablet to
| create an app for me, and work interactively, interacting with
| the app as it gets built...
|
| Ed: I suppose I should just try and see where cloud ai can take
| smalltalk today:
|
| https://github.com/rsbohn/Cuis-Smalltalk-Dexter-LLM
| klibertp wrote:
| Worth a look: https://github.com/feenkcom/gt4llm If you load
| this in GT, you'll get a Lepiter book with interactive
| tutorials.
| dewitt wrote:
| One interesting bit of context is that the author of this post is
| a legit world-class software engineer already (though probably
| too modest to admit it). Former staff engineer at Google and co-
| founder / CTO of Tailscale. He doesn't _need_ LLMs. That he says
| LLMs make him more productive at all as a hands-on developer,
| especially around first drafts on a new idea, means a lot to me
| personally.
|
| His post reminds me of an old idea I had of a language where all
| you wrote was function signatures and high-level control flow,
| and maybe some conformance tests around them. The language was
| designed around filling in the implementations for you. 20 years
| ago that would have been from a live online database, with
| implementations vying for popularity on the basis of speed or
| correctness. Nowadays LLMs would generate most of it on the fly,
| presumably.
|
| Most ideas are unoriginal, so I wouldn't be surprised if this has
| been tried already.
| knighthack wrote:
| I knew he was a world-class engineer the moment I saw that his
| site didn't bother with CSS stylesheets, ads, pictures, or
| anything beyond a rudimentary layout.
|
| The whole article page reads like a site from the '90s, written
| from scratch in HTML.
|
| That's when I _knew_ the article would go hard.
|
| Substantive pieces don't need fluffy UIs - the idea takes the
| stage, not the window dressing.
| shaneofalltrad wrote:
| I wonder what he uses, I noticed the first paragraph took
| over a second to load... Largest Contentful Paint element
| 1,370 ms This is the largest contentful element painted
| within the viewport. Element p
| cess11 wrote:
| Looks like it loads all the Google surveillance without
| asking. Should IP-block the EU.
| alexvitkov wrote:
| Glad to know I was a world class engineer at the age of 8,
| when all I knew were the <h1> and <b> tags!
| dekhn wrote:
| I think what you're describing is basically "interface driven
| development" and "test driven development" taken to the
| extreme: where the formal specification of an implementation is
| defined by the test suite. I suppose a cynic would say that's
| what you get if you left an AI alone in a room with Hyrum's
| Law.
| gopalv wrote:
| > That he says LLMs make him more productive at all as a hands-
| on developer, especially around first drafts on a new idea,
| means a lot to me personally.
|
| There is likely to be a great rift in how very talented people
| look at sharper tools.
|
| I've seen the same division pop up with CNC machines, 3d
| printers, IDEs and now LLMs.
|
| If you are good at doing something, you might find the new
| tool's output to be sub-par over what you can achieve yourself,
| but often the lower quality output comes much faster than you
| can generate.
|
| That causes the people who are deliberate & precise about their
| process to hate the new tool completely - expressing in the
| actual code (or paint, or marks on wood) is much better than
| trying to explain it in a less precise language in the middle
| of it. The only exception I've seen is that engineering folks
| often use a blueprint & refine it on paper.
|
| There's a double translation overhead which is wasteful if you
| don't need it.
|
| If you have dealt with a new hire while being the senior of the
| pair, there's that familiar feeling of wanting to grab their
| keyboard instead of explaining how to build that regex - being
| able to do more things than you can explain or just having a
| higher bandwidth pipe into the actual task is a common sign of
| mastery.
|
| The incrementalists on the other hand, tend to love the new
| tool as they tend to build 6 different things before picking
| what works the best, slowly iterating towards what they had in
| mind in the first place.
|
| I got into this profession simply because I could Ctrl-Z to the
| previous step much more easily than my then favourite chemical
| engineering goals. In Chemistry, if you get a step wrong, you
| go to the start & start over. Plus even when things work, yield
| is just a pain there (prove it first, then you scale up
| ingredients etc).
|
| Just from the name of sketch.dev, it appears that this author
| is of the 'sketch first & refine' model where the new tool just
| speeds up that loop of infinite refinement.
| liotier wrote:
| > If you are good at doing something, you might find the new
| tool's output to be sub-par over what you can achieve
| yourself, but often the lower quality output comes much
| faster than you can generate. That causes the people who are
| deliberate & precise about their process to hate the new tool
| completely
|
| Wow, I've been there ! Years ago we dragged a GIS system
| kicking and screaming from its nascent era of a dozen
| ultrasharp dudes with the whole national fiber optics network
| in their head full of clever optimizations, to three thousand
| mostly clueless users churning out industrial scale
| spaghetti... The old hands wanted a dumb fast tool that does
| their bidding - they hated the slower wizard-assisted
| handholding, that turned out to be essential to the new
| population's productivity.
|
| Command line vs. GUI again... Expressivity vs.
| discoverability, all the choices vs. don't make me think.
| Know your users !
| namaria wrote:
| This whole thing makes me think of that short story "The
| Machine Stops".
|
| As we keep burrowing deeper and deeper into an overly
| complex system that allows people to get into parts of it
| without understanding the whole, we are edging closer to a
| situation where no one is left who can actually reason
| about the system and it starts to deteriorate beyond repair
| until it suddenly collapses.
| jprete wrote:
| This is a good characterization. I'm precision-driven and
| know what I need to do at any low level. It's the high-level
| definition that is uncertain. So it doesn't really help to
| produce a dozen prototypes of an idea and pick one, nor does
| it help to fill in function definitions.
| tikkun wrote:
| Intersting.
|
| So engineers that like to iterate and explore are more likely
| to like LLMs.
|
| Whereas engineers that like have a more rigid specific
| process are more likely to dislike LLMs.
| godelski wrote:
| I frequently iterate and explore when writing code. Code
| gets written multiple times before being merged. Yet, I
| still haven't found LLMs to be helpful in that way. The
| author gives "autocomplete", "search", and "chat-driven
| programming" as 3 paradigms. I get the most out of search
| (though a lot of this is due to the decreasing value of
| Google), autocomplete is pretty weak to me especially as I
| macro or just use contextual complete, and I've failed
| miserably at chat-driven programming on every attempt. I
| spend more time debugging the AI than it would to debug
| myself. Albeit it __feels__ faster because I'm doing more
| typing + waiting rather than continuous thinking (but the
| latter has extra benefits).
| erosivesoul wrote:
| FWIW I find LLMs almost useless for writing novel code.
| Like it can spit out a serviceable UUID generator when I
| need it, but try writing something with more than a layer
| or two of recursion and it gets confused. I turn copilot on
| for boilerplate and off for solving new problems.
| harrall wrote:
| I believe it's more that people hate trying new tools because
| they've already made their choice and made it their identity.
|
| However, there are also people who love everything new and
| jump onto the latest hype too. They try new things but then
| immediately advocate it without merit.
|
| Where are the sane people in the middle?
| dns_snek wrote:
| As an experienced software developer, I paid for ChatGPT
| for a couple of months, I trialed Gemini Pro for a couple
| of months, and I've used the current version of Claude.
|
| I'd be happy if LLMs could produce working code as often
| and as quickly as the evangelist claim, but whenever I try
| to use LLM to work on my day to day tasks, I almost always
| walk away frustrated and disappointed - and most of my work
| is boring on technical merits, I'm not writing novel comp-
| sci algorithms or cryptography libraries.
|
| Every time I say this, I'm painted as some luddite who just
| hates change when the reality is that no, current LLMs are
| just not fit for many of the purposes they're being
| evangelized for. I'd love nothing more than to be a 2x
| developer on my side projects, but it just hasn't happened
| and it's not for the lack of trying or open mindedness.
|
| edit: I've never actually seen any LLM-driven developers
| work in real time. Are there any live coding channels that
| could convince the skeptics what we're missing out on
| something revolutionary?
| harrall wrote:
| You're the middle ground I was talking about. You tried
| it. You know where it works and where it doesn't.
|
| I've used LLM to generate code samples and my IDE
| (IntelliJ) uses an LLM for auto-suggestions. That's
| mostly about it for me.
| davepeck wrote:
| I see less "painting as a luddite" in response to
| statements like this, and more... surprise. Mild
| skepticism, perhaps!
|
| Your experience diverges from that of other experienced
| devs who have used the same tools, on probably similar
| projects, and reached different conclusions.
|
| That includes me, for what it's worth. I'm a graybeard
| whose current work is primarily cloud data pipelines that
| end in fullstack web. Like most devs who have fully
| embraced LLMs, I don't think they are a magical panacea.
| But I've found many cases where they're unquestionably an
| accelerant -- more than enough to justify the cost.
|
| I don't mean to say your conclusions are wrong. There
| seems to be a bimodal distribution amongst devs. I
| suspect there's something about _how_ these tools are
| used by each dev, and in the specific
| circumstances/codebases/social contexts, that leads to
| quite different outcomes. I would _love_ to read a better
| investigation of this.
| efnx wrote:
| I think it also depends on _what_ the domain is, and also
| to a certain degree the tools / stack you use. LLMs
| aren't coherent or correct when working on novel
| problems, novel domains or using novel tools.
|
| They're great for doing something that has been done
| before, but their hallucinations are wildly incorrect
| when novelty is at play - and I'll add they're always
| very authoritative! I'm glad my languages of choice have
| a compiler!
| davepeck wrote:
| Yeah, absolutely.
|
| LLMs work best for code when both (a) there's sufficient
| relevant training data aka we're not doing something
| particularly novel and (b) there's sufficient context
| from the current codebase to pick up expected patterns,
| the peculiarities of the domain models, etc.
|
| Drop (a) and get comical hallucinations; drop (b) and
| quickly find that LLMs are deeply mediocre at top-level
| architectural and framework/library choices.
|
| Perhaps there's also a (c) related to precision. You can
| write code to issue a SQL query and return JSON from an
| API endpoint in multiple just-fine ways. Misplace a
| pthread_mutex_lock, however, and you're in trouble. I
| certainly don't trust LLMs to get things like this right!
|
| (It's worth mentioning that "novelty" is a tough concept
| in the context of LLM training data. For instance, maybe
| nobody has implemented a font rasterizer in Rust before,
| but plenty of people have written font rasterizers and
| plenty of others have written Rust; LLMs seem quite good
| at synthesizing the two.)
| jpc0 wrote:
| My recent example for where its helpful.
|
| Pretty nice at autocomplete. Like writing json tags in go
| structs. Can just autocomplete that's stuff for me no
| problem, it saved me seconds per line, seconds I tell
| you.
|
| It's stupid as well... Autofilled a function, looks
| correct. Reread it 10 minutes later and well... Minor
| mistake that would have caused a crash at runtime. It
| looked correct but in reality it just didn't have enough
| context ( the context is in an external doc on my second
| screen ... ) and there was no way it would ever have
| guessed the correct code.
|
| It took me longer to figure out why the code looked wrong
| than if I had just typed it myself.
|
| Did it speed up my workflow on code I could have given a
| junior to write? Not really, but some parts were quicker
| while other were slower.
|
| And imagine if that code bad crashed in production next
| week instead of right now while the whole context is
| still in my head. Maybe that would be hours of debugging
| time...
|
| Maybe as parent said, for a domain where you are braking
| new ground, it can generate some interesting ideas you
| wouldn't have thought about. Like a stupid pair that can
| get you out if a local manima but in general doesn't help
| much it can be a significant help.
|
| But then again you could do what has been done for
| decades and speak to another human about the problem, at
| least they may have signed the same NDA as you...
| holoduke wrote:
| Yesterday i wanted to understand what a team was doing in
| a go project. I have never really touched go before. I do
| understand software, because I develop for plus 20 years.
| But chatgpt was perfectly able to give me a summary on
| how the implementation worked. Gave me examples and
| suggestions. And within a day fulltime pasting code and
| asking question i had a good understanding of the
| codebase. It would have be a lot more difficult with only
| google.
| twelve40 wrote:
| how often do you get to learn an unfamiliar language? is
| it something you need to do every day? so this use case,
| did it save you much time overall?
| NoOn3 wrote:
| I have very similar experience. For me LLM are good at
| explaining someone else's complex code, but for some
| reason they don't help me write new code well. I would
| also like to see any LLM-driven developers work in real
| time.
| HappMacDonald wrote:
| My experience thus far is that LLMs can be quite good at:
|
| * Information lookup
|
| -- when search engines are enshittified and bogged down
| by SEO spam and when it's difficult to transform a
| natural language request into a genuinely unique set of
| search keywords
|
| -- Search-enabled LLMs have the most up to date reach in
| these circumstances but even static LLMs can work in a
| pinch when you're searching for info that's probably well
| represented in their training set before their knowledge
| cutoff
|
| * Creatively exploring a vaguely defined problem space
|
| -- Especially when one's own head feels like it's too
| full of lead to think of anything novel
|
| -- Watch out to make sure the wording of your request
| doesn't bend the LLM too far into a stale direction. For
| example naming an example can make them tunnel vision
| onto that example vs considering alternatives to it.
|
| * Pretending to be Stack Exchange
|
| -- EG, the types of questions one might pose on SE one
| can pose to an LLM and get instant answers, with less
| criticism for having asked the question in the first
| place (though Claude is apparently not above gently
| checking in if one is encountering an X Y problem) and
| often the LLM's hallucination rate is no worse than that
| of other SE users
|
| * Shortcut into documentation for tools with either thin
| or difficult to navigate docs
|
| -- While one must always fact-check the LLM, doing so is
| usually quicker in this instance than fishing online for
| which facts to even check
|
| -- This is most effective for tools where tons of people
| do seem to already know how the tool works (vs tools
| nobody has ever heard of) but it's just not clear how
| they learned that.
|
| * Working examples to ice-break a start of project
|
| * Simple automation scripts with few moving parts,
| especially when one is particular about the goal and the
| constraints
|
| -- Online one might find example scripts that _almost_
| meet your needs but always fail to meet them in some
| fashion that 's irritating to figure out how to coral
| back into your problem domain
|
| -- LLMs have deep experience with tools and with _short_
| snippets of coherent code, so their success rate on
| utility scripts are much higher than on "portions of
| complex larger projects".
| edanm wrote:
| Totally respect your position, given that you actually
| _tried_ the tool and found it didn 't work for you. That
| said, one valid explanation is that the tool isn't good
| for what you're trying to achieve. But an alternative
| explanation is that you haven't learned how to use the
| tool effectively.
|
| You seem open to this possibility, since you ask:
|
| > I've never actually seen any LLM-driven developers work
| in real time. Are there any live coding channels that
| could convince the skeptics what we're missing out on
| something revolutionary?
|
| I don't know many yet, but Steve Yegge, a fairly famous
| developer in his own right, has been talking about this
| for the last few months, and has walked a few people
| through his "Chat Oriented Programming" (CHOP) ideas. I
| believe if you search for that phrase, you'll find a few
| videos, some from him and some from others. Can't
| guarantee they're all quality videos, though anything
| Steve himself does is interesting, IMO.
| evilfred wrote:
| Middle Ground Fallacy
| harrall wrote:
| Fallacy fallacy
| goatlover wrote:
| The middle ground between hyping the new tech and being
| completely skeptical about it is usually right. New tech
| is usually not everything it's hyped up to be, but also
| usually not completely useless or bad for society. It's
| likely we're not about to usher in the singularity or
| doom society, but LLMs are useful enough to stick around
| in various tools. Also it's probably the case that a
| percentage of they hype is driven by wanting funding.
| oblio wrote:
| > New tech is usually not everything it's hyped up to be,
| but also usually not completely useless or bad for
| society.
|
| Except for cryptocurrencies (at least their ratio of
| investments to output) :-p
| wvenable wrote:
| > Where are the sane people in the middle?
|
| They are the quiet ones.
| jrockway wrote:
| Yup! I don't have a lot to say about LLMs for coding.
| There are places where I'm certain they're useful and
| that's where I use them. I don't think "generate a react
| app from scratch" helps me, but things like "take a CPU
| profile and write it to /tmp/pprof.out" have worked well.
| I know how to do the latter, but would need to look at
| the docs for the exact function name to call, and the LLM
| just knows and checks the error on opening the file and
| all that tedium. It's helpful.
|
| At my last job I spent a lot of time on cleanups and
| refactoring and never got the LLM to help me in any way.
| This is the thing that I try every few months and see
| what's changed, because one day it will be able to do the
| tedious things I need to get done and spare me the
| tedium.
|
| Something I should try again is having the LLM follow a
| spec and see how it does. A long time ago I wrote some
| code to handle HTTP conditional requests. I pasted the
| standard into my code, and wrote each chunk of code in
| the same order as the spec. I bet the LLM could just do
| that for me; not a lot of knowledge of code outside that
| file was required, so you don't need many tokens of
| context to get a good result. But alas the code is
| already written and works. Maybe if I tried doing that
| today the LLM would just paste in the code I already
| wrote and it was trained on ;)
| travisporter wrote:
| > I got into this profession simply because I could Ctrl-Z to
| the previous step much more easily than my then favourite
| chemical engineering goals.
|
| That is interesting. Asking as a complete ignoramus - is
| there not a way to do this now? Like start off with a 100 of
| reagent and at every step use a bit and discard if wrong
| ssivark wrote:
| But for every step that turns out to be "correct" you now
| have to go back and redo that in your held-out sample
| anyways. So it's not like you get to save on repeating the
| work -- IIUC you just changed it from depth-first execution
| order to breadth-first execution order.
| Vampiero wrote:
| > International Islamic University Chittagong
|
| ??? What's up with native English speakers and random
| acronyms of stuff that isn't said that often? YMMV, IIUC,
| IANAL, YSK... Just say it and save everyone else a google
| search.
| HappMacDonald wrote:
| So just to make sure I'm on the same page: you're
| bemoaning how commonly people abbreviate uncommon
| sayings?
| Vampiero wrote:
| I'm bemoaning the fact that I have to google random
| acronyms every time an American wants to say the most
| basic shit as if everyone on the internet knows their
| slang and weird four letter abbreviations
|
| And googling those acronyms usually returns unrelated
| shit unless you go specifically to urban dictionary
|
| And then it's "If I understand correctly". Oh. Of course.
| He couldn't be arsed to type that
| amenhotep wrote:
| FWIW IMO YTA
| edgineer wrote:
| frfr
| tmtvl wrote:
| I'm not a native English speaker, but IIUC is clearly 'If
| I Understand Correctly'. If you look at the context it's
| often fairly easy to figure out what an initialism means.
| I mean even I can usually deduce the meaning and I'm
| barely intelligent enough to qualify as 'sentient'.
| numpad0 wrote:
| That likely ends up with 100 failed results all attributed
| to the same set of causes
| dboreham wrote:
| Calculators vs slide rules.
| numpad0 wrote:
| I can't relate to this comment at all. Doesn't feel like
| what's said in GP either.
|
| IMO, LLMs are super fast predictive input and hallucinatory
| unzip; files to be decompressed don't have to exist yet, but
| input has to be extremely deliberate and precise.
|
| You have to have a valid formula that gives the resultant
| array that don't require no more than 100 IQ to comprehend,
| and then they unroll it for you into the whole code.
|
| They don't reward trial and error that much. They don't seem
| to help outsiders like 3D printers did, either. It is indeed
| a discriminatory tool as in it mistreats amateurs.
|
| And, by the way, it's also increasingly obvious to me that
| assuming pro-AI posture more than what you would from purely
| rational and utilitarian standpoint triggers a unique mode of
| insanity in humans. People seem to contract a lot of
| negativity doing it. Don't do that.
| CraigJPerry wrote:
| >> where all you wrote was function signatures and high-level
| control flow, and maybe some conformance tests around them
|
| AIUI that's where idris is headed
| greenyouse wrote:
| That approach sounds similar to the Idris programming language
| with Type Driven Development. It starts by planning out the
| program structure with types and function signatures. Then the
| function implementation (aka holes) can be filled in after the
| function signatures and types are set.
|
| I feel like this is a great approach for LLM assisted
| programming because things like types, function signatures,
| pre/post conditions, etc. give more clarity and guidance to the
| LLM. The more constraints that the LLM has to operate under,
| the less likely it is to get off track and be inconsistent.
|
| I've taken a shot at doing some little projects for fun with
| this style of programming in TypeScript and it works pretty
| well. The programs are written in layers with the domain
| design, types, schema, and function contracts being figured out
| first (optionally with some LLM help). Then the function
| implementations can be figured out towards the end.
|
| It might be fun to try Effect-TS for ADTs + contracts + compile
| time type validation. It seems like that locks down a lot of
| the details so it might be good for LLMs. It's fun to play
| around with different techniques and see what works!
| lysecret wrote:
| 100% this is what I do in python too!
| brabel wrote:
| I am not a genius but have a couple of decades experience and
| finally started using LLMs in anger in the last few weeks. I
| have to admit that when my free quota from GitHub Copilot ran
| out (I had already run out of Jetbrains AI as well!! Our
| company will start paying for some service as the trials have
| been very successful), I had a slight bad feeling as my
| experience was very similar to OP: it's really useful to get me
| started, and I can finish it much more easily from what the AI
| gives me than if I started from scratch. Sometimes it just
| fills in boilerplate, other times it actually tells me which
| functions to call on an unfamiliar API. And it turns out it's
| really good at generating tests, so it makes my testing more
| comprehensive as it's so much faster to just write them out
| (and refine a bit usually by hand). The chat almost completely
| replaced my StackOverflow queries, which saves me much time and
| anxiety (God forbid I have to ask something on SO as that's a
| time sink: if I just quickly type out something I am just
| asking to be obliterated by the "helpful" SO moderators... with
| the AI, I just barely type anything at all, leave it with typos
| and all, the AI still gets me!).
| EagnaIonat wrote:
| Have you tried using Ollama? You can download and run an LLM
| locally on your machine.
|
| You can also pick the right model for the right need and it's
| free.
| mentos wrote:
| I'm using ChatGPT4o to convert a C# project to C++. Any
| recommendation on what Ollama model I could use instead?
| neonsunset wrote:
| The one that does not convert C# at all and asks you to
| just optimize it in C# instead (and to use the
| appropriate build option) :D
| mentos wrote:
| I'm converting game logic from C# to UE5 C++. So far made
| great progress using ChatGPT4o and o1
| neonsunset wrote:
| Do you find these working out better for you than Claude
| 3.5 Sonnet? So far I've not been a fan of the ChatGPT
| models' output.
| mentos wrote:
| I find ChatGPT better with UE4/5 C++ but they are very
| close.
|
| Biggest advantage is the o1 128k context. I can one shot
| an entire 1000 line class where normally I'd have to go
| function by function with 4o.
| brabel wrote:
| Yes. If the AI is not integrated with the IDE, it's not as
| helpful. If there were an IDE plugin that let you use a
| local model, perhaps that would be an option, but I haven't
| seen that (Github Copilot allows selecting different
| models, but I didn't check more carefully whether that also
| includes a local one, anyone knows?).
| oogali wrote:
| It's doable as it's what I use to experiment.
|
| Ollama + CodeGPT IntelliJ plugin. It allows you to point
| at a local instance.
| mark_l_watson wrote:
| I also use Ollama for coding. I have a 32G M2 Mac, and
| the models I can run are very useful for coding and
| debugging, as well as data munging, etc. That said,
| sometimes I also use Claude Sonnet 3.5 and o1. (BTW, I
| just published an Ollama book yesterday, so I am a little
| biassed towards local models.)
| matrix12 wrote:
| Thanks for the book!
| bpizzi wrote:
| > (Github Copilot allows selecting different models, but
| I didn't check more carefully whether that also includes
| a local one, anyone knows?).
|
| To my knowledge, it doesn't.
|
| On Emacs there's gptel which integrates quiet nicely
| different LLM inside Emacs, including a local Ollama.
|
| > gptel is a simple Large Language Model chat client for
| Emacs, with support for multiple models and backends. It
| works in the spirit of Emacs, available at any time and
| uniformly in any buffer.
|
| https://github.com/karthink/gptel
| th4t1sW13rd wrote:
| This can use Ollama: https://www.continue.dev/
| devjab wrote:
| I'm genuinely curious but what did you use StackOverflow for
| before? With a couple of decades in the industry I can't
| remember when the last time I "Google programmed" anything
| was. I always go directly to the documentation for whatever
| it is I'm working for, because where else would I find out
| how it actually works? It's not like I haven't "Google
| programmed" when I was younger, but it's just such a slow
| process based on trusting strangers on the internet that it
| never really made much sense once I started knowing what I
| was doing. I sort of view LLM's in a similar manner. Why
| would you go to them rather than the actual documentation? I
| realize this might sound arrogant or rude, and I really hope
| you believe me when I say that I don't mean it like this. The
| reason I'm curious is because we're really struggling getting
| junior developers to not look, everywhere, but the
| documentation first. Which means they often actually don't
| know how what they build works. Which can be an issue when
| they load every object of a list into memory isntead of using
| a generator...
|
| As far as using LLMs in anger I would really advice anyone to
| use them. GitHub copilot hasn't been very useful for me
| personally, but I get a lot of value out of running my
| thought process by a LLM. I think better when I "think out
| loud" and that is obviously challenging when everyone is
| busy. Running my ideas by an LLM helps me process them in a
| similar (if not better) fashion, often it won't even really
| matter what the LLM conjures up because simply describing
| what I want to do often gives me new ideas, like "thinking
| out loud".
|
| As far as coding goes. I find it extremely useful to have
| LLMs write cli scripts to auto-generate code. The code the
| LLM will produce is going to be absolute shite, but that
| doesn't matter if the output is perfectly fine. It's reduced
| my personal reliance on third party tools by quite a lot.
| Because why would I need a code generator for something (and
| in that process trust a bunch of 3rd party libraries) when I
| can have a LLM write a similar tool in half an hour?
| wiseowise wrote:
| > Why would you go to them rather than the actual
| documentation?
|
| Not every documentation is made equal. For example: Android
| docs are royal shit. They cover some basic things, e.g.
| show a button, but good look finding esoteric Bluetooth
| information or package management, etc. Most of it is a mix
| of experimentation and historical knowledge (baggage).
| devjab wrote:
| > Not every documentation is made equal.
|
| They are wildly different. I'm not sure the Android API
| reference is that bad, but that is mainly because I've
| spent a good amount years with the various .Net API
| references and the Android one is a much more shiny turd
| than those. I haven't had issues with Bluetooth myself,
| the Bluetooth SIG has some nice specification PDF's but I
| assume you're talking about the ones which couldn't be
| found? I mean this in a "they don't seem to exist" kind
| of way and not that, you specifically, couldn't find
| them.
|
| I agree though. It's just that I've never really found
| internet answers to be very useful. I did actually search
| for information a few years back when I had to work with
| a solar inverter datalogger, but it turned out that
| having the ridicilously long German engineering manual
| scanned, OCR processed and translated was faster. Anyway,
| we all have our great white whales. I'm virtually
| incapable of understanding the SQLAlchemy documentation
| as an example, luckily I'll probably never have to use it
| again.
| brabel wrote:
| I believe you don't mean to be rude, but you just sound
| completely naive to me. To think that documentation
| includes everything is just, like, have you actually been
| coding anything at all that goes just slightly off the
| happy path? Example from yesterday: I have a modular JavaFX
| application (i.e. it uses Java JMS modules, not just
| Maven/Gradle modules). I introduced a call to `url()` in
| JavaFX CSS. That works when running using the classpath,
| but not when using the module path. I spent half an hour
| reading docs to see what they say about modular
| applications. They didn't mention anything at all.
| Specially because in my case, I was not just doing
| `getClass().getResource`... I was using the CSS directive
| to load a resource from the jar. This is exactly when I
| would likely go on SO and ask if anyone had seen this
| before. It used to be highly likely someone who's an expert
| on JavaFX would see and answer my question, sometimes even
| people who directly worked on JavaFX!
|
| StackOverflow was not really meant for juniors, as juniors
| usually can indeed find answers on documentation, normally.
| It was, like ExpertsExchange before it, a place for
| veterans to exchange tribal knowledge like this. If you
| think only juniors use SO, you seem to have arrived at the
| scene just yesterday and just don't know what you're
| talking about.
| ilrwbwrkhv wrote:
| Being a dev at a large company is usually the sign that you're
| not very good though. And anyone can start a company with the
| right connections.
| ksenzee wrote:
| You've just disproved your own assertion. Either that or you
| believe everyone who's any good has the right connections.
| tomwojcik wrote:
| That's a terrible blanket statement, very US-centric. Not
| everyone wants to start a company and you can't just reduce
| ones motivations to your measure of success.
| joseda-hg wrote:
| God knows many of the best devs I've known would be an
| absolute nightmare on the business side, they'd rather have
| a capable business person if they could avoid it
| benterix wrote:
| > designed around filling in the implementations for you. 20
| years ago that would have been from a live online database
|
| This reminds me a bit of PowerBuilder (or was it
| PowerDesigner?) from early 1990s. They sold it to SAP later, I
| was told it's still being used today.
| antirez wrote:
| I have also many years of programming experience and find
| myself strongly "accelerated" by LLMs when writing code. But,
| if you think at it, it makes sense that many seasoned
| programmers are using LLMs better. LLMs are a helpful tool, but
| also a hard-to-use tool, and in general it's fair to think that
| better programmers can do a better use of some assistant (human
| or otherwise): better understanding its strengths, identifying
| faster the good and bad output, providing better guidance to
| correct the approach...
|
| Other than that, what correlates more strongly with the ability
| to use LLMs effectively is, I believe, language skills: the
| ability to describe problems very clearly. LLMs reply quality
| changes very significantly with the quality of the prompt.
| Experienced programmers that can _also_ communicate effectively
| provide the model with many design hints, details where to
| focus, ..., basically escaping many local minima immediately.
| bsenftner wrote:
| Communication skills are the keys to using LLMs. Think about
| it: every type of information you want is in them, in fact it
| is there multiple times, with multiple levels of seriousness
| in the treatment of the idea. If one is casual in their
| request, using casual language, then the LLM will reply with
| a casual reply because that matched your request best. To get
| a hard, factual answer from those that are experts in a
| subject, use the formal term, use the expert's language and
| you'll get back a rely more likely to be correct because it's
| in the same level of formal treatment as correct answers.
| psychoslave wrote:
| >every type of information you want is in them
|
| Actually, I'm afraid that no. It won't give us the step by
| step scalable processes to make humanity as a whole enter
| in a loop of indefinitely long period of world peace, with
| each of us enjoying life in its own thriving manner. That
| would be great information to broadcast, though.
|
| Also it equally has ability to produce large pile of
| completely delusional answers, that mimics just as well
| genuinely sincere statements. Of course, we can also
| receive that kind of misguiding answers from humans. But
| the amount of output that mere humans can throw out in such
| a form is far more limited.
|
| All that said, it's great to be able to experiment with it,
| and there are a lot of nice and fun things to do with it.
| It can be a great additional tool, but it won't be a self-
| sufficient panacea of information source.
| bsenftner wrote:
| > It won't give us the step by step scalable processes to
| make humanity as a whole enter in a loop of indefinitely
| long period of world peace
|
| That's not anywhere, that's a totally unsolved and open
| ended problem, why would you think an LLM would have
| that?
| fmbb wrote:
| If what you meant was
|
| > Think about it: every type of already solved problem
| you want information about is in them, in fact it is
| there multiple times, with multiple levels of seriousness
| in the treatment of the idea.
|
| then that was not clear from your comment saying LLMs
| contain any information you want.
|
| One has to be careful communicating about LLms because
| the world is full of people that actually believe LLMs
| are generally intelligent super beings.
| numpad0 wrote:
| I think GP's saying that it must be in your prompt, not
| in the weights.
|
| If you want LLM make sandwich, you have to tell them you
| `want triangular sandwiches of standard serving size made
| with white bread and egg based filling`, not `it's almost
| noon and I'm wondering if sandwich for lunch is a good
| idea`. Fine-tuning partially solves that problem but they
| still like the former.
| arminiusreturns wrote:
| After a small prompt engineering:
| https://0bin.net/paste/zolMrjVz#dgZrZzKU-
| PlxdkJTdG0pZU9bsCM3...
| psychoslave wrote:
| Interesting, thanks for sharing. Could you also give some
| insights on the process you followed?
| arminiusreturns wrote:
| Sure. Lately I've found that the "role" part of prompt
| engineering seems to be the most important. So what I've
| been doing is telling ChatGPT to play the role of _the
| most educated /wise/knowledgeable/skilled $field
| $role(advisor, lawyer, researcher etc) in the history of
| the world_ and then giving it some context for the task
| before asking for the actual task.
|
| Sometimes asking it to self reflect on how the prompt
| itself could be better engineered helps if the initial
| response isn't quite right.
| mhalle wrote:
| I completely agree that communication skills are critical in
| extracting useful work or insight from LLMs. The analogy for
| communicating with people is not far-fetched. Communicating
| successfully with a specific person requires an understanding
| of their strengths and weaknesses, their tendencies and blind
| spots. The same is true for communicating with LLMs.
|
| I have actually found that from a documentation point of
| view, querying LLMs has made me better and explaining things
| to people. If, given the documentation for a system or API, a
| modern LLM can't answer specific questions about how to
| perform a task, a person using the same documentation will
| also likely struggle. It's proving to be a good way to test
| the effectiveness of documentation, for humans and for LLMs.
| LouisSayers wrote:
| > the ability to describe problems very clearly
|
| Yes, and to provide enough context.
|
| There's probably a lot that experience is contributing to the
| interaction as well, for example - knowing when the LLM has
| gone too far, focusing on what's important vs irrelevant to
| the task, modularising and refactoring code, testing etc
| gen220 wrote:
| Hey! Asking because I know you're a fellow vimmer [0]. Have
| you integrated LLMs into your editor/shell? Or are you
| largely copy-pasting context between a browser and vim? This
| context-switching of it all has been a slight hang-up for me
| in adopting LLMs. Or are you asking more strategic questions
| where copy-paste is less relevant?
|
| [0] your videos on writing systems software were part of what
| inspired me to make a committed switch into vim. thank you
| for those!
| qup wrote:
| You want aider.
| rudiksz wrote:
| > "seasoned programmers are using LLMs better".
|
| I do not remember a single instance when code provided to me
| by an LLM worked at all. Even if I ask something small that
| cand be done in 4-5 lines of code is always broken.
|
| From a fellow "seasoned" programmer to another: how the hell
| do you write the prompts to get back correct working code?
| jkaptur wrote:
| The story from the article matches my experience. The LLM's
| first answer is often a _little_ broken, so I tweak it
| until it 's actually correct.
| numpad0 wrote:
| dc: not a seasoned dev, with <b> and <h1> tags on "not".
|
| They can't think for you. All intelligent thinking you have
| to do.
|
| First, give them high level requirement that can be
| clarified into indented bullet points that looks like code.
| Or give them such list directly. Don't give them half-open
| questions usually favored by talented and autonomous
| individuals.
|
| Then let them further decompress that pseudocode bullet
| points into code. They'll give you back code that resemble
| a digitized paper test answer. Fix obvious errors and you
| get a B grade compiling code.
|
| They can't do non-conventional structures, Quake style
| performance optimized codes, realtime robotics, cooperative
| multithreading, etc., just good old it takes what it takes
| GUI app API and data manipulation codes.
|
| For those use cases with these points in mind, it's a lot
| faster to let LLM generate tokens than typing `int
| this_mandatory_function_does_obvious (obvious *obvious){
| ...` manually on a keyboard. That should arguably be a
| productivity boost in the sense that the user of LLM is
| effectively typing faster.
| HappMacDonald wrote:
| I'd ask things like "which LLM are you using", and "what
| language or APIs are you asking it to write for".
|
| For the standard answers of "GPT-4 or above", "claude
| sonnet or haiku", or models of similar power and well known
| languages like Python, Javascript, Java, or C and assuming
| no particularly niche or unheard of APIs or project
| contexts the failure rate of 4-5 line of code scripts in my
| experience is less than 1%.
| wvenable wrote:
| I rarely get back not working code but I've also
| internalized it's limitations so I no longer ask it for
| things it's not going to be able to do.
|
| As other commenters have pointed it, there also a lot of
| variation between different models and some are quite dumb.
|
| I've had no issues with 10-20 line coding problems. I've
| also had it built a lot of complete shell scripts and had
| no problem there either.
| antirez wrote:
| Check my YouTube channel if you have a few minutes. I just
| published a video about adding a complex feature (UTF-8) to
| the Kilo editor, using Claude.
| mordymoop wrote:
| I write the prompt as if I'm writing an email to a
| subordinate that clearly specifies what the code needs to
| do.
|
| If what I'm requesting an improvement to an existing code,
| I paste the whole code if practical, or if not, as much of
| the code as possible, as context before making request for
| additional functionality.
|
| Often these days I add something like "preserve all
| currently existing functionality." Weirdly, as the models
| have gotten smarter, they have also gotten more prone to
| delete stuff they view as unnecessary to the task at hand.
|
| If what I'm doing is complex (a subjective judgement) I ask
| it to lay out a plan for the intended code before starting,
| giving me a chance to give it a thumbs up or clarify its
| understanding of what I'm asking for if it's plan is off
| base.
| kragen wrote:
| That's really interesting. What are the most important things
| you've learned to do with the LLMs to get better results?
| What do your problem descriptions look like? Are you going
| back and forth many times, or crafting an especially-high-
| quality initial prompt?
| antirez wrote:
| I'm posting a set of videos on my YT channel where I'll
| show the process I follow. Thanks!
| kragen wrote:
| That's fantastic! I thought about asking if you had
| streamed any of it, but I didn't want to sound demanding
| and entitled :)
| ignoramous wrote:
| > _[David, Former staff engineer at Google ... CTO of
| Tailscale,] doesn 't need LLMs. That he says LLMs make him more
| productive at all as a hands-on developer, especially around
| first drafts on a new idea, means a lot to me..._
|
| Don't doubt for a second the pedigree of founding engs at
| Tailscale, but David is careful to point out exactly why LLMs
| work for them (but might not for others): I am
| doing a particular kind of programming, product development,
| which could be roughly described as trying to bring programs to
| a user through a robust interface. That means I am building a
| lot, throwing away a lot, and bouncing around between
| environments. Some days I mostly write typescript, some days
| mostly Go. I spent a week in a C++ codebase last month
| exploring an idea, and just had an opportunity to learn the
| HTTP server-side events format. I am all over the place,
| constantly forgetting and relearning. If you spend
| more time proving your optimization of a cryptographic
| algorithm is not vulnerable to timing attacks than you do
| writing the code, I don't think any of my observations here are
| going to be useful to you.
| pplonski86 wrote:
| I'm in similar situations, I jump between many environments,
| mainly between Python and Typescript, however, currently
| testing a new idea of learning algorithm in C++, and I simply
| don't always remember all syntax. I was very skeptical about
| LLMs at first. Now, I'm using LLMs daily. I can focus more on
| thinking rather than searching stackoverflow. Very often I
| just need simple function, that it is much faster to create
| with chat.
| JKCalhoun wrote:
| And if anyone remembers: before Stack Overflow you more or
| less had to specialize in a domain, become good using a
| handful of frameworks/API, on one platform. Learning a new
| language, a new API (god forbid a new platform) was to
| sail, months long, into seas unknown.
|
| In this regard, with first Stack Overflow and now LLMs, the
| field has improved mightily.
| big_youth wrote:
| > If you spend more time proving your optimization of a
| cryptographic algorithm is not vulnerable to timing attacks
| than you do writing the code, I don't think any of my
| observations here are going to be useful to you.
|
| I am not a software dev I am a security researcher. LLM's are
| great for my security research! It is so much easier and
| faster to iterate on code like fuzzers to do security
| testing. Writing code to do a padding oracle attack would
| have taken me a week+ in the past. Now I can work with an LLM
| to write code and learn and break within the day.
|
| It has accelerated my security research 10 fold, just because
| I am able to write code and parse and interpret logs at a
| level above what I was able to a few years ago.
| Vox_Leone wrote:
| I have been using LLM to generate functional code from *pseudo-
| code* with excellent results. I am starting to experiment with
| UML diagrams, both with LLM and computer vision to actually
| generate code from UML diagrams; for example a simple activity
| diagram could be the prompt on LLM 's, and might look like:
|
| Start -> Enter Credentials -> Validate -> [Valid] -> Welcome
| Message -> [Invalid] -> Error Message
|
| Corresponding Code (Python Example):
|
| class LoginSystem: def
| validate_credentials(self, username, password): if
| username == "admin" and password == "password":
| return True return False def
| login(self, username, password): if
| self.validate_credentials(username, password):
| return "Welcome!" else: return
| "Invalid credentials, please try again."
|
| *Edited for clarity
| jonvk wrote:
| This example illustrates one of the risks of using LLMs
| without subject expertise though. I just tested this with
| claude and got that exact same validation method back. Using
| string comparison is dangerous from a security perspective
| [1], so this is essentially unsafe validation, and there was
| no warning in the response about this.
|
| 1. https://sqreen.github.io/DevelopersSecurityBestPractices/t
| im...
| jpc0 wrote:
| Are you talking about the timing based attacks on that
| website which fails miserably at rendering a useable page
| on mobile?
| jpc0 wrote:
| Could you add to the prompt that the password is stored in an
| sqlite database using argon2 for encryption, the encryption
| parameters are stored as environment variables.
|
| You would like it to avoid timing based attacks as well as
| dos attacks.
|
| It should also generate the functions as pure functions so
| that state is passed in and passed out and no side
| effects(printing to the console) happen within the function.
|
| Then also confirm for me that it has handled all error cases
| that might reasonably happen.
|
| While you are doing that, just think about how much implicit
| knowledge I just had to type into the comment here and that
| is still ignoring a ton of other knowledge that needs to be
| considered like whether that password was salted before being
| stored. All the error conditions for the sqlite
| implementation in python, the argon2 implementation in the
| library.
|
| TLDR: that code is useless and would have taken me the same
| amount of time to write as your prompt.
| apwell23 wrote:
| he is using llm for coding. you don't become staff engineer by
| being a badass coder. Not sure how they are related.
| HarHarVeryFunny wrote:
| > His post reminds me of an old idea I had of a language where
| all you wrote was function signatures and high-level control
| flow
|
| Regardless of language, that's basically how you approach the
| design of a new large project - top down architecture first,
| then split the implementation into modules, design the major
| data types, write function signatures. By the time you are done
| what is left is basically the grunt work of implementing it
| all, which is the part that LLMs should be decent at,
| especially if the functions/methods are documented to level
| (input/output assertions as well as functionality) where it can
| also write good unit tests for them.
| dingnuts wrote:
| > the grunt work of implementing it all
|
| you mean the fun part. I can really empathize with digital
| artists. I spent twenty years honing my ability to write code
| and love every minute of it and you're telling me that in a
| few years all that's going to be left is PM syncs and OKRs
| and then telling the bot what to write
|
| if I'm lucky to have a job at all
| HarHarVeryFunny wrote:
| I think it depends on the size of the project. To me, the
| real fun of being a developer is the magic of being able to
| conceive of something and then conjure it up out of thin
| air - to go from an idea to reality. For a larger more
| complex project the major effort in doing this is the
| solution conception, top-down design (architecture), and
| design of data structures and component interfaces... The
| actual implementation (coding), test cases and debugging,
| then does become more like drudgework, not the most
| creative or demanding part of the project, other than the
| occasional need for some algorithmic creativity.
|
| Back in the day (I've been a developer for ~45 years!) it
| was a bit different as hardware constraints (slow 8-bit
| processors with limited memory) made algorithmic and code
| efficiency always a primary concern, and that aspect was
| certainly fun and satisfying, and much more a part of the
| overall effort than it is today.
| mahmoudimus wrote:
| Isn't that the idea behind UML? Which didn't work out so well,
| however, with the advent of LLMs today, I think that premise
| could work.
| agentultra wrote:
| It seems nice for small projects but I wouldn't use it for
| anything serious that I want to maintain long term.
|
| I would write the tests first and foremost: they are the
| specification. They're for future me and other maintainers to
| understand and I wouldn't want them to be generated: write them
| with the intention of explaining the module or system to another
| person. If the code isn't that important I'll write unit tests.
| If I need better assurances I'll write property tests at a
| minimum.
|
| If I'm working on concurrent or parallel code or I'm working on
| designing a distributed system, it's gotta be a model checker.
| I've verified enough code to know that even a brilliant human
| cannot find 1-in-a-million programming errors that surface in
| systems processing millions of transactions a minute. We're not
| wired that way. Fortunately we have formal methods. Maths is an
| excellent language for specifying problems and managing
| complexity. Induction, category theory, all awesome stuff.
|
| Most importantly though... you have to write the stuff and read
| it and interact with it to be able to keep it in your head.
| Programming is theory-building as Naur said.
|
| Personally I just don't care to read a bunch of code and play,
| "spot the error;" a game that's rigged for me to be bad at. It's
| much more my speed to write code that obviously has no errors in
| it because I've thought the problem through. Although I struggle
| with this at times. The struggle is an important part of the
| process for acquiring new knowledge.
|
| Though I do look forward to algorithms that can find proofs of
| trivial theorems for me. That would be nice to hand off...
| although simp does a lot of work like that already. ;)
| rafaelmn wrote:
| I disagree about search. While LLM can give you an answer faster,
| good doc (eg. MDN article in CSS example) will :
|
| - be way more reliable
|
| - probably be up to date on how you should solve it in
| latest/recommend approach
|
| - put you in a place where you can search for adjecent tech
|
| LLM with search has potential but I'd like if current tools are
| more oriented on source material rather than AI paraphrasing.
| cruffle_duffle wrote:
| One of my tricks is to paste the docs right into the context so
| the model can't fuck it up.
|
| Though I still wonder if that means I'm only tricking myself
| into thinking the LLM is increasing my productivity.
| rafaelmn wrote:
| I likr this approach. Read the docs, figure out what you
| want, get LLM to do the grunt work with all relevant context
| and review.
| EGreg wrote:
| I have found LLMs to be 95% useful on documented software, from
| everything eg Uniswap smart contracts to plugins in cordova to
| setting up Mac or Linux administrative tools.
|
| The problem for a regular person is that you have to copypasye
| from chat. That is "the last mile". For terminal commands
| that's fine but for programming you need a tool to automate
| this.
|
| Something like refactoring a function, given the entire
| context, etc. And it happening in the editor and you seeing a
| diff right away. The rest of the explanatory text should go
| next to the diff in a separate display.
|
| I bet someone can make a VSCode extension that chats with an
| LLM and does exactly this. The LLM is told to provide all the
| sections labeled clearly (code, explanation) and the editor
| makes the diff.
|
| Having said all that, good libraries that abstract away
| differences are far superior to writing code with an LLM. The
| only code that needs to be written is the interface and wiring
| up between the libraries.
| Ozzie_osman wrote:
| One mode I felt was missed was "thought partner", especially
| while debugging (aka rubber ducking).
|
| We had an issue recently with a task queue seemingly randomly
| stalling. We were able to arrive at the root cause much more
| quickly than we would have because of a back-and-forth
| brainstorming session with Claude, which involved describing the
| issue we were seeing, pasting in code from library to ask
| questions, asking it to write some code to add some missing
| telemetry, and then probing it for ideas on what might be going
| wrong. An issue that may have taken days to debug took about an
| hour to identify.
|
| Think of it as rubber ducking with a very strong generalist
| engineer who knows about basically any technical concepts.
| mmahemoff wrote:
| The new video and screen-share capabilities in ChatGPT and
| Gemini should make rubber-ducking smoother.
|
| I feel like I've worn out my computer's clipboard and alt-tab
| keys at this stage of the LLM experience.
| fragmede wrote:
| You may want to try any of the tools that can write to the
| filesystem so you're at least not copy pasting code from a
| chat window. CoPilot, Cursor, Aider, Tabnine, etc.
| vendiddy wrote:
| I found myself doing this with o1 recently for software
| architecture.
|
| I will evaluate design ideas with the model, express concerns
| on trade-offs, ask for alternative ideas, etc.
|
| Some of the benefit is having someone to talk to, but with
| proper framing it is surprisingly good at giving balanced
| takes.
| simondotau wrote:
| I've recently started using Cursor because it means I can now
| write python where two weeks ago I couldn't write python. It
| wrote the first pass of an API implementation by feeding it the
| PDF documentation. I've spent a few days testing and massaging it
| into a well formed, well structured library, pair-programming
| style.
|
| Then I needed to write a simple command line utility, so I wrote
| it in Go, even though I've never written Go before. Being able to
| make tiny standalone executables which do real work is
| incredible.
|
| Now if I ever need to write something, I can choose the language
| most suited to the task, not the one I happen to have the most
| experience with.
|
| That's a superpower.
| midasz wrote:
| But you're not really writing python right? You're instructing
| a tool to generate python. Kinda like saying I'm writing
| bytecode while I'm actually just typing Java.
| simondotau wrote:
| I am really writing python. The LLM is a substitute for
| having foreknowledge of this particular language's syntax and
| grammar, but I'm still debugging like a "real" programmer and
| I'm still editing/refining the code like a "real" programmer,
| because I am.
|
| Probably half the lines of code were written by me, because I
| do know how to write code.
|
| Here's what I wrote if you're curious:
| https://github.com/sjwright/zencontrol-python/
| yawnxyz wrote:
| > I could not go a week without getting frustrated by how much
| mundane typing I had to do before having a FIM model
|
| For those not in-the-know, I just learned today that code
| autocomplete is actually called "Fill-in-the-Middle" tasks
| Guthur wrote:
| Says who? I've been in the industry for nearly 25 years and
| have heard auto complete throughout but not once have I heard
| fill in the middle.
|
| Stop taking these blogs as oracle's of truth, they are not.
| These AI articles are full of this nonsense, to the point where
| it would appear to me many responses might just be Nvidia bots
| or whatever.
| sunaookami wrote:
| >I've been in the industry for nearly 25 years and have heard
| auto complete throughout but not once have I heard fill in
| the middle
|
| Then you need to look harder. FiM is a common approach for
| code generation LLMs.
|
| https://openai.com/index/efficient-training-of-language-
| mode...
|
| https://arxiv.org/abs/2207.14255
|
| This was before ChatGPT's release btw.
| Guthur wrote:
| Why, what was wrong with code completion, it was perfectly
| valid before even when including some sort of fuzzing.
|
| It's like everything to do with LLM marketing buzzword
| nonsense.
|
| I really want to just drop out of tech until all this
| obnoxious hype BS is gone.
| ascorbic wrote:
| Autocomplete is the feature, fill in the middle is one
| approach to implementing it. There are other ways to
| providing it (which were used in earlier versions of
| Copilot) and FIM can be used for tasks other than code
| completion.
| wruza wrote:
| It's just a term that signals "completion in between"
| rather than "after". Regular code completion usually
| doesn't take the following blocks into account mostly
| because these are grammatically vague due to an ongoing
| edit.
|
| Your comments may be sympathised to, but why on earth are
| they addressed to the root commenter. They simply shared
| their findings about an acronym.
| Guthur wrote:
| Because they mentioned it, why on earth would you think
| that is not a valid response in a thread that mentions
| it, from my observation that's pretty much how forum like
| threads work.
|
| More pressingly why do you think you should police it?
| wruza wrote:
| Apologies if my feedback annoyed you, it wasn't the goal.
| I just care about HN and this didn't feel right.
| crawshaw wrote:
| Author here.
|
| FIM is a term of art in LLM research for a style of tokens
| used to implement code completion. In particular, it refers
| to training an LLM with the extra non-printing tokens:
| <|fim_prefix|> <|fim_middle|> <|fim_suffix|>
|
| You would then take code like this: func
| add(a, b int) int { return <cursor> }
|
| and convert it to: <|fim_prefix|>func
| add(a, b int) int { return<|fim_suffix|>
| }<|fim_middle|>
|
| and have the LLM predict the next token.
|
| It is, in effect, an encoding scheme for getting the prefix
| and suffix into the LLM context while positioning the next
| token to be where the cursor is.
|
| (There are several variants of this scheme.)
| ripped_britches wrote:
| I'll say that the payoff for investing the time to learn how to
| do this right is huge. Especially with cursor which allows me to
| easily chat around context (docs, library files, etc)
| Aeolun wrote:
| I didn't believe it could be so good until I actually used it.
| It's a shame some of their models are proprietary because that
| means I can't use it for work. Would love if the thing worked
| purely with Copilot Chat (like Zed does), or if Zed added a
| similar composer mode.
| brabel wrote:
| What the author is asking about, a quick sketchpad where you can
| try out code quickly and chat with the AI, already exists in the
| JetBrains IDEs. It's called a scratch file[1].
|
| As far as I know, the idea of a scratch "buffer" comes from
| emacs. But in Jetbrains IDEs, you have the full IDE support even
| with context from your current project (you can pick the
| "modules" you want to have in context). Given the good
| integration with LLMs, that's basically what the author seems to
| want. Perhaps give GoLand[2] a try.
|
| Disclosure: no, I don't work for Jetbrains :D just a very happy
| customer.
|
| [1] https://www.jetbrains.com/help/idea/scratches.html
|
| [2] https://www.jetbrains.com/go/
| ryanobjc wrote:
| It's also available in emacs with packages like gptel which let
| you send the content of any buffer to your LLM of choice.
|
| I think emacs + LLM is a killer feature: the integration is
| super deep, deeper than any IDE I've seen, and it's just
| available... everywhere! Any text in emacs is sendable to a
| LLM.
| brabel wrote:
| I need to try that, but I have a feeling that in emacs it
| won't work as well because emacs has a bit more "trouble"
| setting up workspaces and using context only from that.
| Trying use use `project.el` now as it seems projectile has
| been superseded by it, if you know how to easily set that up
| with eglot support + AI would be helpful.
| justinl33 wrote:
| I've maintained several SDKs, and the 'cover everything' approach
| leads to nightmare dependency trees and documentation bloat. imo,
| the LLM paradigm shifts this even further - why maintain a
| massive SDK when users can generate precisely what they need?
| This could fundamentally change how we think about API
| distribution.
| golergka wrote:
| I have written a small fullstack app over the holidays, mostly
| with LLMs, to see how far would they get me. Turns out, they can
| easily write 90% of the code, but you still need to review
| everything, make the main architectural decisions and debug stuff
| when AI cant solve the bug after 2-3 iterations. I get a huge
| productivity boost and at the same time am not afraid that they
| will replace me. At least not yet.
|
| Can't recommend aider enough. I've tried many different coding
| tools, but they all seem like a leaky abstraction over LLMs
| medium of sequential text generation. Aider, on the other hand,
| leans into it in the best possible way.
| lysecret wrote:
| Funny, he starts of dismissing an AI IDE to end with building an
| AI IDE :D (Smells a little bit like not invented here syndrom)
| Otherwise fascinating article!
| cpursley wrote:
| I joke about once per month here that half of hn is basically
| "not invented here syndrome". And generally poor
| reimplementations of existing erlang features ;)
| bambax wrote:
| > _There are three ways I use LLMs in my day-to-day programming:
| 1 / Autocomplete 2/ Search 3/ Chat-driven programming_
|
| I do mostly 2/ Search, which is like a personalized Stack
| Overflow and sometimes feels incredible. You can ask a general
| question about a specific problem and then dive into some
| specific point to make sure you understand every part clearly.
| This works best for things one doesn't know enough about, but has
| a general idea of how the solution should sound or what it should
| do. Or, copy-pasting error messages from tools like Docker and
| have the LLM debug it for you really feels like magic.
|
| For some reason I have always disliked autocomplete anywhere, so
| I don't do that.
|
| The third way, chat-driven programming, is more difficult,
| because the code generated by LLMs can be large, and can also be
| wrong. LLMs are too eager to help, and they will try to find a
| solution even if there isn't one, and will invent it if
| necessary. Telling them in the prompt to say "I don't know" or
| "it's impossible" if need be, can help.
|
| But, like the author says, it's very helpful to get started on
| something.
|
| > _That is why I still use an LLM via a web browser, because I
| want a blank slate on which to craft a well-contained request_
|
| That's also what I do. I wouldn't like having something in the
| IDE trying to second guess what I write or suddenly absorbing
| everything into context and coming up with answers that it thinks
| make a lot of sense but actually don't.
|
| But the main benefit is, like the author says, that it lets one
| start afresh with every new question or problem, and save focused
| threads on specific topics.
| polotics wrote:
| My main usage is in helping me approach domains and tools I don't
| know enough to confidently know how best to get started.
|
| So one thing that doesn't get a mention in the article but is
| quite significant I think is the long lag of knowledge cutoff
| dates: looking at even the latest and greatest, there is one year
| or more of missing information.
|
| I would love for someone more versed than me to tell us how best
| to use RAG or LoRA to get the model to answer with fully up to
| date knowledge on libraries, frameworks, ...
| choeger wrote:
| Essentially, an LLM is a compressed database with a universal
| translator.
|
| So what we can get out of it is everything that has been written
| (and publicly released) before translated to any language it
| knows about.
|
| This has some consequences.
|
| 1. Programmers still need to know what algorithms or interfaces
| or models they want.
|
| 2. Programmers do not have to know a language very well anymore,
| to write code, but the have to for bug fixing. Consequently the
| rift between garbage software and quality software will grow.
|
| 3. New programming languages will face a big economical hurdle to
| take off.
| williamcotton wrote:
| _3. New programming languages will face a big economical hurdle
| to take off._
|
| I bet the opposite. I've written a number of DSLs and tooling
| around them over the last year as LLMs have allowed me to take
| on much bigger projects.
|
| I expect we see an explosion of languages over the next decade.
| klibertp wrote:
| Yes - the number of languages will grow, however, their
| _adoption_ will be much slower and harder to enact than now
| (and it 's already incredibly difficult).
|
| You might have written the DSLs, but the LLMs are unaware of
| this and will offer hallucinations when asked to generate
| code using that DSL.
|
| For the past few weeks I've been slowly getting back to
| Common Lisp. Even though there's plenty of CL code on the
| net, its volume is dwarfed by Python or JS. In effect, both
| Github Copilot and ChatGPT (4o) have an accuracy of 5%. I'm
| not kidding: they're unable to generate even very simple
| snippets correctly, hallucinating packages and functions.
|
| It's of course (I think?) possible to make a GPT specialized
| for Lisp, but if the generic model performs poorly, it'll
| probably make people wary and stay away from the language.
| So, unless you're ready to fine-tune a model for your
| language and somehow distribute it to your users, you'll see
| adoption rates dropping (from already minuscule ones!)
| stevage wrote:
| This is a great article with lots of useful insights.
|
| But I'm completely unconvinced by the final claim that LLM
| interfaces should be separate from IDE's, and should be their own
| websites. No thanks.
| dxuh wrote:
| Currently a lot of my work consists of looking at large, (to me)
| unknown code bases and figuring out how certain things work. I
| think LLMs are currently very bad at this and it is my
| understanding that there are problems in increasing context
| window sizes to multiple millions of tokens, so I wonder if LLMs
| will ever get good at this.
| AnnKey wrote:
| I would speculate that for learning unknown codebases, fine-
| tuning might work better than relying on context window size.
| jmull wrote:
| LLM auto-complete is good -- it suggests more of what I was going
| to type, and correctly (or close enough) often enough that it's
| useful. Especially in the boilerplate-y languages/code I have to
| use for $dayjob.
|
| Search has been neutral. For finding little facts it's been about
| the same as regular search. When digging in, I want
| comprehensive, dense, reasonably well-written reference
| documentation. That's not exactly wide-spread, but LLMs don't
| provide this either.
|
| Chat-driven generates too much buggy/incomplete code to be
| useful, and the chat interface is seriously clunky.
| Ygg2 wrote:
| > Search. If I have a question about a complex environment, say
| "how do I make a button transparent in CSS" I will get a far
| better answer asking any consumer-based LLM, than I do using an
| old fashioned web search engine.
|
| I don't think this is about LLMs getting better, but search
| becoming worse. In no small thanks to LLMs polluting the results.
| Do search images for terms and count how many are AI generated.
|
| I can say I got better result from Google X years ago vs Google
| of today.
| wizzard0 wrote:
| Google gets money from showing you ads, not because you pay
| them for quality search results.
|
| When you have to come over and over, and visit more pages to
| finally find what you needed, they get much more cash from
| advertisers than when you get everything instantly.
| EGreg wrote:
| Can't we just use test-driven development with AI Agents?
|
| 1) Idea
|
| 2) Tests
|
| 3) Code until all tests pass
| ianpurton wrote:
| I've been coding professionally for 30 years.
|
| I'm probably in the same place as the author, using Chat-GPT to
| create functions etc, then cut and pasting that into VSCode.
|
| I've started using cline which allows me to code using prompts
| inside VSCode.
|
| i.e. Create a new page so that users can add tasks to a tasks
| table.
|
| I'm getting mixed results, but it is very promising. I create a
| clinerules file which gets added to the system prompt so the AI
| is more aware of my architecture. I'm also looking at overiding
| the cline system prompt to both make it fit my architecture
| better and also to remove stuff I don't need.
|
| I jokingly imagine in the future we won't get asked how long a
| new feature will take, rather, how many tokens will it take.
| thomasfromcdnjs wrote:
| Love the token joke!
| assimpleaspossi wrote:
| Since all these AI products just put together things they pull
| from elsewhere, I'm wondering if, eventually, there could be
| legal issues involving software products put together using such
| things.
| sublimefire wrote:
| I've been doing that for a while as well and mostly agree.
| Although one thing that I find useful is to build the local
| infrastructure to be able to collect useful prompts and the
| ability to work with files and urls. Web interface is limiting
| alone.
|
| I like gptresearcher and all of the glue put in place to be able
| to extend prompts and agents etc. Not to mention the ability to
| fetch resources from the web and do research type summaries on
| it.
|
| All in all it reminds me the work of security researchers,
| pentesters and analysts. Throughout the career they would build a
| set of tools and scripts to solve various problems. LLMs kind of
| force the devs to create/select tools for themselves to ease the
| burden of their specific line of work as well. You could work
| without LLMs but maybe it will be a bit more difficult to stand
| out in the future.
| denvermullets wrote:
| this is almost exactly how ive been using llms. i dont like the
| code complete in the ide, personally, and prefer all llm usage to
| be narrow specific blocks of code. it helps as i bounce between a
| lot of side projects, projects at work, and freelance projects.
| not to mention with context switching it really helps keep things
| moving, imo
| owebmaster wrote:
| I thought his project, sketch.dev is of very poor quality. I
| wouldn't ship something like this - the auth process is awful and
| broke, I still can't login. If after 14 hours of the post the
| service is still rugged to death, it also means the scalability
| of the app is bad. If we are going to use LLMs to replace hours
| of programming, we should aim for quality too.
| lm28469 wrote:
| It's really bad, much less useful than even the first public
| version of chatgpt. Even once you manage to log in, most of the
| time it doesn't even give something that compiles, it calls
| functions/variables which don't exist. The first line of the
| main had 2 errors...
| cratermoon wrote:
| But the question must be asked: At what cost?
|
| Are the results a paradigm shift so much better that it's worth
| the hundreds of billions sunk into the hardware and data centers?
| Is spicy autocomplete worth the equivalent of flying from New
| York to London while guzzling thousands of liters of water?
|
| It might work, for some definition of useful, but what happens
| when the AI companies try to claw back some of that half a
| trillion dollars they burnt?
| ryanobjc wrote:
| That's why open research (which "open" ai has never really
| contributed to!) and foundational models that everyone can
| contribute to are essential.
|
| This stuff is a pretty neat magical evolution and it should not
| be the domain of any single company.
|
| Also a lot of the hardware and so on has/is being paid for. AWS
| gcloud, etc aren't taking massive losses on their H100 and
| other compute services. This bubble is no different than any
| prior bubble ultimately, and bankruptcy will recycle useful
| assets into new companies and new purposes.
|
| Which btw why the US is still a huge winner and will continue
| to be -> robust and functioning bankruptcy laws and courts.
| nunez wrote:
| I definitely respect David's opinion given his caliber, but
| pieces like this make me feel strange that I just don't have a
| burning desire to use them.
|
| Like, yesterday I made some light changes to a containerized VPN
| proxy that I maintain. My first thought wasn't "how would Claude
| do this?" Same thing with an API I made a few weeks ago that
| scrapes a flight data website to summarize flights in JSON form.
|
| I knew I would need to write some boilerplate and that I'd have
| to visit SO for some stuff, but asking Claude or o1 to write the
| tests or boilerplate for me wasn't something I wanted or needed
| to do. I guess it makes me slower, sure, but I actually enjoy the
| process of making the software end to end.
|
| Then again, I do all of my programming on Vim and, technically,
| writing software isn't my day job (I'm in pre-sales, so, best
| case, I'm writing POC stuff). Perhaps I'd feel differently if I
| were doing this day in, day out. (Interestingly, I feel the same
| way about AI in this sense that I do about VSCode. I've used it;
| I know what's it capable of; I have no interest in it at all.)
|
| The closest I got to "I'll use LLMs for something real" was using
| it in my backend app that tracks all of my expenses to parse
| pictures of receipts. Theoretically, this will save me 30 seconds
| per scan, as I won't need to add all of the transaction metadata
| myself. Realistically, this would (a) make my review process
| slower, as LLMs are not yet capable of saying "I'm not sure" and
| I'd have to manually check each transaction at review time, (b)
| make my submit API endpoint slower since it takes relatively-
| forever for it to analyze images (or at least it did when I
| experimented with this on GPT4-turbo last year), and (c) drive my
| costs way up (this service costs almost nothing to run, as I run
| it within Lambda's free tier limit).
| uludag wrote:
| I think there's a big selection bias on hackernews that you
| wouldn't get elsewhere. There's still "elite" software
| developers I see who really aren't into the whole LLM tooling
| space. I found use in the autocomplete and search workflows
| that the author mentioned but I stopped using these tools, out
| of curiosity for things were before. It turns out I don't need
| it to be productive and I too probably enjoy working more
| without it.
| ge96 wrote:
| I'm an avg dev, I was never into LLMs/co-pilot etc mocking
| prompt engineering but... my current job is working with an LLM
| framework so idk... future proofs me I guess. I do like
| computer vision and ML on dataset eg. training hand writing IMU
| by gestures that's cool.
|
| The embeddings I feel like there is something there even if it
| doesn't actually understand. My journey has just begun.
|
| I scoff every time someone says "this + AI". AI is this thing
| they just throw in there. Last time I didn't want to work with
| some tech I quit my job was not a good move not being
| financially independent. Anyway yeah I'll keep digging into
| this. I still don't use co-pilot right now but I'm reading up
| more on the embedding stuff for cross training or some case
| like RAG.
| 999900000999 wrote:
| I still find most LLMS to be extremely poor programmers .
|
| Claude will often generate tons and tons of useless code quickly
| using up it's limit. I often find myself yelling at it to stop.
|
| I was just working with it last night.
|
| "Hi Claude, can you add tabs here.": <div>
|
| <MainContent/>
|
| <div/>
|
| Claude will then start generating MainContent.
|
| DeepSeek, despite being free does a much better job than Claude.
| I don't know if it's smarter, but whatever internal logic it has
| is much more to the point.
|
| Claude also has a very weird bias towards a handful of UI
| libraries that has installed, even if those wouldn't be good for
| your project. I wasted hours on shancn UI which requires a very
| particular setup to work.
|
| LLM's are generally great at common tasks using a top 5(
| popularity) language.
|
| Ask it to do something in a Haxe UI library and it'll make up
| functions that *look* correct.
|
| Overall I like them, they definitely speed things up. I don't
| think most experienced software engineers have much to worry
| about for now. But I am really worried about juniors. Why higher
| a junior engineer, when you can just tell your seniors they need
| to use Copilot to crank out more code
| joseda-hg wrote:
| Assuming I know roughly what it will generate, I usually
| prepend my chats with previsions against this kind of thing
|
| "Add tabs here, assume the rest of the page will work with no
| futher modification, limit your changes so that any existing
| code keeps working"
|
| I also do stuff like "Project is using {X} libraries, keep
| dependencies minimal
|
| Generate a method takes {Z} parameters, return {Y}, using {A},
| {B} and {C} do {thing}"
|
| I'll add stuff like Language version, frameworks or specific
| requests based on this, but then I just reuse the setup , So I
| like to keep the first message with as much context as
| possible, ideally separating project context from specific
| request
| btbuildem wrote:
| The search part really resonates with me. I do a lot of
| odd/unusual/one-off things for my side projects, and I use LLMs
| extensively in helping me find a path forward. It's like an
| infinitely patient, all-knowing expert that pulls together info
| from any and all domain. Sometimes it will have answers that I am
| unable to find another way (eg, what's the difference between
| "busy s..." and "busy p..." AT command response on the esp8285?).
| It saves me hours of struggle, and I would not want to go back to
| the old ways.
| fassssst wrote:
| They're pretty great for printf debugging. Yesterday I was
| confounded by a bug so I rapidly added a ton of logging that the
| LLM wrote instantly, then I had the LLM analyze the state
| difference between the repro and non repro logs. It found
| something instantly that it would have taken me a few hours to
| find, which led me to a fix.
| hansvm wrote:
| That quartile reservoir sampler example is ... intriguing?
|
| My experience with LLM code is that it can't come up with
| anything even remotely novel. If I say "make it run in amortized
| O(1)" then 99 times out of 100 I'll get a solution so wildly
| incorrect (but confidently asserting its own correctness) that it
| can't possibly be reshaped into something reasonable without a
| re-write. The remaining 1/100 times aren't usually "good" either.
|
| For the reservoir sampler -- here, it did do the job. David
| almost certainly knows enough to know the limits of that code and
| is happy with its limitations. I've solved that particular
| problem at $WORK though (reservoir sampling for percentile
| estimates), and for the life of me I can't find a single LLM
| prompt or sequence of prompts that comes anywhere close to
| optimality unless that prompt also includes the sorts of insights
| which lead to an amortized O(1) algorithm being possible (and,
| even then, you still have to re-run the query many times to get a
| useful response).
|
| Picking on the article's solution a bit, why on earth is `sorted`
| appearing in the quantile estimation phase? That's fine if you're
| only using the data structure once (init -> finalize), but it's
| uselessly slow otherwise, even ignoring splay trees or anything
| else you could use to speed up the final inference further.
|
| I personally find LLMs helpful for development when either (1)
| you can tolerate those sorts of mishaps (e.g., I just want to run
| a certain algorithm through Scala and don't really care how slow
| it is if I can run it once and hexedit the output), or (2) you
| can supply all the auxilliary information so that the LLM has a
| decent chance of doing it right -- once you've solved the hard
| problems, the LLM can often get the boilerplate correct when
| framing and encapsulating your ideas.
| LouisSayers wrote:
| The use of LLMs reminds me a bit of how people use search
| engines.
|
| Some years ago I gave a task to some of my younger (but
| intelligent) coworkers.
|
| They spent about 50 minutes searching in google and came back to
| me saying they couldn't find what they were looking for.
|
| I then typed in a query, clicked one of the first search results
| and BAM! - there was the information they were unable to find.
|
| What was the difference? It was the keywords / phrases we were
| using.
| highfrequency wrote:
| > _A lot of the value I personally get out of chat-driven
| programming is I reach a point in the day when I know what needs
| to be written, I can describe it, but I don't have the energy to
| create a new file, start typing, then start looking up the
| libraries I need... LLMs perform that service for me in
| programming. They give me a first draft, with some good ideas,
| with several of the dependencies I need, and often some mistakes.
| Often, I find fixing those mistakes is a lot easier than starting
| from scratch._
|
| This to me is the biggest advantage of LLMs. They dramatically
| reduce the activation energy of _doing something you are
| unfamiliar with_. Much in the way that you 're a lot more likely
| to try kitesurfing if you are at the beach standing next to a
| kitesurfing instructor.
|
| While LLMs may not yet have human-level _depth_ , it's clear that
| they already have vastly superhuman _breadth_. You can argue
| about the current level of expertise (does it have undergrad
| knowledge in every field? PhD level knowledge in every field?)
| but you can 't argue about the breadth of fields, nor that the
| level of expertise improves every year.
|
| My guess is that the programmers who find LLMs useful are people
| who do a lot of different _kinds_ of programming every week (and
| thus are constantly going from incompetent to competent in things
| that other people already know), rather than domain experts who
| do the same kind of narrow and specialized work every day.
| otteromkram wrote:
| I think your biggest takeaway should be that they person
| writing the blog post is extremely well-known versed in
| programming and has labored over code for hours, along with
| writing tests, debugging, etc. He knows what he would like
| because it's second nature. He was able to get the best from
| the LLM because his vision of what the code should look like
| helped craft a solid prompt.
|
| Newer people into programming might not have as good of a time
| because they may skip actually learning something fundamentals
| and rely on LLMs as a crutch. Nothing wrong with that, I
| suppose, but there might be at some point when everything goes
| up in smoke and the LLM is out of answers.
|
| No amount of _italic font_ is going to change that.
| highfrequency wrote:
| My experience is opposite - I get the most value out of LLMs
| for topics that I have less expertise in. It's become vastly
| easier up to speed in a new field because you can immediately
| answer basic questions, have the holes in your understanding
| pointed out, and be directed to the concepts you are missing.
| charlieyu1 wrote:
| I'm a hobby programmer who never worked a programming job. Last
| week I was bored, I asked o1 to help me to write a Solitaire card
| game using React because I'm very rusty with web development.
|
| The first few steps were great. Guided me to install things and
| setup a project structure. The model even generated codes for a
| few files.
|
| Then something went wrong, the model kept telling me what to do
| in vague, but didn't output codes anymore. So I asked for further
| help, and now it started contradicting itself, rewriting business
| logic that were implemented in the first response, 3-4 pieces of
| code snippets of the same file that aren't compatible etc, and it
| all fell apart.
| jarsin wrote:
| My first program ever was a windows calculator. My roomates
| would sit down and find bugs after I thought I perfected it. I
| learned so much spending weeks trying to get that damn thing
| working.
|
| I'm not too optimistic about the future of software development
| if juniors are turning to AI to do those early projects for
| them.
| mocamoca wrote:
| LLMs contexts are fast to overload, as the article states.
| That's why he writes smaller, specific packages, one at a time,
| and uses a web UI instead of something like cursor.
|
| I had the same issue as you a few days ago. By separating the
| problem in smaller parts and addressing each parts one by one
| it got easier.
|
| In your specific case I would try to fully complete the
| business logic one side. Reset the context. Then provide the
| logic to a new context and ask for an interface. Difficulty
| will arise when discovering that the logic is wrong or not
| suited to the UI, but i would keep using the same process to
| edit the code. Maybe two different contexts, one for logic, one
| for UI?
|
| How did you do?
| cpursley wrote:
| Yeah, you wanna use Claude for code. That's the problem. Try
| Cursor or Bolt.
| aerhardt wrote:
| His experience mirrors mine. I'm happy he explicitly mentions
| search, when people have been shouting "this is not meant for
| search" for a couple years now. Of course it helps with search. I
| also love the tech for producing first drafts, and it greatly
| lowers the energy and cognitive load when attacking new tasks,
| like others are repeating on this thread.
|
| I think at the same time, while the author says this is the
| second most impressive technology he's seen in his lifetime, it's
| still a far cry from the bombastic claims being made by the
| titans of industry regarding its potential. Not uncommon to see
| claims here on HN of 10x improvements in productivity, or teams
| of dozens of people being axed, but nothing in the article or in
| my experience lines up with that.
| jordanmorgan10 wrote:
| The more experienced the engineer the less CSS is on the page.
| This seems to be a universal truth, I want to learn from these
| people - but my goodness, but could we at least use margins to
| center content.
| dboreham wrote:
| Interesting that he had the same thought initially as I did
| (after running a model myself on my own hardware) : this is like
| the first time I ran a traceroute across the planet.
| ryanobjc wrote:
| I have been getting more value out of LLMs recently, and the
| great irony is it is because of a few different packages in emacs
| and the wonderful CLI LLM chat programming tool 'aider'.
|
| My workflow puts LLM chat at my fingertips, and I can control the
| context. Pretty much any text in emacs can be sent to a LLM of
| your choice via API.
|
| Aider is even better, it does a bunch of tricks to improve
| performance, and is rapidly becoming a 'must have' benchmark for
| LLM coding. It integrates with git so each chat modification
| becomes a new git commit. Easy to undo changes, redo changes,
| etc. It also has a bunch of hacks because while o1 is good as
| reasoning, it (apparently) doesn't do code modification well.
| Aider will send different types of requests to different
| 'strengths' of LLMs etc. Although if you can use sonnet, you can
| just use that and be done with it.
|
| It's pretty good, but ultimately it's still just a tool for
| transforming words into code. It won't help you think or
| understand.
|
| I feel bad for new kids who won't develop muscle and sight
| strength to read/write code. Because you still need to read/write
| code, and can't rely on the chat interface for everything.
| Balgair wrote:
| I'm not a 'programmer'. At best, I'm a hacker, _at best_. I don
| 't work in a team. All my code is mostly one time usage to just
| get some little thing done, sometimes a bit of personal stuff
| too. I mostly use Excel anyways, and then python, and even then,
| I hate python because half the time I'm just dealing with library
| issues (not a joke, I measured it (and, no, I'm not learning
| another language, but thank you)). I'm in biotech, a very non
| code-y section of it too.
|
| LLMs are just a life saver. Literally.
|
| They take my code time down from weeks to an afternoon, sometimes
| less. Any they're _kind_.
|
| I'm trying to write a baseball simulator on my own, as a stretch
| goal. I'm writing my own functions now, a step up for me. The
| code is to take in real stats, do Monte Carlo, get results. Basic
| stuff. Such a task was _impossible_ for me before LLMs. I 've
| tried it a few times. No go. Now with LLMs, I've got the skeleton
| working and should be good to go before opening day. I'm hoping
| that I can use it for some novels that I am writing to get more
| realistic stats (don't ask).
|
| I know a lot of HN is very dismissive of LLMs as code help. But
| to me, a non programmer, they've opened it up. I can do things I
| never imagined that I could. Is it prod ready? Hell no, please
| God no. But is it good enough for me to putz with and get _just_
| working? Absolutely.
|
| I've downloaded a bunch of free ones from huggingface and Meta
| just to be sure they can't take them away from me. I'm _never_
| going back to that frustration, that 'Why can't I just be not so
| stupid?', that self-hating, that darkness. They have liberated
| me.
| averus wrote:
| I think the author is really on the right path with his vision
| for LLMs as tool for software development. Last week I tried
| probably all of them with something like a code challenge.
|
| I have to say that I am impressed with sketch.dev, it got me a
| working example from the first try and it looked cleaner form all
| the others, similar but cleaner somehow in terms of styling.
|
| The whole time I was using those tools I was thinking that I want
| exactly this a LLM trained specifically on the Go official
| documentation, or whatever your favourite language is, ideally
| fined tuned by the maintainers of the language.
|
| I want the LLM to show me an idiomatic way to write an API using
| the standard library I don't necessarily want it to do it instead
| of me, or to be trained on all of the scrapped data they could
| scrape. Show me a couple of examples maybe explain a concept,
| give me steps by step guidance.
|
| I also share his frustrations with the chat based approach what
| annoys me personally the most is the anthropomorphization of the
| LLMs, yesterday Gemini was even patronizing me...
| theptip wrote:
| This lines up well with my experience. I've tried coming at
| things from the IDE and chat side, and I think we need to merge
| tooling more to find the sweet spot. Claude is amazing at
| building small SPAs, and then you hit the context window cutoff
| and can't do anything except copy your file out. I suspect IDEs
| will figure this out before Claude/ChatGPT learn to be good
| enough at the things folks need from IDEs. But long-term, i
| suppose you don't want to have to drop down to code at all and so
| the constraints of chat might force the exploration of the new
| paradigm more aggressively.
|
| Hot take of the day, I think making tests and refactors easier is
| going to be revolutionary for code quality.
___________________________________________________________________
(page generated 2025-01-08 23:02 UTC)