[HN Gopher] How I Use "AI"
___________________________________________________________________
How I Use "AI"
Author : npalli
Score : 237 points
Date : 2024-08-04 00:37 UTC (22 hours ago)
(HTM) web link (nicholas.carlini.com)
(TXT) w3m dump (nicholas.carlini.com)
| droopyEyelids wrote:
| The biggest AI skeptics I know are devops/infrastructure
| engineers.
|
| At this point I believe most of them cannot be convinced that
| LLMs are valuable or useful by any sort of evidence, but if
| anything could do it, this article could. Well done.
| m_ke wrote:
| And the funny part is, these LLMs are amazing at writing YAML
| config files.
|
| I always just let them write my first draft of Docker, k8s and
| Terraform configs.
| itgoon wrote:
| I'm a DevOps/infrastructure person, and I agree completely.
| This article won't change that.
|
| They've been great for helping me with work-related tasks. It's
| like having a knowledgeable co-worker with infinite patience,
| and nothing better to do. Neither the people nor the LLM give
| back perfect answers every time, but it's usually more than
| enough to get me to the next step.
|
| That said, having good domain knowledge helps a lot. You make
| fewer mistakes, and you ask better questions.
|
| When I use LLMs for tasks I don't know much about, it takes me
| a lot longer than someone who does know. I think a lot of
| people - not just infrastructure people - are missing out by
| not learning how to use LLMs effectively.
| dijksterhuis wrote:
| There's a good reason for the scepticism.
|
| Ops engineers [0] are the ones who have to spend weekends
| fixing production systems when the development team has snuck
| "new fangled tools X, Y and Z" into a "bugfix" release.
|
| We have been burned by "new fangled" too many times. We prefer
| "old reliable" until "new fangled" becomes "fine, yes, we
| probably should".
|
| [0]: DevOps has now become a corporate marketing term with no
| actual relevance to the original DevOps methodology
| ado__dev wrote:
| This perfectly echoes my experience with AI.
|
| It's not perfect, but AI for working with code has been an
| absolute game changer for me.
| voiper1 wrote:
| If you use it as an intern, as a creative partner, as a rubber-
| duck-plus, in an iterative fashion, give it all the context you
| have and your constraints and what you want... it's fantastic.
| Often I'll take pieces from it; if it's simple enough I can just
| use its output.
| voiper1 wrote:
| >I'm also a security researcher. My day-to-day job for nearly the
| last decade now has been to show all of the ways in which AI
| models fail spectacularly when confronted with any kind of
| environment they were not trained to handle.
|
| > ... And yet, here I am, saying that I think current large
| language models have provided the single largest improvement to
| my productivity since the internet was created.
|
| >In the same way I wouldn't write off humans as being utterly
| useless because we can't divide 64 bit integers in our head---a
| task completely trivial for a computer---I don't think it makes
| sense to write off LLMs because you can construct a task they
| can't solve. Obviously that's easy---the question is can you find
| tasks where they provide value?
| zombiwoof wrote:
| So coding
| zombiwoof wrote:
| Burn down the rain forests so researchers can save time writing
| code
| rfw300 wrote:
| If you're concerned about the environment, that is a trade you
| should take every time. AI is 100-1000x more carbon-efficient
| at writing (prose or code) than a human doing the same task.
| https://www.nature.com/articles/s41598-024-54271-x
| isotypic wrote:
| The way this paper computes the emissions of a human seems
| very suspect.
|
| > For instance, the emission footprint of a US resident is
| approximately 15 metric tons CO2e per year [22], which
| translates to roughly 1.7 kg CO2e per hour. Assuming that a
| person's emissions while writing are consistent with their
| overall annual impact, we estimate that the carbon footprint
| for a US resident producing a page of text (250 words) is
| approximately 1400 g CO2e.
|
| Averaging this makes no sense. I would imagine driving a car
| is going to cause more emissions than typing on a laptop. And
| if we are comparing "emissions from AI writing text" to
| "emissions from humans writing text" we cannot mix the latter
| with far higher-emission activities and still have a fair
| comparison.
|
| But that's beside the point, since it seems that the number
| being used by the authors isn't even personal emissions --
| looking at the source [22], the 15 metric tons CO2e per year
| is labeled as "Per capita CO2 emissions; Carbon dioxide (CO2)
| emissions from fossil fuels and industry. Land-use change is
| not included."
|
| This isn't personal emissions! This is emissions from the
| entire industrial sector of the USA divided by population. No
| wonder AI is supposedly "100-1000x" more efficient. Counting
| this against the human makes no sense, since these emissions
| are completely unrelated to the writing task the person is
| doing; it's simply the fact that they are a person living in
| the world.
| cheschire wrote:
| This was based on the training of GPT-3. They mention GPT-4
| only in the context of the AI they used to facilitate writing
| the paper itself.
|
| I'm not sure the scale of 2024 models and usage was
| influential in that paper at all.
| surfingdino wrote:
| If you eliminate humans, who will need AI?
| myaccountonhn wrote:
| I think the author does a decent job laying out good ways of
| using the LLMs. If you're gonna use them, this is probably the
| way.
|
| But he acknowledges the ethical and social issues (though he
| misses the environmental ones:
| https://disconnect.blog/generative-ai-is-a-climate-disaster/)
| and then continues to use them anyway. For me the ickiness
| factor is too much; the benefit isn't worth it.
| j45 wrote:
| Efficiency in models, and specialized hardware just for the
| computation, will likely level things out.
|
| Compute power per watt might be different using, say,
| large-scale Apple Silicon compared to GPU cards.
| myaccountonhn wrote:
| Very often that increased efficiency just leads to increased
| demand. I'm skeptical.
| j45 wrote:
| You're welcome to be skeptical.
|
| If it's OK, I'd like to share how I'm navigating my own
| skepticism, while being mindful that other people's skepticism
| is hard to weigh when it doesn't offer anything to compare
| against.
|
| Why? I have friends whose skepticism can border on veiled
| cynicism, without spelling out what their skepticism is
| actually weighing. The only thing being looked at is why
| something is not possible, never both sides. Both can result
| in a similar outcome.
|
| Without taking the time to look into it themselves, that just
| undermines the person's skepticism until they do look into it
| more. Otherwise it becomes a mechanism for getting the world
| to expend mental labour for free on your behalf.
|
| It's important to ask oneself whether there are relevant facts
| that determine what kind of skepticism applies:
|
| - Generally, is there a track record of efficiency improvements
| in large-scale software and in algorithmic optimization?
|
| - Have LLMs become more optimized in the past year or two? (Can
| an M1 Max Studio run more and more models that are smaller and
| better at doing the same things?)
|
| - Generally and historically, is there a track record of
| compute hardware optimization, whether for LLM-style workloads
| or for LLM calculations outright?
|
| - Are LLMs using a great deal more resources on average than
| the new technologies that preceded them?
|
| - Are LLMs using a massive amount of resources at the start,
| much like servers that used to take up entire rooms compared to
| today?
| AndyNemmity wrote:
| In a just society where private corporations didn't attempt to
| own everything in existence, there would be no ethical or
| social issues in my mind.
|
| LLMs just use the commons, and should only be able to be owned
| by everyone in society.
|
| The problem comes in with unaccountable private totalitarian
| institutions. But that doesn't mean the technology is an
| ethical or social issue; it's the corporations who try to own
| common things like the means of production that are the
| problem.
|
| Yes, there's the pragmatic view of the society we live in, and
| the issues that it contains, but that's the ethical issue we
| need to address, not the fact that we can, as a society,
| create LLMs based on the work of society.
| RodgerTheGreat wrote:
| LLMs do not simply _use_ the commons, they are a vehicle for
| _polluting_ the commons on an industrial scale. If,
| hypothetically, the ethical problems with plagiarizing
| creative work to create these models were a non-issue, there
| would still be massive ethical problems with allowing their
| outputs to be re-incorporated into the web, drowning useful
| information in a haze of superficially plausible
| misinformation.
| visarga wrote:
| I don't think you are right. If you test LLM text and
| random internet text for inaccuracies and utility, you'd
| probably find more luck with LLM text.
|
| For example, if you use an LLM to summarize this whole
| debate, you would get a decent, balanced report incorporating
| many points of view. Many times the article generated from the
| chat thread is better than the original one: certainly better
| grounded in the community of readers, debunking claims,
| representing many perspectives.
| (https://pastebin.com/raw/karBY0zD)
| RodgerTheGreat wrote:
| I am not going to fact-check your sludge for you.
| dijksterhuis wrote:
| I just want to emphasise two things. Both are mentioned in the
| article, but I still want to emphasise them, as they are core
| to what I take from the article as someone who has been a
| fanboy of Nicholas for years now.
|
| 1. Nicholas really does know how badly machine learning models
| can be made to screw up. Like, he _really_ does. [0]
|
| 2. This is how _Nicholas_ -- an academic researcher in the field
| of security of machine learning -- uses LLMs to be more
| efficient.
|
| I don't know whether Nicholas works on globally scaled
| production systems which have specific security/data/whatever
| controls that need to be adhered to, or whether he even touches
| any proprietary code. But seeing as he heavily emphasised the
| "I'm a researcher doing research things" angle in the article
| -- I'd take a heavy bet that he does not. And academic /
| research / proof-of-concept coding has different
| limitations/context/needs than other areas.
|
| I think this is a really great write-up, even as someone on the
| anti-LLM side of the argument. I really appreciate the attempt
| to do a "middle of the road" post, which is absolutely what the
| conversation needs right now (pay close attention to how this
| was written, LLM hypers).
|
| I don't share his experience, I still value and take enjoyment
| from the "digging for information" process -- it is how I learn
| new things. Having something give me the answer doesn't help me
| learn, and writing new software is a learning process for me.
|
| I did take a pause and digested the food for thought here. I
| still won't be using an LLM tomorrow. I am looking forward to his
| next post, which sounds very interesting.
|
| [0]: https://nicholas.carlini.com/papers
| tptacek wrote:
| Nicholas worked at Matasano, and is responsible for most of the
| coolest levels in Microcorruption.
| dijksterhuis wrote:
| He also worked at Google. I don't think that negates my point
| as he was still doing research there :shrugs:
|
| > academic / research / proof-of-concept coding has different
| limitations/context/needs than other areas.
| tptacek wrote:
| No idea. Just saying, on security stuff, he's legit.
| antognini wrote:
| He's also a past winner of the International Obfuscated C Code
| Contest: https://www.ioccc.org/2020/carlini/index.html
| fumeux_fume wrote:
| It is overhyped. If you don't know much about what you're trying
| to do, then you're not going to know how bad or suboptimal the
| LLM's output is. Some people will say it doesn't matter as long
| as it gets the job done. Then they end up paying a lot extra
| for me to come in and fix it when it's going haywire in prod.
| simonw wrote:
| This article is about how someone who DOES know a lot about
| what they're trying to do can get huge value out of them,
| despite their frequent mistakes.
| 7speter wrote:
| And if you don't know a lot, you should at least know that an
| LLM/chatbot is useful for giving you a bit of an immersive
| experience with a topic, and that you should use other
| resources to verify what the LLM/chatbot is telling you.
| Kiro wrote:
| You couldn't have picked a worse article to post that comment
| on.
| vasili111 wrote:
| If I know the technology I am using the LLM for, then the LLM
| helps me do it faster. If I am not familiar with the
| technology, then the LLM helps me learn it faster by showing
| me, in the code it generates, which parts of the technology are
| important and how they work in real examples. But I do not
| think it is helpful, and I would say it may be dangerous
| depending on the task, if you do not know the technology and
| also do not want to learn it and understand how the generated
| code works.
| alwinaugustin wrote:
| I also use LLMs similarly. As a professional programmer, LLMs
| save me a lot of time. They are especially efficient when I don't
| understand a flow or need to transform values from one format to
| another. However, I don't currently use them to produce code that
| goes into production. I believe that in the coming years, LLMs
| will evolve to analyze complete requirements, architecture, and
| workflow and produce high-quality code. For now, using LLMs to
| write production-ready applications in real-time scenarios will
| take longer.
| cdrini wrote:
| I've been pleasantly surprised by GitHub's "copilot workspace"
| feature for creating near production code. It takes a GitHub
| issue, converts it to a specification, then to a list of
| proposed edits to a set of files, then it makes the edits. I
| tried it for the first time a few days ago and was pleasantly
| surprised at how well it did. I'm going to keep experimenting
| with it more/pushing it to see how well it works next week.
|
| GitHub's blog post: https://github.blog/news-insights/product-
| news/github-copilo...
|
| My first experience with it:
| https://youtube.com/watch?v=TONH_vqieYc
| tptacek wrote:
| There's a running theme in here of programming problems LLMs
| solve where it's actually not that important that the LLM is
| perfectly correct. I've been using GPT4 for the past couple
| months to comprehend Linux kernel code; it's _spooky_ good at it.
|
| I'm a C programmer, so I can with some effort gradually work my
| way through random Linux kernel things. But what I can do now
| instead is take a random function, ask GPT4 what it does and what
| subsystem it belongs to, and then ask GPT4 to write me a dummy C
| program that exercises that subsystem (I've taken to asking it to
| rewrite kernel code in Python, just because it's more concise and
| easy to read).
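|
| (To make "dummy program" concrete: a minimal sketch of the kind
| of thing I mean, picking epoll as an arbitrary subsystem; this
| is my sketch, not actual GPT4 output.)
|
|     #include <stdio.h>
|     #include <unistd.h>
|     #include <sys/epoll.h>
|
|     /* Minimal userspace exercise of the eventpoll subsystem:
|        watch the read end of a pipe, make it readable, and see
|        the event come back. */
|     int main(void) {
|         int fds[2];
|         if (pipe(fds) < 0) { perror("pipe"); return 1; }
|
|         int ep = epoll_create1(0);
|         if (ep < 0) { perror("epoll_create1"); return 1; }
|
|         struct epoll_event ev = { .events = EPOLLIN,
|                                   .data.fd = fds[0] };
|         if (epoll_ctl(ep, EPOLL_CTL_ADD, fds[0], &ev) < 0) {
|             perror("epoll_ctl"); return 1;
|         }
|
|         (void)write(fds[1], "x", 1);  /* make fds[0] readable */
|
|         struct epoll_event out;
|         int n = epoll_wait(ep, &out, 1, 1000);
|         if (n > 0)
|             printf("event 0x%x on fd %d\n", out.events,
|                    out.data.fd);
|         return 0;
|     }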
|
| I don't worry at all about GPT4 hallucinating stuff (I'm sure
| it's doing that all the time!), because I'm just using its output
| as Cliff's Notes for the actual kernel code; GPT4 isn't the
| "source of truth" in this situation.
| atum47 wrote:
| I've been using it for all kinds of stuff. I was using a drying
| machine at a hotel a while ago and I was not sure about the
| icon it was displaying on the panel regarding my clothes, so I
| asked GPT and it told me correctly. It has read all the manuals
| and documentation for pretty much everything, right? Better
| than Googling it: you just ask for the exact thing you want.
| tkgally wrote:
| I used LLMs for something similar recently. I have some old
| microphones that I've been using with a USB audio interface I
| bought twenty years ago. The interface stopped working and I
| needed to buy a new one, but I didn't know what the three-
| pronged terminals on the microphone cords were called or
| whether they could be connected to today's devices. So I took
| a photo of the terminals and explained my problem to ChatGPT
| and Claude, and they were able to identify the plug and tell
| me what kinds of interfaces would work with them. I ordered
| one online and, yes, it worked with my microphones perfectly.
| 7speter wrote:
| My washing machine went out because of some flooding, and I
| gave ChatGPT all of the diagnostic codes; it concluded that it
| was probably a short in my lid lock.
|
| The lid lock came a few days later, I put it in, and I'm able
| to wash laundry again.
| viraptor wrote:
| The two best classes for me are definitely:
|
| - "things trivial to verify", so it doesn't matter if the
| answer is not correct - I can iterate/retry if needed and
| fallback to writing things myself, or
|
| - "ideas generator", on the brainstorming level - maybe it's
| not correct, but I just want a kickstart with some directions
| for actual research/learning
|
| Expecting perfect/correct results is going to lead to failure
| at this point, but it doesn't prevent usefulness.
| tptacek wrote:
| Right, and it only needs to be right often enough that taking
| the time to ask it is positive EV. In practice, with the
| Linux kernel, it's more or less consistently right (I've
| noticed it's less right about other big open source
| codebases, which checks out, because there's a _huge_ written
| record of kernel development for it to draw on).
| seanhunter wrote:
| Exactly. It's similar in other (non programming) fields - if
| you treat it as a "smart friend" it can be very helpful but
| relying on everything it says to be correct is a mistake.
|
| For example, I was looking at a differential equation recently
| and saw some unfamiliar notation[1] (Newton's dot notation). So
| I asked Claude why people use Newton's notation vs Lagrange's
| notation. It gave me an excellent explanation with tons of
| detail, which was really helpful. Except every place it gave me
| an example of "Lagrange" notation, it was actually Leibniz
| notation.
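|
| (For anyone keeping score, the three notations for the
| derivative of y with respect to t are, roughly:
|
|     Newton:   a dot over the variable, \dot{y}
|     Lagrange: a prime mark, y'
|     Leibniz:  a ratio of differentials, dy/dt
|
| so examples labelled "Lagrange" but written as dy/dt are in
| fact Leibniz.)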
|
| So it was super helpful and it didn't matter that it made this
| specific error because I knew what it was getting at and I was
| treating it as a "smart friend" who was able to explain
| something specific to me. I would have a problem if I was using
| it somewhere where the absolute accuracy was critical because
| it made such a huge mistake throughout its explanation.
|
| [1]
| https://en.wikipedia.org/wiki/Notation_for_differentiation#N...
| dang wrote:
| This is close to how I've been using them too. As a device for
| speeding up learning, they're incredible. Best of all, they're
| strongest where I'm weakest: finding all the arbitrary details
| that are needed for the question. That's the labor-intensive
| part of learning technical things.
|
| I don't need the answer to be correct because I'm going to do
| that part myself. What they do is make it an order of magnitude
| faster to get anything on the board. They're the ultimate prep
| cook.
|
| There are things to dislike and yes there is over-hype but
| "making learning less tedious" is huge!
| loufe wrote:
| You put words to what I've been thinking for a while. When I'm
| still new to some technology it is a huge time-saver. I used to
| need to go bother some folks somewhere on a Discord / Facebook
| group / Matrix chat to get the one piece of context that I was
| hung up on. Sometimes it took hours or days to get that one
| nugget.
|
| In fact, I feel more interested in approaching challenging
| problems because I know I can get over those frustrating phases
| much more easily and quickly.
| 7speter wrote:
| I came here to write essentially the same comment as you.
| Instead of going into a chatroom where people tell you
| you're lazy because you are unclear on ambiguous terms in
| documentation, these days I paste in portions of
| documentation and ask GPT for clarification on what I'm
| hazy about.
| vertis wrote:
| I'm finding myself using them extensively in the learning way,
| but also I'm an extreme generalist. I've learned so many
| languages over 23 years, but remembering the ones I don't use
| frequently is hard. The LLMs become the ultimate memory aid.
| I know that I can do something in a given language, and will
| recognise that it's correct when I see it.
|
| Together with increasingly powerful speech to text I find
| myself talking to the computer more and more.
|
| There are flaws, there are weaknesses, and a bubble, but any
| dev that can't find any benefit in LLMs is just not looking.
| Onawa wrote:
| Languages, syntax, flags, and the details... I too have
| touched so many different technologies over the years that
| I understand at a high level, but don't remember the
| minutiae of. I have almost turned into a "conductor" rather
| than an instrumentalist.
|
| Especially for debugging issues that could previously take
| days of searching documentation, Stack overflow, and
| obscure tech forums. I can now ask an LLM, and maybe 75% of
| the time I get the right answer. The other 25% of the time
| it still cuts down on debugging time by helping me try
| various fixes, or it at least points me in the right
| direction.
| smusamashah wrote:
| I use it like a dictionary (select text and lookup) and based
| on what I looked up and answer, I judge myself how correct
| the answers are, and they are on point usually.
|
| It has also made building small, pure vanilla HTML/JS tools
| fun. It gives me a good enough prototype which I can mold to my
| needs. I have written a few very useful scripts/tools over the
| past few months which I would otherwise never even have started
| because of all the required first steps and basic learning.
|
| (never thought I would see your comment as a user)
| ransom1538 wrote:
| gpt: give me working html example of javascript beforeunload
| event, and onblur, i want to see how they work when i minimize
| a tab.
|
| 10 seconds later, I am playing with these.
| whatever1 wrote:
| What is very useful for me: when I conduct research outside of
| my field of expertise, I do not even know what keywords to look
| for. An LLM can help you with this.
| XMPPwocky wrote:
| Every now and then, I'll actually sort of believe an article like
| this. Then I go and test the current models on things like
| semantic search.
|
| For instance -
|
| The Hough transform detects patterns with certain structure in
| images, e.g. circles or lines.
|
| So I'm looking for academic research papers which apply the Hough
| transform to audio spectra, to recognize the harmonic structure
| of tonal audio and thus determine the fundamental pitch. (i.e.
| the Hough space would be a 1D space over fundamental frequency).
|
| I've spent probably 90 minutes, over the several times I've read
| an optimistic post like this, asking various LLMs (mostly GPT-4o,
| though my early tests predate GPT-4o, and I've also tried Gemini
| and Claude), prompts along the lines of
|
| > The Hough transform detects patterns with certain structure in
| images, e.g. circles or lines. > I'm looking for academic
| research papers (please _link them_ or provide a DOI.org link at
| least) which apply the Hough transform to audio spectra, to
| identify the harmonic structure of audio and thus determine the
| fundamental pitch. > Make sure to provide only papers that
| actually exist. If you can't find anything particularly relevant,
| say so as a disclaimer & just provide the most relevant papers
| you can.
|
| This is a reliable "fake paper generator", unfortunately - it'll
| just make up plausible garbage like
|
| > Here are some academic papers related to applying the Hough
| transform to audio spectra for identifying harmonic structures
| and fundamental pitch:
|
| > "An Audio Pitch Detection Algorithm Based on the Hough
| Transform" > Authors: Mark W. and John D. > Published In: IEEE
| Transactions on Audio, Speech, and Language Processing > DOI:
| 10.1109/TASL.2008.2000773 > Abstract: This paper proposes an
| audio pitch detection algorithm that utilizes the Hough transform
| to analyze the harmonic structure of audio spectra and determine
| the fundamental pitch.
|
| This paper does not exist. Complete waste of my time. And again,
| this behavior persists over the >1 year period I've been trying
| this query.
|
| And it's not just search-like tasks. I've tried asking for code
| and gotten stuff that's outright dangerous (try asking for code
| to do safe overflow-checked addition on int64_t in C: you have
| about an 80% chance of getting code that triggers UB in one way
| or another). I've asked for floating-point calling conventions
| on RISC-V for 32-bit vs 64-bit (would have been faster than
| going through the extension docs), and been told that RV64 has
| 64 floating-point registers (hey, it's got a 64 in the name!).
| I've asked if Satya Nadella ever had COVID-19 and been told,
| after GPT-4o "searched the web", that he got it in March of
| 2023.
|
| As far as I can tell, LLMs might conceivably be useful when all
| of the following conditions are true:
|
| 1. You don't really need the output to be good or correct, and
|
| 2. You don't have confidentiality concerns (sending data off to
| a cloud service), and
|
| 3. You don't, yourself, want to learn anything or get hands-on
| - you want it done for you, and
|
| 4. You don't need the output to be in "your voice" (this is
| mostly for prose writing, for code this doesn't really matter);
| you're okay with the "LLM dialect" (it's crucial to delve!),
| and
|
| 5. The concerns about environmental impact and the ethics of
| the training set aren't a blocker for you.
|
| For me, pretty much everything I do professionally fails
| condition number 1 and 2, and anything I do for fun fails number
| 3. And so, despite a fair bit of effort on my part trying to make
| these tools work for me, they just haven't found a place in my
| toolset- before I even get to 4 or 5. Local LLMs, if you're able
| to get a beefy enough GPU to run them at usable speed, solve 2
| but make 1 even worse...
| SOLAR_FIELDS wrote:
| I've found that how good the LLM is depends a lot on how large
| the corpus is that exists for its training. The simple example
| is that it's much better at Python than, say, Kotlin. I also
| agree with the sibling comment that it seems to be especially
| bad at the specific task of finding peer-reviewed scientific
| papers, for some reason.
| XMPPwocky wrote:
| I see no sibling comment here even with showdead on, but I
| could buy that (there are a lot of papers and only so many
| parameters, after all. But you'd think GPT-4o's search stuff
| would help; maybe a little better prompting could get it to
| at least validate its results itself? Then again, maybe the
| search stuff is basically RAG and only happens once at the
| start of the query, etc etc)
|
| Regardless, yeah- I can definitely believe your point about
| corpus size. If I was doing, say, frontend dev with a stack
| that's been around a few years, or Linux kernel hacking as
| tptacek mentioned, I could plausibly imagine getting some
| value.
|
| One thing I _do_ do fairly often is binary reverse
| engineering work - there are definitely things an LLM could
| probably help with here (for things like decompilation,
| though, I wonder whether a more graph-based network could
| perform better than a token-to-token transformer - but you'd
| have to account for the massive data & pretrain advantage of
| an existing LLM).
|
| So I've looked at things like Binary Ninja's Sidekick, but
| haven't found an opportunity to use them yet -
| confidentiality concerns rule out professional use, and when
| I reverse engineer stuff for fun ... I like doing it, I like
| solving the puzzle and slowly comprehending the logic of a
| mysterious binary! I'm not interested in using Sidekick off
| the clock for the same reason I like writing music and not
| just using Suno.
|
| One opportunity that might come up for Sidekick, at least for
| me, is CTFs- no confidentiality concerns, time pressure and
| maybe prizes on the line. We'll see.
| OkGoDoIt wrote:
| Yeah, I spent 6 months trying to find any value whatsoever
| out of GitHub copilot on C# development but it's barely
| useful. And then I started doing python development and it
| turns out it's amazing. It's all about the training set.
| rhdunn wrote:
| I've been using the JetBrains AI model assisted autocomplete
| in their IDEs, including for Kotlin. It works well for
| repetitive tasks I would have copy/paste/edited before, and
| faster, so I have become more productive there.
|
| I've not yet tried asking LLMs Kotlin-based questions, so
| don't know how good they are. I'm still exploring how to fit
| LLMs and other AI models into my workflow.
| dijksterhuis wrote:
| > 1. You don't really need the output to be good or correct
|
| > 2. You don't have confidentiality concerns (sending data off
| to a cloud service)
|
| At $PREVIOUS_COMPANY LLMs were straight up blanket banned for
| these reasons too. Confidentiality related to both the code and
| data for the customers.
|
| The possibility that "it might get some things right, some of
| the time" was nowhere near a good enough trade-off to override
| the confidentiality concerns.
|
| And we definitely did not have staff/resources to do things
| local only.
| brooksbp wrote:
| Also agree that asking for academic papers seems to increase
| the potential for hallucination. But I don't know if I am
| prompting it the best way in these scenarios.
| cdrini wrote:
| The article goes through a few use cases where LLMs are
| especially good. Your examples are very different, and are the
| cases where they perform especially poorly.
|
| Asking a pure (ie no internet/search access) LLM for papers on
| a niche subject is doubling down on their weaknesses. That
| requires LLMs to have very high resolution specific knowledge,
| which they do not have. They have more coarse/abstract
| understanding from their training data, so things like paper
| titles, DOIs, etc are very unlikely to persist through training
| for niche papers.
|
| There are some LLMs that allow searching the internet; that
| would likely be your best bet for finding actual papers.
|
| As an experiment I tried your exact prompt in ChatGPT, which
| has the ability to search, and it did a search and surfaced
| real papers! Maybe your experiment was from before it had
| search access.
| https://chatgpt.com/share/a1ed8530-e46b-4122-8830-7f6b1e2b1c...
|
| I also tried approaching this problem with a different
| prompting technique that generally tends to yield better
| results for me:
| https://chatgpt.com/share/9ef7c2ff-7e2a-4f95-85b6-658bbb4e04...
|
| I can't really vouch for how well these papers match what
| you're looking for since I'm not an expert on Hough transforms
| (would love to know if they are better!). But my technique was:
| first ask it about Hough transforms. This lets me (1) verify
| that we're on the same page, and (2) load a bunch of useful
| terms into the context for the LLM. I then expand to the
| example of using Hough transforms for audio, and again can
| verify that we're on the same page, and load even more terms.
| Now when I ask it to find papers, it has way more stuff loaded
| in context to help it come up with good search terms and
| hopefully find better papers.
|
| With regards to your criteria:
|
| 1. The code from an LLM should never be considered final but a
| starting point. So the correctness of the LLM's output isn't
| super relevant since you are going to be editing it to make it
| fully correct. It's only useful if this cleanup/correction is
| faster than writing everything from scratch, which depends on
| what you're doing. The article has great concrete examples of
| when it makes sense to use an LLM.
|
| 2. Yep, although asking questions/generating generic code would
| still be fine without raising confidentiality concerns. Local
| LLMs do exist, though, but I personally haven't seen a good
| enough flow to adopt one.
|
| 3. Strong disagree on this one. I find LLMs especially useful
| when I am learning. They can teach me eg a new
| framework/library incredibly quickly, since I get to learn from
| my specific context. But I also tend to learn most quickly by
| example, so this matches my learning style really well. Or they
| can help me find the right terms/words to then Google.
|
| 4. +1 I'm not a huge fan of having an LLM write for me. I like
| it more as a thinking tool. Writing is my expression. It's a
| useful editor/brainstormer though.
|
| 5. +1
| fxj wrote:
| Just out of curiosity: have you tried Perplexity? When I paste
| your prompt it gives me a list of
|
| two ResearchGate papers (Overlapping sound event recognition
| using local spectrogram features with the Generalised Hough
| Transform, July 2013, Pattern Recognition Letters)
|
| and one IEEE publication (Generalized Hough Transform for
| Speech Pattern Classification, in IEEE/ACM Transactions on
| Audio, Speech, and Language Processing, vol. 23, no. 11, pp.
| 1963-1972, Nov. 2015).
|
| When I am looking for real web results ChatGPT is not very
| good, but Perplexity very often shines for me.
|
| And for Python programming, have a look at withpretzel.com,
| which does the job for me.
|
| Just my 2 ct.
| coolThingsFirst wrote:
| No need for programmers anymore
| simonw wrote:
| This piece effectively concluded the opposite of that.
| coolThingsFirst wrote:
| He used an LLM to conclude that
| Flomlo wrote:
| LLMs are the best human-to-computer interface I have ever seen.
|
| Together with voice-to-text through Whisper, for example, we
| have broken the UI barrier.
|
| It takes a little bit of time to rebuild our ecosystem, but
| LLMs are a game changer already.
|
| I'm waiting for a fine-tuned small LLM that knows no general
| facts, only everything it needs to know for one specific task.
|
| And I'm waiting until everything critical is rewritten so I can
| use one AI agent to control my bank, calendar, emails and stuff.
|
| Perhaps through read-only banking account permissions or whatnot.
| tunnuz wrote:
| 100%
| banana_feather wrote:
| This just does not match my experience with these tools. I've
| been on board with the big idea expressed in the article at
| various points and tried to get into that work flow, but with
| each new generation of models they just do not do well enough,
| consistently enough, on serious tasks to be a time or effort
| saver. I don't know what world these apparently high output
| people live in where their days consist of porting Conway's Game
| of Life and writing shell scripts that only 'mostly' need to
| work, but I hope one day I can join them.
| AndyNemmity wrote:
| I use it daily, and it's a time and effort saver.
|
| And writing shell scripts that "mostly" work is what it does.
|
| I don't expect it to work. Just like I don't expect my own code
| to ever work.
|
| My stuff mostly works too. In either case I will be shaving
| yaks to sort out where it doesn't work.
|
| At a certain level of complexity, the whole house of cards does
| break down where LLMs get stuck in a loop.
|
| Then I will try using a different LLM to get it unstuck from
| the loop, which works well.
|
| You will have cases where both LLMs get stuck in a loop, and
| you're screwed. Okay.. well, now you're however far ahead you
| were at that stage.
|
| Essentially, some of us have spent more of our life fixing
| code, than we have writing it from scratch.
|
| At that level, it's much easier for me to fix code, than write
| it from scratch. That's the skill you're implementing with
| LLMs.
| kredd wrote:
| You get used to their quirks. I can more or less predict what
| Claude/GPT can do faster than me, so I exclusively use them for
| those scenarios. Implementing it into one's development routine
| isn't easy though, so I had to trial and error until it made me
| faster in certain aspects. I can see it being more useful for
| people who have a good chunk of experience with coding, since
| you can filter out useless suggestions much faster - ex. give a
| dump of code, description of a stupid bug, and ask it where the
| problem might be. If you generally know how things work, you
| can filter out the "definitely that's not the case"
| suggestions, it might route you to a definitive answer faster.
| ein0p wrote:
| Just today I had GPT4 implement a SwiftUI based UI for a
| prototype I'm working on. I was able to get it to work with
| minimal tweaks within 15 minutes even though I know next to
| nothing about SwiftUI (I'm mainly a systems person these days). I
| pay for this, and would, without hesitation, pay 10x for a larger
| model which does not require "minimal tweaks" for the bullshit
| tasks I have to do. Easily 80% of all programming consists of
| bullshit tasks that LLMs of 2024 are able to solve within seconds
| to minutes, whereas for me some of them would take half a day of
| RTFM. Worse, knowing that I'd have to RTFM I probably would avoid
| those tasks like the plague, limiting what can be accomplished.
| I'm also relieved somewhat that GPT4 cannot (yet?) help me with
| the non-bullshit parts of my work.
| throwaway290 wrote:
| If it handles 99% of your tasks (making a smart boss fire you),
| know that you helped train it for that by using it/paying for
| it/allowing it to be trained on code in violation of license.
|
| Even if 80% of programmer tasks in an org (or worldwide gig
| market) can be handled by ML, already 80% of programmers can be
| laid off.
|
| Maybe you have enough savings that you just don't need to work
| but some of us do!
| simonw wrote:
| There are two ways this could work out:
|
| - LLM-assistance helps solve 80% of programming tasks, so 80%
| of programmers lose their jobs
|
| - LLM-assistance provides that exact same productivity boost,
| and as a result individual programmers become FAR more
| valuable to companies - for the same salary you get a lot
| more useful work out of them. Companies that never considered
| hiring programmers - because they would need a team of 5 over
| a 6 month period to deliver a solution to their specific
| problem - now start hiring programmers. The market for custom
| software expands like never before.
|
| I expect what will actually happen will be somewhere between
| those two extremes, but my current hope is that it will still
| work out as an overall increase in demand for software
| talent.
|
| We should know for sure in 2-3 years time!
| throwaway290 wrote:
| I like your optimism, but in programming, at least in the US,
| unemployment has so far already risen higher than average
| unemployment overall.
|
| ML supercharges all disparity: business owners or
| superstars who made a nice career and name will earn more
| by commanding fleets of cheap (except energy) LLMs while
| their previous employees/reports get laid off by the tens of
| thousands (ironically they do it to themselves by welcoming
| LLMs and thinking that the next guy will be the unlucky
| one, same reason unions don't work there I guess...)
|
| And for small businesses who never hired programmers before,
| companies like ClosedAI monetize our work so their bosses
| can get full products out of chatbots (buggy for now, but
| give it a year). Those businesses will grow, but when they
| hire they will get cheap minimum-wage assistants who talk
| to LLMs. That's at best where most programmers are headed.
| The main winners will be whoever gets to provide ML that
| monetizes stolen work (unless we stop them by collective
| outrage and copyright defense), so Microsoft.
| simonw wrote:
| I'm not sure how much we can assign blame for US
| programming employment to LLMs. I think that's more due
| to a lot of companies going through a "correction" after
| over-hiring during Covid.
|
| As for "their bosses to get full products out of
| chatbots": my current thinking on that is that an
| experienced software engineer will be able to work faster
| with and get much higher quality results from working
| with LLMs than someone without any software experience.
| As such, it makes more sense economically for a company
| to employ a software engineer rather than try to get the
| same thing done worse and slower with cheaper existing
| staff.
|
| I hope I'm right about this!
| ein0p wrote:
| Thing is though, I work in this field. I do not see it
| handling the non-bullshit part of my job in my lifetime, the
| various crazy claims notwithstanding. For that it'd need
| cognition. Nobody has the foggiest clue how to do that.
| eterps wrote:
| If I knew _why_ something is [flagged] I could probably learn
| something from it.
| toomuchtodo wrote:
| There is no reason for folks to explain why they flag, but
| consider that if it was flagged but then remains available with
| the flag indicator (with the flags overridden), someone thought
| you might find value in it.
|
| I'm personally drawn to threads contentious enough to be
| flagged, but that have been vouched for by folks who have the
| vouch capability (mods and participants who haven't had vouch
| capability suspended). Good signal imho.
| cdrini wrote:
| Is there a way to discover flagged posts? How did you find
| this one?
|
| Also what's a "vouch" capability?
|
| Edit: answered my own question:
| https://github.com/minimaxir/hacker-news-
| undocumented/blob/m...
|
| I guess you can't vouch flagged? And it seems like there's a
| profile setting to show dead items?
| toomuchtodo wrote:
| https://github.com/vitoplantamura/HackerNewsRemovals?tab=re
| a... (Removal tracking)
|
| https://hackernewstitles.netlify.app/ (Title change
| tracking)
|
| Flag and vouch meta:
|
| https://news.ycombinator.com/item?id=39921649
|
| https://www.ycombinator.com/blog/two-hn-announcements/
|
| (to my knowledge, once a post has been sufficiently flagged
| without vouching, it is beyond a user's event horizon and
| only mods and users who had posted in the thread can see
| it)
| cdrini wrote:
| Wow awesome TIL! Thank you!
| aoeusnth1 wrote:
| There is https://hckrnews.com, which I find more useful
| than the basic homepage.
| jdhzzz wrote:
| "And that's where language models come in. Because most new-to-me
| frameworks/tools like Docker, or Flexbox, or React, aren't new to
| other people. There are probably tens to hundreds of thousands of
| people in the world who understand each of these things
| thoroughly. And so current language models do to. " Apparently
| not using it to proof-read or it would end with "too. "
| ghostpepper wrote:
| This mostly matches my experience but with one important caveat
| around using them to learn new subjects.
|
| When I'm diving into a wholly new subject for the first time, in
| a field totally unrelated to my field (similar to the author, C
| programming and security) for example biochemistry or philosophy
| or any field where I don't have even a basic grounding, I still
| worry about having subtly-wrong ideas about fundamentals being
| planted early-on in my learning.
|
| As a programmer I can immediately spot "is this code doing what I
| asked it to do" but there's no equivalent way to ask "is this
| introductory framing of an entire field / problem space the way
| an actual expert would frame it for a beginner" etc.
|
| At the end of the day we've just made the reddit hivemind more
| eloquent. There's clearly tons of value there but IMHO we still
| need to be cognizant of the places where bad info can be subtly
| damaging.
| simonw wrote:
| I don't worry about that much at all, because my experience of
| learning is that you inevitably have to reconsider the
| fundamentals pretty often as you go along.
|
| High school science is a great example: once you get to
| university you have to un-learn all sorts of things that you
| learned earlier because they were simplifications that no
| longer apply.
|
| Terry Pratchett has a great quote about this:
| https://simonwillison.net/2024/Jul/1/terry-pratchett/
|
| For fields that I'm completely new to, the thing I need most is
| a grounding in the rough shape and jargon of the field. LLMs
| are fantastic at that - it's then up to me to take that
| grounding and those jargon terms and start building my own
| accurate-as-possible mental model of how that field actually
| works.
|
| If you treat LLMs as just one unreliable source of information
| (like your well-read friend who's great at explaining things in
| terms that you understand but may not actually be a world
| expert on a subject) you can avoid many of the pitfalls. Where
| things go wrong is if you assume LLMs are a source of
| irrefutable knowledge.
| lolinder wrote:
| > like your well-read friend who's great at explaining things
| in terms that you understand but may not actually be a world
| expert on a subject
|
| I guess part of my problem with using them this way is that I
| _am_ that well-read friend.
|
| I know how the sausage is made, how easy it is to bluff a
| response to any given question, and for myself I tend to
| prefer reading original sources to ensure that the
| understanding that I'm conveying is as accurate as I can make
| it and not a third-hand account whose ultimate source is a
| dubious Reddit thread.
|
| > High school science is a great example: once you get to
| university you have to un-learn all sorts of things that you
| learned earlier because they were simplifications that no
| longer apply.
|
| The difference between this and a bad mental model generated
| by an LLM is that the high school science models were
| designed to be good didactic tools _and_ to be useful
| abstractions in their own right. An LLM output may be neither
| of those.
| simonw wrote:
| If you "tend to prefer reading original sources" then I
| think you're the best possible candidate for LLM-assisted
| learning, because you'll naturally use them as a starting
| point, not the destination. I like to use LLMs to get
| myself the grounding I need to then start reading further
| around a topic from more reliable sources.
|
| That's a great point about high school models being
| deliberately designed as didactic tools.
|
| LLMs will tend to spit those out too, purely because the
| high school version of anything has been represented
| heavily enough in the training data that it's more likely
| than not to fall out of the huge matrix of numbers!
| lolinder wrote:
| > LLMs will tend to spit those out too, purely because
| the high school version of anything has been represented
| heavily enough in the training data that it's more likely
| than not to fall out of the huge matrix of numbers!
|
| That assumes that the high school version of the subject
| exists, which is unlikely because I already have the high
| school version of most subjects that have a high school
| version.
|
| The subjects that I would want to dig into at that level
| would be something along the lines of chemical
| engineering, civil engineering, or economics--subjects
| that I don't yet know very much about but have interest
| or utility for me. These subjects don't have a widely-
| taught high school version crafted by humans, and I don't
| trust that they would have enough training data to
| produce useful results from an LLM.
| parentheses wrote:
| My biggest use for LLMs - situations where I use them heavily:
|
| - CLI commands and switches I don't care to or easily remember
|
| - taking an idea and exploring it in various ways
|
| - making Slack messages that are more engaging
|
| Using GPTs has a cost of breaking my concentration/flow, so it's
| not part of my core workflows.
|
| I really need to start weaving it into the programming aspects of
| my workday.
| skywhopper wrote:
| I appreciate the article and the full examples. But I have to say
| this all looks like a nightmare to me. Going back and forth in
| English with a slightly dumb computer that needs to be pestered
| constantly and hand-held through a process? This sounds really
| really painful.
|
| Not to mention that the author is not really learning the
| underlying tech in a useful way. They may learn how to prompt to
| correct the mistakes the LLM makes, but if it was a nightmare to
| go through this process once, then dealing with repeating the
| same laborious walkthrough each time you want to do something
| with Docker or build a trivial REST API sounds like living in
| hell to me.
|
| Glad this works for some folks. But this is not the way I want to
| interact with computers and build software.
| m2024 wrote:
| You're gonna get left in the dust by everyone else embracing
| LLMs.
|
| I am ecstatic about LLMs because I already practice
| documentation-driven development and LLMs perfectly complement
| this paradigm.
| duggan wrote:
| > You're gonna get left in the dust by everyone else
| embracing LLMs.
|
| Probably not, there's a very long tail to this sort of stuff,
| and there's plenty of programming to go around.
|
| I'll chime in with your enthusiasm though. Like the author of
| the post, I've been using LLMs productively for quite a while
| now and in a similar style (and similarly skeptical about
| previous hype cycles).
|
| LLMs are so useful, and it's fascinating to see how far
| people swing the opposite way on them. Such variable
| experiences, we're really at the absolute beginning of this
| whole thing (and the last time I said that to a group of
| friends there was a range of agreement/disagreement on that
| too!)
|
| Very exciting.
| Kiro wrote:
| Your comment is not a good representation of how the
| experience actually is. Nothing painful or annoying about it.
| If anything, it's a relief.
| isoprophlex wrote:
| "I understand this better than you do" twice in about 30 lines.
| Okay then.
|
| I mean, sure, you do, but there's less off-putting ways to
| display your credentials...
| simonw wrote:
| I get why he wrote it like that. Having this conversation (the
| "I know there are lots of bad things about them, but LLMs are
| genuinely useful for all sorts of things" conversation) is
| pretty exhausting. This whole piece was very clearly a reaction
| to having had that conversation time and time again, at which
| point letting some frustration slip through is understandable.
| joenot443 wrote:
| What's everyone's coding LLM setup like these days? I'm still
| paying for Copilot through an open source Xcode extension and
| truthfully it's a lot worse than when I started using it.
| slibhb wrote:
| I gave up with autocomplete pretty quickly. The UX just wasn't
| there yet (though, to be fair, I was using some third party
| adapter with sublime).
|
| It's just me asking questions/pasting code into a ChatGPT
| browser window.
| mnk47 wrote:
| I just pay the $20/month for Claude Pro and copy/paste code.
| Many people use Cursor and Double, or alternative frontends
| they can use with an API key.
| vertis wrote:
| I use Cursor and Aider, I hadn't heard of Double. I've tried
| a bunch of others including Continue.dev, but found them all
| to be lacking.
| jazzyjackson wrote:
| Supermaven (vscode extension) was quite handy at recognizing
| that I was making the same kind of changes in multiple places
| and accurately auto-completed the way I was about to write it,
| I liked it better than copilot
|
| I just wish they were better at recognizing when their help is
| not wanted because I would often disable it and forget to turn
| it back on for a while. Maybe a "mute for an hour" would fix
| that.
| levzettelin wrote:
| neovim with the gp.nvim plugin.
|
| Allows you to open chats directly in a neovim window. Also,
| allows you to select some text and then run it with certain
| prompts (like "implement" or "explain this code"). Depending on
| the prompt you can make the result appear directly inside the
| buffer you're currently working on. The request to the ChatGPT
| API also is enriched with the file-type.
|
| I hated AI before I discovered this approach. Now I'm an AI
| fanboy.
| nunodonato wrote:
| www.workspaicehq.com
| squirrel wrote:
| For about 20 years, chess fans would hold "centaur" tournaments.
| In those events, the best chess computers, who routinely trounced
| human grandmasters, teamed up with those same best-in-the-world
| humans and proceeded to wipe _both_ humans and computers off the
| board. Nicholas is describing in detail how he pairs up with LLMs
| to get a similar result in programming and research.
|
| Sobering thought: centaur tournaments at the top level are no
| more. That's because the computers got so good that the human
| half of the beast no longer added any meaningful value.
|
| https://en.wikipedia.org/wiki/Advanced_chess
| QuantumGood wrote:
| Most people have only heard "Didn't an IBM computer beat the
| world champion?", and don't know that Kasparov psyched himself
| out when Deep Blue had actually made a mistake. I was part of
| the online analysis of the (mistaken) endgame move at the time
| that was the first to reveal the error. Kasparov was very
| stressed by that and other issues, some of which IBM caused
| ("we'll get you the printout as promised in the terms" and then
| never delivered). My friend IM Mike Valvo (now deceased) was
| involved with both matches. More info:
| https://www.perplexity.ai/search/what-were-the-main-controve...
| delichon wrote:
| When I was a kid my dad told me about the most dangerous animal
| in the world, the hippogator. He said that it had the head of a
| hippo on one end and the head of an alligator on the other, and
| it was so dangerous because it was very angry about having
| nowhere to poop. I'm afraid that this may be a better model of
| an AI human hybrid than a centaur.
| romwell wrote:
| ...so, the hippogator was dangerous because he was literally
| full of shit.
|
| Hmmmm.
| disqard wrote:
| A bit of a detour (inspired by your words)... if anything,
| LLMs will soon be "eating their own poop", so structurally,
| they're a "dual" of the "hippogator" -- an ouroboric
| coprophage. If LLMs ever achieve sentience, will they be mad
| at all the crap they've had to take?
|
| Beautiful story, and thanks for sharing :)
| surfingdino wrote:
| Sounds like the author is trying really hard to find an edge use
| case for an LLM. Meanwhile on YouTube... "I Made 100 Videos In
| One Hour With Ai - To Make Money Online"
| dmvdoug wrote:
| I thought author meant how they use the two-letter sequence "AI"
| and I just came here to say, Allen Iverson.
| ilaksh wrote:
| Something weird is going on with this web page on Chrome in
| Ubuntu. The table of contents is obscuring the page text.
| nitwit005 wrote:
| > Trimming down large codebases to significantly simplify the
| project.
|
| I was a bit excited at something being able to do that, but this
| apparently means simplifying a single file, based on their
| example.
|
| I suspect they're having an unusually positive experience with
| these tools due to working on a lot of new, short, programs.
| qayxc wrote:
| > I suspect they're having an unusually positive experience
| with these tools due to working on a lot of new, short,
| programs.
|
| That's academia for you :)
|
| It also helps that he specialises in deep learning models and
| LLMs and knows a thing or two about the inner workings, how to
| prompt (he authored papers about adversarial attacks on LLMs)
| and what to expect.
| ChildOfChaos wrote:
| I mean it's good, but all the answers seem to be coding, which
| seems to be the main use for large language models.
| bionhoward wrote:
| Must be nice to work on stuff that doesn't compete with
| "intelligence as a service." I feel that's an empty set, and
| everyone using these services actively rationalizes selling out
| the human race by *paying to get brain raped.*
|
| "Open" AI - customer noncompete Copilot - customer noncompete
| Anthropic - customer noncompete Gemini - customer noncompete (api
| only, wow)
|
| Just imagine millions of people know about the imitation game and
| still pay someone to fuck them over like that.
|
| Surely, our descendants will thank us for such great
| contributions to the memory banks of the monopolies of the boring
| dystopia ..
| amai wrote:
| The problem I have with LLMs is that one can never be sure they
| will give you the best possible solution. In fact, in coding
| they will very often give you a working but outdated solution.
| And this is futile, because in coding even the best possible
| solution nowadays gets old very quickly. But if you use LLMs,
| your code will be outdated from the start. That is nothing I
| would pay for.
___________________________________________________________________
(page generated 2024-08-04 23:00 UTC)