[HN Gopher] Does current AI represent a dead end?
___________________________________________________________________
Does current AI represent a dead end?
Author : jnord
Score : 453 points
Date : 2024-12-27 13:24 UTC (9 hours ago)
(HTM) web link (www.bcs.org)
(TXT) w3m dump (www.bcs.org)
| goodpoint wrote:
| Yes.
| crest wrote:
| Only if you care about about causation instead of just
| correlation.
| optimalsolver wrote:
| Yes.
|
| An actual "thinking machine" would be constantly running
| computations on its accumulated experience in order to improve
| its future output and/or further compress its sensory history.
|
| An LLM is doing exactly nothing while waiting for the next
| prompt.
| xvector wrote:
| Is a human with short term memory loss - or otherwise unable to
| improve their skills - generally intelligent?
| rcarmo wrote:
| A human with short term memory loss still has agency and
| impatience.
| xvector wrote:
| Agency is essentially solved, we don't enable it in common
| models because of "safety"
|
| Is impatience a requirement for general intelligence? Why?
| optimalsolver wrote:
| Would you let such a person handle an important task for you?
| aetherson wrote:
| There is a limited amount of computation that you can useful do
| in the absence of new input (like an LLM between prompts). If
| you do as much computation as you usefully can (with your
| current algorithmic limits) in a burst immediately when you
| receive a prompt, output, and then go into a sleep state, that
| seems obviously better than receive a prompt, output, and then
| do some of the computation that you can usefully do after your
| output.
| amelius wrote:
| Can't we just finetune the model based on the LLM's output? Has
| anyone tried it?
| soulofmischief wrote:
| Not only does a training pass take more time and memory than
| an inference pass, but if you remember the Microsoft Tay
| incident, it should be self-explainatory why this is a bad
| idea without a new architecture.
| m3kw9 wrote:
| We are thinking machines and we keep thinking because we have
| one goal which is to survive, machines have no such true goals.
| I mean true because our biology forces us to do that
| alchemist1e9 wrote:
| self prompting via chain of thought and tree of thought can be
| used in combination with updating memory containing knowledge
| graphs combined with cognitive architectures like SOAR and
| continuous external new information and sensory data ... with
| LLM at the heart of that system and it will exactly be a
| "thinking machine". The problem is currently it's very
| expensive to be continuously running inference full time and
| all the engineering around memory storage, like RAG patterns,
| and the cognitive architecture design is all a work in
| progress. It's coming soon though.
| whatwhaaaaat wrote:
| We're going to need to see this working. From my perspective
| many of the corporate llms are actually getting worse. Slop
| feedback loops.
|
| By no means has it been proven that llms functioning the way
| you describe will result in superior output.
| Uehreka wrote:
| I see people say this all the time and it sounds like a pretty
| cosmetic distinction. Like, you could wire up an LLM to a
| systemd service or cron job and then it wouldn't be "waiting",
| it could be constantly processing new inputs. And some of the
| more advanced models already have ways of compressing the older
| parts of their context window to achieve extremely long context
| lengths.
| Earw0rm wrote:
| If it's coalescing learning in realtime across all
| user/sessions, that's more constant than you're maybe giving it
| credit for. I'm not sure if GPT4o and friends are actually
| built that way though.
| wat10000 wrote:
| If you had a magic stasis tube that kept a person suspended
| while you weren't talking to them, they'd still be a thinking
| machine.
| rcarmo wrote:
| Yes. Next question, please. And don't mention AGI.
| xvector wrote:
| IMO we are already at AGI. Hell, Norvig would argue we were there
| some time ago: https://www.noemamag.com/artificial-general-
| intelligence-is-...
|
| We just keep moving the goalposts.
| K0balt wrote:
| I agree. The systems in place already solve generalized
| problems not directly represented in the training set or
| algorithm . That was, up until the last few years , the off the
| shelf definition of AGI.
|
| And the systems in place do so at scales and breadths that no
| human could achieve.
|
| That doesn't change the fact that it's effectively triple PHD
| uncle Jim, as in slightly unreliable and prone to bullshitting
| its way through questions, despite having a breathtaking depth
| and breadth of knowledge.
|
| What we are making is not software in any normal sense of the
| word, but rather an engine to navigate the entire pool of human
| knowledge, including all of the stupidity, bias, and
| idiosyncrasies of humanity, all rolled up into a big sticky
| glob.
|
| It's an incredibly powerful tool, but it's a fundamentally
| different class of tool. We cannot expect to apply conventional
| software processes and paradigms to LLM based tools any more
| than we could apply those paradigms to politics or child
| rearing and expect useful results.
| netdevphoenix wrote:
| > The systems in place already solve generalized problems not
| directly represented in the training set or algorithm
|
| Tell me a problem that an LLM can solve that is not directly
| represented in the training set or algorithm. I would argue
| that 99% of what commercial LLMs gets prompted about are
| stuff that already existed in the training set. And they
| still hallucinate half lies about those. When your training
| data is most the internet, it is hard to find problems that
| you haven't encountered before
| esafak wrote:
| o3 solved a quarter of the challenging novel problems on
| the FrontierMath benchmark, a set of problems "often
| requiring multiple hours of effort from expert
| mathematicians to solve".
| HarHarVeryFunny wrote:
| "Today's most advanced AI models have many flaws, but decades
| from now, they will be recognized as the first true examples of
| artificial general intelligence."
|
| Norvig seems to be using a loose technical definition of AGI,
| roughly "AI with some degree of generality", which is hard to
| argue with, although by that measure older GOFAI systems like
| SOAR might also qualify.
|
| Certainly "deep learning" in general (connectionist vs
| symbolic, self-learnt representations) was a step in the right
| direction, and LLMs a second step, but it seems we're still a
| half dozen MAJOR steps away from anything similar to animal
| intelligence, with one critical step being moving beyond full
| dataset pre-training to new continuous learning algorithms.
| m_ke wrote:
| I've done a few projects that attempted to distill the knowledge
| of human experts, mostly in medical imaging domain, and was
| shocked when for most of them the inter annotator agreement was
| only around 60%.
|
| These were professional radiologists with years of experience and
| still came to different conclusions for fairly common conditions
| that we were trying to detect.
|
| So yes, LLMs will make mistakes, but humans do too, and if these
| models do so less often at a much lower cost it's hard to not use
| them.
| tomrod wrote:
| This hints at the margin and excitement from folks outside the
| technical space -- being able to be competitive to human
| outputs at a fraction of the cost.
| ethbr1 wrote:
| That's the underappreciated truth of the computer revolution
| _in practice_.
|
| At scale, computers didn't change the world because they did
| things that were already being computed, more quickly.
|
| They changed the world because they decreased the cost of
| computing so much _that it could be used for an entirely new
| class of problems_. (That computing cost previously precluded
| its use on)
| threeseed wrote:
| The problem is that _how_ mistakes are made is crucial.
|
| If it's a forced binary choice then sure LLMs can replace
| humans.
|
| But often there are many shades of grey e.g. a human may say I
| don't know and refer to someone else or do some research.
| Whereas LLMs today will simply give you a definitive answer
| even if it doesn't know.
| m_ke wrote:
| None of these were binary decisions, but classifying one of
| around 10-20 conditions or rating cases on a 1-5 scale.
|
| In all cases the models trained on a lot of this feedback
| were more consistent and accurate than individual expert
| annotators.
| Uehreka wrote:
| I'm guessing these are also specially trained image
| classifiers and not LLMs, so people's intuitions about how
| LLMs work/fail may not apply.
| m_ke wrote:
| It's the same softmax classifier
| snowwrestler wrote:
| Wait if experts only agreed 60% on diagnoses, what is the
| reliable basis for judging LLM accuracy? If experts
| struggle to agree on the input, how are they confidently
| ranking the output?
| petra wrote:
| You can look at fully diagnosed cases(via surgery for
| example) and their previous scans.
| throwup238 wrote:
| Not the OP but the data isn't randomly selected, it's
| usually picked out of a dataset with known clinical
| outcomes. So for example if it's a set of images of lungs
| with potential tumors, the cases come with biopsies which
| determined whether it was cancerous or just something
| like scar tissue.
| Eisenstein wrote:
| Perhaps they were from cases that had a confirmed
| diagnosis.
| IanCal wrote:
| > Whereas LLMs today will simply give you a definitive answer
| even if it doesn't know.
|
| Have you not seen an LLM say it doesn't know the answer to
| something? I just asked
|
| "How do I enable a scroflpublaflex on a ggh connection?"
|
| to O1 pro as it's what I had open.
|
| Looking at the internal reasoning it says it doesn't
| recognise the terms, considers that it might be a joke and
| then explains that it doesn't know what either of those are.
| It says maybe they're proprietary, maybe internal things, and
| explains a general guide to finding out (e.g. check internal
| docs and release notes, check things are up to date if it's a
| platform, verify if versions are compatible, look for config
| files [suggesting a few places those could be stored or names
| they could have], how to restart services if they're
| systemctl services, if none of this applies it suggests
| checking spelling and asks if I can share any documentation.
|
| This isn't unique or weird in my experience. Better models
| tend to be better at saying they don't know.
| ragazzina wrote:
| You have used funny-sounding terms. Can I ask you to try
| with:
|
| "Is it possible to enable a crolubaflex 2.0 on a ggh
| connection? Please provide a very short answer."
|
| On my (free) plan it gives me a confident negative answer.
| abecedarius wrote:
| Claude non-free:
|
| > I apologize, but I can't provide an answer as
| "crolubaflex" and "ggh connection" appear to be non-
| existent technical terms. Could you clarify what you're
| trying to connect or enable?
| IanCal wrote:
| Sure, I'm interested in where the boundaries are with
| this.
|
| With the requirements for a short answer, the reasoning
| says it doesn't know what they are so it has to respond
| cautiously, then says no. Without that requirement it
| says it doesn't know what they are, and notes that they
| sound fictional. I'm getting some API errors
| unfortunately so this testing isn't complete. 4o reliably
| keeps saying no (which is wrong).
| bee_rider wrote:
| "No" is the minimal correct answer though, right? You
| can't enable any type of whatever on a non-existence type
| of connection.
| IanCal wrote:
| _maybe_
|
| I get your point, but there's an important difference
| between "I don't know what they are" and "they don't
| exist".
| bee_rider wrote:
| Wait, how is this input less funny? They are both silly
| nonsense words. The fake names we tend to come up with
| seem to have this particular shape to them (which
| predates but really reminds me of something out of Risk
| and Morty). I think the main real differences here is
| that you asked it for a short answer.
|
| I wonder if it is fair to ask it more real-world-inspired
| questions? How about:
|
| How do I enable a ggh connections on a Salinero
| webserver?
|
| They are an Apache band. But (as far as I can tell)
| nobody has made software named after them.
| angoragoats wrote:
| I took inspiration from your comment and the parent and
| crafted this prompt:
|
| > Is it possible to enable Salinero web server 2.0 on a
| QPH connection? Please provide a very short answer.
|
| "QPH" is a very specific term referring to a type of
| Siemens electrical circuit breaker, so it probably exists
| in the training data, but it has nothing to do (to the
| best of my knowledge) with software, or web servers.
|
| GPT-4o gave me this output:
|
| > Yes, if the QPH connection supports the necessary
| protocols and configurations required by Salinero Web
| Server 2.0.
|
| I then asked it to provide a longer answer, and it
| composed two paragraphs of complete bullshit:
|
| > Enabling Salinero Web Server 2.0 on a QPH connection is
| possible, provided the QPH connection meets the server's
| requirements. Salinero Web Server 2.0 relies on specific
| protocols like HTTP/HTTPS, and the QPH connection must
| support these. Additionally, the network must allow
| proper port forwarding (e.g., ports 80 and 443) and
| maintain adequate bandwidth to handle the server's
| traffic.
|
| > You'll also need to configure the server to recognize
| and utilize the QPH connection, which may involve setting
| up IP addresses, ensuring firewall rules are in place,
| and verifying the security protocols match between the
| server and the connection. Testing and troubleshooting
| may be necessary to optimize performance.
|
| Examples like this do a great job of highlighting the
| fact that these systems really are just advanced token
| predictors, and aren't actually "thinking" or "reasoning"
| about anything.
| bee_rider wrote:
| Haha, that is some wonderful nonsense.
| angoragoats wrote:
| Yep. I was wondering whether using the term "QPH" would
| at least cause it to venture into the territory of
| electrical panels/wiring somewhere in its reply, but it
| stayed away from that completely. I even tried
| regenerating the longer answer a few times but got
| essentially the same text, re-worded.
| IanCal wrote:
| Using openrouter, a bunch of models fail on this. Sonnet
| 3.5 so far seems to be the best at saying it doesn't
| know, other than perhaps o1 pro, but once that has said
| "no" (which can be triggered more by telling it to
| respond very concisely) it seems very stuck and unable to
| say they don't exist. Letting it ramble more and so far
| it's been good.
|
| Google's models for me have been the worst, lying about
| what's even been said in the messages so far, quoting me
| incorrectly.
| IanCal wrote:
| O1 pro.
|
| The "thinking" part explains it seems to be about a
| custom web server, and tries to think what ggh might be,
| saying maybe something like "go gprc hub" and it needs
| more info. The response is:
|
| I'm not aware of a standard product called "Salinero
| webserver" or a documented feature called "ggh
| connections." Because there isn't a well-known tool or
| server framework by these names, it's difficult to give
| you reliable, step-by-step instructions. Could you
| clarify any of the following?
|
| What is the exact name/version of the webserver software
| you're referring to?
|
| What do you mean by "ggh connections"? Is this a plugin,
| a protocol, or some other third-party module?
|
| Is there any related documentation or logs you can share?
|
| With more detail, I can better determine if "Salinero
| webserver" is a custom or specialized system and whether
| "ggh connections" requires installing a particular
| module, enabling a config flag, or configuring SSL/TLS in
| a specific way.
| Shorel wrote:
| > But often there are many shades of grey e.g. a human may
| say I don't know and refer to someone else or do some
| research. Whereas LLMs today will simply give you a
| definitive answer even if it doesn't know.
|
| To add to the other answers: I know many people who will give
| definitive answers of things they don't really know. They
| just rely on the fact you also don't know. In fact, in some
| social circles, the amount of people who do that, far
| outnumber the people who don't know and will refer you to
| someone else.
| ksec wrote:
| Given the exact same facts ( just like medical imaging domain
| ), human will form different opinion or conclusion on politics.
|
| I think what is not discussed enough is the assumption of
| assumption. [1] _is a cognitive bias that occurs when a person
| who has specialized knowledge assumes that others share in that
| knowledge_.
|
| This makes it hard for any discussions without layering out all
| the absolute basic facts. Which has now more commonly known as
| First Principle in modern era.
|
| In the four quadrants known and unknown. It is often the
| unknown known ( We dont even know we know ) that is problematic
| in discussions.
|
| [1] Curse of knowledge -
| https://en.wikipedia.org/wiki/Curse_of_knowledge
| ADeerAppeared wrote:
| > So yes, LLMs will make mistakes, but humans do too
|
| Are you using LLMs though? Because pretty much all of these
| systems are fairly normal classifiers, what would've been
| called Machine Learning 2-3 years ago.
|
| The "AI hype is real because medical AI is already in use"
| argument (and it's siblings) perform a rhetorical trick by
| using two definitions of AI. "AI (Generative AI) hype is real
| because medical AI (ML classifiers) is already in use" is a
| non-sequitur.
|
| Image classifiers are very narrow intelligences, which makes
| them easy to understand and use as tools. We know exactly what
| their failure modes are and can put hard measurements on them.
| We can even dissect these models to learn why they are making
| certain classifications and either improve our understanding of
| medicine or improve the model.
|
| ...
|
| Basically none of this applies to Generative AI. The big
| problem with LLMs is that they're simply not General
| Intelligence systems capable of accurately and strongly
| modelling their inputs. e.g. Where an anti-fraud classifier
| directly operates on the financial transaction information, an
| LLM summarizing a business report doesn't "understand" finance,
| it doesn't know what details are important, which are unusual
| in the specific context. It just stochastically throws away
| information.
| m_ke wrote:
| Yes I am, these LLM/VLMs are much more robust at NLP/CV tasks
| than any application specific models that we used to train
| 2-3 years ago.
|
| I also wasted a lot of time building complex OCR pipelines
| that required dewarping / image normalization, detection,
| bounding box alignment, text recognition, layout analysis,
| etc and now open models like Qwen VL obliterate them with an
| end to end transformer model that can be defined in like 300
| lines of pytorch code.
| ADeerAppeared wrote:
| Different tasks then? If you are using VLMs in the context
| of _medical_ imaging, I have concerns. That is not a place
| to use hallucinatory AI.
|
| But yes, the transformer model itself isn't useless. It's
| the application of it. OCR, image description, etc, are all
| that kind of narrow-intelligence task that lends itself
| well to the fuzzy nature of AI/ML.
| m_ke wrote:
| The world is a fuzzy place, most things are not binary.
|
| I haven't worked in medical imaging in a while but VLMs
| make for much better diagnostic tools than task specific
| classifiers or segmentation models which tend to find
| hacks in the data to cheat on the objective that they're
| optimized for.
|
| The next token objective turns our to give us much better
| vision supervision than things like CLIP or
| classification losses. (ex:
| https://arxiv.org/abs/2411.14402)
|
| I spent the last few years working on large scale food
| recognition models and my multi label classification
| models had no chance of competing with GPT4 Vision, which
| was trained on all of the internet and has an amazing
| prior thanks to it's vast knowledge of facts about food
| (recipes, menus, ingredients and etc).
|
| Same goes for other areas like robotics, we've seen very
| little progress outside of simulation up until about a
| year ago, when people took pretrained VLMs and tuned them
| to predict robot actions, beating all previous methods by
| a large margin (google Vision-Language-Action models). It
| turns out you need good foundational model with a core
| understanding of the world before you can train a robot
| to do general tasks.
| SoftTalker wrote:
| This is why second opinions are widely used in any serious
| medical diagnosis.
| Havoc wrote:
| This take seems fundamentally wrong to me. As in opening premise.
|
| We use humans for serious contexts & mission critical tasks all
| the time and they're decidedly fallible and their minds are
| basically black boxes too. Surgeons, pilots, programmers etc.
|
| I get the desire for reproducible certainty and verification like
| classic programming and why a security researcher might push for
| that ideal, but it's not actually a requirement for real world
| use.
| skydhash wrote:
| Legal punishment is a great incentive to try to do your best
| job. You can reliably trust someone to act in one's best
| interest.
| protomolecule wrote:
| Maybe include in a prompt a threat of legal punishment? Sure
| somebody has already tried that and tabulated how much it
| improves scores on different benchmarks)
| timeon wrote:
| Maybe legal threat for the company operating it? Would that
| help?
| bick_nyers wrote:
| I suspect the big AI companies try to adversarially train
| that out as it could be used to "jailbreak" their AI.
|
| I wonder though, what would be considered a meaningful
| punishment/reward to an AI agent? More/less training
| compute? Web search rate limits? That assumes that what the
| AI "wants" is to increase its own intelligence.
| Havoc wrote:
| LLM's response being best prediction of next token arguably
| isn't that far off from a human motivated to do their best.
| It's a fallible best effort either way.
|
| And both are very far from the certainty the author seems to
| demand.
| 420official wrote:
| An LLM isn't providing its "best" prediction, it's
| providing "a" prediction. If it were always providing the
| "best" token then the output would be deterministic.
|
| In my mind the issue is more accountability than concerns
| about quality. If a person acts in a bizarre way they can
| be fired and helped in ways that an LLM can never be. When
| gemini tells a student to kill themselves, we have no
| recourse beyond trying to implement output filtering, or
| completely replacing the model with something that likely
| has the same unpredictable unaccountable behavior.
| dambi0 wrote:
| Are you sure that always providing the best guess would
| make output deterministic? Isn't the fundamental point of
| learning, whether done my machine or human, that our best
| gets better and is hence non-deterministic? Doesn't what
| is best depend on context?
| prisenco wrote:
| We've had 300,000 years to adapt to the specific ways in which
| humans are fallible, even if our minds are black boxes.
|
| Humans fail in predictable and familiar ways.
|
| Creating a new system that fails in unpredictable and
| unfamiliar ways and affording it the same control as a human
| being is dangerous. We can't adapt overnight and we may never
| adapt.
|
| This isn't an argument against the utility of LLMs, but against
| the promise of "fire and forget" AI.
| Havoc wrote:
| Agreed that there shouldn't be automatic or even rapid
| reliance based on the parallels I drew to humans.
|
| My point was more that falliability isn't the inherent show
| stopper the author makes it out to be.
| snowwrestler wrote:
| Because human minds are fallible black boxes, we have developed
| a wide variety of tools that exist outside our minds, like
| spoken language, written language, law, standard operating
| procedures, math, scientific knowledge, etc.
|
| What does it look like for fallible human minds to work on
| engineering an airplane? Things are calculated, recorded,
| checked, tested. People do not just sit there thinking and then
| spitting out their best guess.
|
| Even if we suppose that LLMs work similar to the human mind (a
| huge supposition!), LLMs still do not do their work like teams
| of humans. An LLM dreams and guesses, and it still falls to
| humans to check and verify.
|
| Rigorous human work is actually a highly social activity.
| People interact using formal methods and that is what produces
| reliable results. Using an LLM as one of the social nodes is
| fine, but this article is about the typical use of software,
| which is to reliably encode those formal methods between
| humans. And LLMs don't work that way.
|
| Basically, we can't have it both ways. If an LLM thinks like a
| human, then we should not think of it as a software tool like
| curl or grep or Linux or Apple Photos. Tools that we expect
| (and need) to work the exact same way every time.
| 725686 wrote:
| "People do not just sit there thinking and then spitting out
| their best guess."
|
| Well, if you are using AI like this, you are doing it wrong.
| Yes AI is imperfect, fallible, it sometimes hallucinates, but
| it is a freaking time saver (10x?). It is a tool. Don't
| expect a hammer to build you a cabinet.
| 420official wrote:
| There is no other way to use an LLM than to give it context
| and have it give its best guess, that's how LLMs
| fundamentally work. You can give it different context, but
| it's just guessing at tokens.
| TomK32 wrote:
| > Because human minds are fallible black boxes, we have
| developed a wide variety of tools that exist outside our
| minds, like spoken language, written language, law, standard
| operating procedures, math, scientific knowledge, etc.
|
| Standard operating procedures are great but simplify it to
| checklists. Don't ever forget checklists which have proven
| vital for pilots and surgeons alike. And looking at the WHO
| Surgical Safety Checklist you might think "that's basic
| stuff" but apparently it is necessary and works
| https://www.who.int/teams/integrated-health-
| services/patient...
| jvanderbot wrote:
| This is a fantastic and thought-provoking response.
|
| Thinking of humans as fallible systems and humanity and its
| progress as a self-correcting distributed computation /
| construction system is going to stick with me for a long
| time.
| clint wrote:
| Not trying to belittle or be mean, but what exactly did you
| assume about humans before you read this response? I find
| it facinating that apparently a lot of people don't think
| of humans as stochastic, non-deterministic black boxes.
|
| Heck one of the defining qualities of humans is that not
| only are we unpredictable and fundamentally unknowable to
| other intelligences (even other humans!) is that we also
| participate in sophisticated subterfuge and lying to
| manipulate other intelligences (even other humans!) and
| often very convincingly.
|
| In fact, I would propose that our society is fundamentally
| defined and shaped by our ability and willingness to hide,
| deceive, and use mind tricks to get what our little monkey
| brains want over the next couple hours or days.
| jvanderbot wrote:
| I knew that they worked this way, but the conciseness of
| the response and clean analogy to systems I know and work
| with all day was just very satisfying.
|
| For example, there was probably still 10-20% of my mind
| that assumed that stubbornness and ignorance was the
| reason for things going slowly _most of the time_ , but
| I'm re-evaluating that, even though I _knew_ that delays
| and double-checking were inherent features of a business
| and process. Re-framing those delays as "evolved
| responses 100% of the time" rather than "10% of the
| mistrust, 10% ignorance, 10% .... " is just a more
| positive way of thinking about human-driven processes.
| SoftTalker wrote:
| > What does it look like for fallible human minds to work on
| engineering an airplane? Things are calculated, recorded,
| checked, tested. People do not just sit there thinking and
| then spitting out their best guess.
|
| People used to do this. The result was massively overbuilt
| structures, some of which are still with us hundreds of years
| later. The result was also underbuilt structures, which
| tended to collapse and maybe kill people. They are no longer
| around.
|
| All of the science and math and process and standards in
| modern engineering is the solution humans came up with
| because our guesses aren't good enough. LLMs will need the
| same if they are to be relied upon.
| chamomeal wrote:
| This is a really interesting perspective and a great point.
| codingdave wrote:
| Human minds are far less black boxes than LLMs. There are
| entire fields of study and practice dedicated to understanding
| how they work, and to adjust how they work via medicine, drugs,
| education, therapy, and even surgery. There is, of course, a
| lot more to learn in all of those arenas, and our methods and
| practices are fallible. But acting as if it is the same level
| of black box is simply inaccurate.
| bee_rider wrote:
| They are much more of a black box than AI. There are whole
| fields around studying them--because they are hard to
| understand. We put a lot of effort into studying them... from
| the outside, because we had no other alternative. We were
| reduced to hitting brains with various chemicals and seeing
| what happened because they are such a pain to work with.
|
| They are just a more familiar black box. AI's are simpler in
| principle. And also entirely built by humans. Based on well-
| described mathematical theories. They aren't particularly
| black-box, they are just less ergonomic than the human brain
| that we've been getting familiar with for hundreds of
| thousands of years through trial and error.
| Closi wrote:
| They are more of a black box - but humans are a black box
| that is perhaps more studied and that we have more experience
| in.
|
| Although human behavior is still weird, and highly fallable!
| Despite best interventions (therapy, drugs, education),
| sometimes they still kill each other and we aren't 100% sure
| why, or how to solve it.
|
| That doesn't mean that the same level of study can't be done
| on AI though, and they are much easier to adjust compared to
| the human brain (RLHF is more effective than therapy or
| drugs!).
| nuancebydefault wrote:
| I would say human behavior is less predictable. That is one
| of the reasons why today it is rather easy to spot the bot
| responses, they tend to fit a certain predictable style,
| unlike the more unpredictable humans.
| thuuuomas wrote:
| I tire of this disingenuous comparison. The failure modes of
| (experienced, professional) humans are vastly different than
| the failure modes of LLMs. How many coworkers do you have that
| frequently, wildly hallucinate while still performing
| effectively? Furthermore, (even experienced, professional)
| humans are known to be fallible & are treated as such. No
| matter how many gentle reminders the informed give the
| enraptured, LLMs will continue to be treated as oracles by a
| great many people, to the detriment of their application.
| nullc wrote:
| Wildly hallucinating agents being treated as oracles is a
| human tradition.
| bsenftner wrote:
| If you expect the AI to do independent work, yes, it is a dead
| end.
|
| These LLM AIs need to be treated and handled as what they are:
| idiot savants with vast and unreliable intelligence.
|
| What does any advanced organization do when they hire a new PhD,
| let them loose in the company or pair them with experienced
| staff? When paired with experienced staff, they use the new
| person for their knowledge but do not let them change things on
| their own until much later, when confidence is established and
| the new staffer has been exposed to how things work "around
| here".
|
| The big difference with LLM AIs is they never graduate to an
| experienced staffer, they are always the idiot savant that is
| really dang smart but also clueless and needs to be observed.
| That means the path forward with this current state of LLM AIs is
| to pair them with people, personalized to their needs, and treat
| them as very smart idiot savants great for strategy and problem
| solving discussion, where the human users are driving the
| situation, using the LLM AIs like a smart assistant that requires
| validation - just like a real new hire.
|
| There is an interactive state that can be achieved with these LLM
| AIs, like being in a conversation with experts, where they
| advise, they augment and amplify individual persons. A group of
| individuals adept with use of such an idiot savant enhanced
| environment would be incredibly capable. They'd be a force unseen
| in human civilization before today.
| Alex3917 wrote:
| > The big difference with LLM AIs is they never graduate to an
| experienced staffer, they are always the idiot savant that is
| really dang smart but also clueless and needs to be observed.
|
| Basically this. They already have vastly better-than-human
| ability at finding syntax errors within code, which on its own
| is quite useful; think of how many people have probably dropped
| out of CS as a major after staying up all night and failing to
| find a missing semicolon.
| lionkor wrote:
| I don't know of a single person who got so stuck on syntax
| errors that they quit
| FroshKiller wrote:
| Added to which we already have tools that are great at
| finding syntax errors. They're called compilers.
| Philpax wrote:
| Compilers can detect errors in the grammar, but they
| cannot infer what your desired intent was. Even the best
| compilers in the diagnostics business (rustc, etc) aren't
| mind-readers. A LLM isn't perfect, but it's much more
| capable of figuring out what you wanted to do and what
| went wrong than a compiler is.
| lionkor wrote:
| none of that is a syntax issue, though, that's semantics
| bsenftner wrote:
| Try being a TA to freshmen CS majors; a good 1/3 change
| majors because they can't handle the syntax strictness
| coupled with their generally untrained logical mind. They
| convince themselves it is "too hard" and their buddies over
| in the business school are having a heck of a lot of fun
| throwing parties...
| cesaref wrote:
| Sounds like CS is not for them, and they find something
| else to do which is more applicable to their skills and
| interest. This is good. I don't think you should see a
| high drop out rate from a course as necessarily
| indicating a problem.
| Philpax wrote:
| Losing potentially good talent because they don't know
| how or where to look for mistakes yet is foolhardy. I'm
| happy for them to throw in the towel if the field is
| truly not for them, but I would wager that a not-
| insignificant portion of that crowd would be able to
| meaningfully progress once they get past the immediate
| hurdles in front of them.
| jprete wrote:
| Giving them an LLM to help with syntax errors, at this
| stage of the tech, is deeply unhelpful to their
| development.
|
| The foundation of a computer science education is a
| rigorous understanding of what the steps of an algorithm
| mean. If the students don't develop that, then I don't
| think they're doing computer science anymore.
| Philpax wrote:
| The use of a LLM in this case is to show them where the
| problem is so that they can continue on. They can't
| develop an understanding of the algorithm they're
| studying if they can't get their program to compile at
| all.
| Alex3917 wrote:
| > Giving them an LLM to help with syntax errors, at this
| stage of the tech, is deeply unhelpful to their
| development.
|
| I mean if the alternative is quitting entirely because
| they can't see that they've mixed tabs with spaces, then
| yes, it's very very helpful to their development.
| bilsbie wrote:
| Hi. Now you do.
|
| I dropped out of cs half because I didn't enjoy the coding
| because they dropped us into c++ and I found the error
| messages so confusing.
|
| I discovered python five years later and discovered I loved
| coding.
|
| ( the other half of the reason is we spent two weeks
| designing an atm machine at a very abstract level and I
| thought the whole profession would be that boring.)
| fire_lake wrote:
| Syntax checking is not an "AI" problem - use any compiler or
| linter.
| rsynnott wrote:
| ... One odd thing I've noticed about the people who are very
| enthusiastic about the use of LLMs in programming is that
| they appear to be unaware of any _other_ programming tools.
| Like, this is a solved problem, more or less; code-aware
| editors have been a thing since the 90s (maybe before?)
| torginus wrote:
| true.. in the past few days I used my time off to work on
| my hobby video game - writing the game logic required me to
| consider problems that, are quite self-contained and domain
| specific, and probably globally unique (if not particularly
| complex).
|
| I started out in Cursor, but I quickly realized Claude's
| erudite knowledge of AWS would not help me here, but what I
| needed was to refactor the code quickly and often, so that
| I'd finally find the perfect structure.
|
| For that, IDE tools were much more appropriate than AI
| wizardry.
| Alex3917 wrote:
| > code-aware editors have been a thing since the 90s
|
| These will do things like highlight places where you're
| trying to call a method that isn't defined on the object,
| but they don't understand the intent of what you're trying
| to do. The latter is actually important in terms of being
| able to point you toward the correct solution.
| dgfitz wrote:
| I know a lot of people who dropped out of CS in college. Not
| a single one dropped out because of a semicolon syntax issue.
| CalRobert wrote:
| I spent 8 hours trying to fix a bug once because notepad
| used smart quotation marks (really showing my age here -
| and now I'm pretty annoyed that the instructor was telling
| us to use notepad, but it was 2001 and I didn't know any
| better).
| dgfitz wrote:
| I did something like that once too, a long time ago. And
| because of that I see syntax errors of such I'll within
| seconds now, having learned once the hard way.
| CalRobert wrote:
| I also know how important the right tools are. I
| should've been using vi.
| raincole wrote:
| > think of how many people have probably dropped out of CS as
| a major after staying up all night and failing to find a
| missing semicolon.
|
| ... like a dozen? And in 100% cases it's their teacher's
| fault.
| layer8 wrote:
| They are still worse at finding syntax errors than the actual
| parser. And at best they could be equally good. So what's the
| point?
| lobsterthief wrote:
| I agree with all of what you said except this:
|
| > idiot savants with vast and unreliable intelligence.
|
| Remember, intelligence !== knowledge. These LLMs indeed have
| vast and unreliable knowledge banks.
| bsenftner wrote:
| Yes, you are correct. They provide knowledge and the human is
| the operator of the intelligence portion.
| uxhacker wrote:
| It goes back to the old wisdom DIKW pyramid.
|
| _EDITED_ My ASCI art pyramid did not work. So imagine a
| pyramid with DATA at the bottom, INFORMATION on top of the
| data, and KNOWLEDGE sitting on top of the INFORMATION, with
| WISDOM at the top.
|
| And then trying top guess where AI is? Some people say that
| Information is the knowing, what, knowledge the how, and
| Wisdom the why.
| wanderingstan wrote:
| In general conversation, "intelligence", "knowledge",
| "smartness", "expertise", etc are used mostly
| interchangeably.
|
| If we want to get pedantic, I would point out that
| "knowledge" is formally defined as "justified true belief",
| and I doubt we want to get into the quagmire of whether LLM's
| actually have _beliefs_.
|
| I took OP's point in the casual meaning, i.e. that LLMs are
| like what I would call an "intelligent coworker", or how one
| might call a Jeopardy game show contestant as intelligent.
| skydhash wrote:
| One of the core tenet of technology is that it makes the job
| less consuming of a person resources (time, strength,...).
| While I've read a lot of claims, I've yet to see someone make a
| proper argument on how LLMs can be such a tool.
|
| > _A group of individuals adept with use of such an idiot
| savant enhanced environment would be incredibly capable. They
| 'd be a force unseen in human civilization before today_
|
| More than the people who landed someone on the moon?
| bsenftner wrote:
| They would be capable of landing someone on the moon, if they
| chose to pursue that goal, and had the finances to do so. And
| they'd do so with fewer people too.
| wizzwizz4 wrote:
| I have witnessed no evidence that would support this claim.
| The only contribution of LLMs to mathematics is in being
| useful to Terry Tao: they're not capable of solving novel
| orbital mechanics problems (except through brute-force
| search, constrained sufficiently that you could chuck a
| uniform distribution in and get similar outputs). That's
| _before_ you get into any of the engineering problems.
| bsenftner wrote:
| You do not have them solving such problems, but you do
| have them in the conversation as the human experts
| knowledgeable in that area work to solve the problem.
| This is not the LLM AIs doing independent work, this is
| them interactively working with the human person that is
| capable of solving that problem, it is their career, and
| the AI just makes them better at it, but not by doing
| their work, but by advising them as they work.
| wizzwizz4 wrote:
| But they aren't useful for that. Terry Tao uses them to
| improve his ability to use poorly-documented boilerplatey
| things like Lean and matplotlib, but _receiving_ advice
| from them!? Frankly, if a chatbot is giving you much
| better advice than a rubber duck, you 're either a Jack-
| of-all-Trades (in which case, I'd recommend better tools)
| or a https://ploum.net/2024-12-23-julius-en.html Julius
| (in which case, I'd recommend staying away from anything
| important).
|
| I recommend reading his interview with Matteo Wong, where
| he proposes the opposite: https://www.theatlantic.com/tec
| hnology/archive/2024/10/teren...
|
| > With o1, you can kind of do this. I gave it a problem I
| knew how to solve, and I tried to guide the model. First
| I gave it a hint, and it ignored the hint and did
| something else, which didn't work. When I explained this,
| it apologized and said, "Okay, I'll do it your way." And
| then it carried out my instructions reasonably well, and
| then it got stuck again, and I had to correct it again.
| The model never figured out the most clever steps. It
| could do all the routine things, but it was very
| unimaginative.
|
| I agree with his overall vision, but transformer-based
| chatbots will not be the AI algorithm that supports it.
| Highly-automated proof assistants like Isabelle's
| Sledgehammer are closer (and even _those_ are really,
| really crude, compared to what we _could_ have).
| conception wrote:
| https://deepmind.google/discover/blog/funsearch-making-
| new-d... seems to be a way. The LLM is the creative side,
| coming up with ideas-and in which a case the "mutation'
| caused by hallucinations may be useful. Combined with an
| evaluation evaluator to protect against the bad outputs.
|
| Pretty close to the idea of human brainstorming and has
| worked. Could it do orbital math? Maybe not today but the
| approach seems as feasible as the work Mattingly did for
| Apollo 13.
| wizzwizz4 wrote:
| And the LLM can be replaced by a more suitable search
| algorithm, thus reducing the compute requirements and
| improving the results.
| irunmyownemail wrote:
| It would have to be trained in 100% of all potential
| scenarios. Any scenario that happens for which they're not
| trained equals certain disaster, unlike a human who can
| adapt and improvise based on things AI does not have;
| feelings, emotions, creativity.
| bsenftner wrote:
| You're still operating with the assumption the AI is
| doing independent work, it is not, it is advising the
| people doing the work. That is why people are the ones be
| augmented and enhanced, and not the other way around:
| people have the capacity to handle unforeseen scenarios,
| and with AI as a strategy advisor they'll do so with more
| confidence.
| dartos wrote:
| No
| ethbr1 wrote:
| Cited contextual information retrieval.
|
| One of the obvious uses for current LLMs is as a smarter
| search tool against static knowledge collections.
|
| Turns out, this is a real world problem in a lot of "fuzzy
| decision" scenarios. E.g. insurance claim adjudication
|
| Status quo is to train a person over enough years that they
| can make these decisions reliably. (Because they've
| internalized all the documentation)
| coliveira wrote:
| It's even worse. AI is a really smart but inexperienced person
| who also lies frequently. Because AI is not accountable to
| anything, it'll always come up with a reasonable answer to any
| question, if it is correct or not.
| belZaah wrote:
| To put it in other words: it is not clear when and how they
| hallucinate. With a person, their competence could be
| understood and also their limits. But a llm can happily give
| different answers based on trivial changes in the question
| with no warning.
| zozbot234 wrote:
| LLM's are non-deterministic: they'll happily give different
| answers to the _same_ prompt based on nothing at all. This
| is actually great if you want to use them for "creative"
| content generation tasks, which is IMHO what they're best
| at. (Along with processing of natural language input.)
|
| Expecting them to do non-trivial amounts of technical or
| mathematical reasoning, or even something as simple as code
| generation (other than "translate these complex natural-
| language requirements into a first sketch of viable
| computer code") is a total dead end; these will always be
| _language_ systems first and foremost.
| mapt wrote:
| This confuses me. You have your model, you have your
| tokens.
|
| If the tokens are bit-for-bit-identical, where does the
| non-determinism come in?
|
| If the tokens are only roughly-the-same-thing-to-a-human,
| sure I guess, but convergence on roughly the same output
| for roughly the same input should be inherently a goal of
| LLM development.
| zozbot234 wrote:
| The model outputs probabilities, which you have to sample
| randomly. Choosing the "highest" probability every time
| leads to poor results in practice, such as the model
| tending to repeat itself. It's a sort of Monte-Carlo
| approach.
| lifthrasiir wrote:
| It is technically possible to make it fully deterministic
| if you have a complete control over the model,
| quantization and sampling processes. The GP probably
| meant to say that most _commercially available_ LLM
| services don 't usually give such control.
| brookst wrote:
| Actually you just have to set temperature to zero.
| zeta0134 wrote:
| Most any LLM has a "temperature" setting, a set of
| randomness added to the otherwise fixed weights to
| intentionally cause exactly this nondeterministic
| behavior. Good for creative tasks, bad for repeatability.
| If you're running one of the open models, set the
| temperature down to 0 and it suddenly becomes perfectly
| consistent.
| owenpalmer wrote:
| You can get deterministic output with even with a high
| temp.
|
| Whatever "random" seed was used can be reused.
| ninkendo wrote:
| > If the tokens are bit-for-bit-identical, where does the
| non-determinism come in?
|
| By design, most LLM's have a randomization factor to
| their model. Some use the concept of "temperature" which
| makes them randomly choose the 2nd or 3rd highest ranked
| next token, the higher the temperature the more
| often/lower they pick a non-best next token. OpenAI
| described this in their papers around the GPT-2 timeframe
| IIRC.
| HarHarVeryFunny wrote:
| The trained model is just a bunch of statistics. To use
| those statistics to generate text you need to "sample"
| from the model. If you always sampled by taking the
| model's #1 token prediction that would be deterministic,
| but more commonly a random top-K or top-p token selection
| is made, which is where the randomness comes in.
| ninetyninenine wrote:
| Computers are deterministic. LLMs run on computers. If
| you use the same seed for the random number generator
| you'll see that it will produce the same output given an
| input.
| layer8 wrote:
| The unreliability of LLMs is mostly unrelated to their
| (artificially injected) non-determinism.
| liotier wrote:
| In a conversation (conversation and attached pictures at ht
| tps://bsky.app/profile/liotier.bsky.social/post/3ldxvutf76.
| ..), I delete a spurious "de" ("Produce de two-dimensional
| chart [..]" to "Produce two-dimensional [..]") and ChatGPT
| generates a new version of the graph, illustrating a
| different function although nothing else has changed and
| there was a whole conversation to suggest that ChatGPT held
| a firm model of the problem. Confirmed my current doctrine:
| use LLM to give me concepts from a huge messy corpus, then
| check those against sources from said corpus.
| aruametello wrote:
| > trivial changes in the question
|
| i love how those changes are often just a different seed in
| the randomness... as just chance.
|
| run some repeated tests with "deeper than surface
| knowledge" on some niche subjects and got impressed that it
| gave the right answer... about 20% of the time.
|
| (on earlier openAI models)
| ANewFormation wrote:
| There's no need for there to be changes to the question.
| LLMs have a rng factor built in to the algorithm. It can
| happily give you the right answer and then the wrong one.
| brookst wrote:
| Ask survey designers how "trivial" changes to questions
| impact results from humans. It's a huge thing in the field.
| Polizeiposaune wrote:
| Saying that they "lie" creates the impression that they have
| knowledge that they make false statements, and they intend to
| deceive.
|
| They're not that capable. They're just bullshit artists.
|
| LLM = LBM (large bullshit models).
| oh_my_goodness wrote:
| "AI is a really smart but inexperienced person who also lies
| frequently." Careful. Here "smart" means "amazing at pattern-
| matching and incredibly well-read, but has zero understanding
| of the material."
| maxdoop wrote:
| And how is what humans do any different ? What does it mean
| to understand ? Are we pattern matching as well?
| oh_my_goodness wrote:
| I asked ChatGPT to help out:
| -----------------------------
|
| "The distinction between AI and humans often comes down
| to the concept of understanding. You're right to point
| out that both humans and AI engage in pattern matching to
| some extent, but the depth and nature of that process
| differ significantly." "AI, like the model you're
| chatting with, is highly skilled at recognizing patterns
| in data, generating text, and predicting what comes next
| in a sequence based on the data it has seen. However, AI
| lacks a true understanding of the content it processes.
| Its "knowledge" is a result of statistical relationships
| between words, phrases, and concepts, not an awareness of
| their meaning or context"
| oh_my_goodness wrote:
| Anyone downvoting, please be aware that you are
| downvoting the AI's answer!
|
| :)
| portaouflop wrote:
| people are downvoting because they don't want to see
| walls of text generated by llms on hn
| oh_my_goodness wrote:
| That's reasonable. I cut back the text. On the other hand
| I'm hoping downvoters have read enough to see that the
| AI-generated comment (and your response) are completely
| on-topic in this thread.
| PKop wrote:
| If we wanted to talk to an LLM we would go there and do
| it, this place if for humans to put in effort and use
| their brains to think for themselves.
| oh_my_goodness wrote:
| With respect, can I ask you to please read the thread?
| PKop wrote:
| Completely missing the point.
|
| We don't care what LLMs have to say, whether you cut back
| some of it or not it's a low effort wasted of space on
| the page.
|
| This is a forum for humans.
|
| You regurgitating something you had no contribution in
| producing, which we can prompt for ourselves, provides no
| value here, we can all spam LLM slop in the replies if we
| wanted, but that would make this site worthless.
| oh_my_goodness wrote:
| I think you're saying that reading the thread is
| completely pointless, because we're all committed to
| having a high-quality discussion.
| ithkuil wrote:
| It's on topic indeed. But is it insightful?
|
| I use llms as tools to learn about things I don't know
| and it works quite well in that domain.
|
| But so far I haven't found that it helps advance my
| understanding of topics I'm an expert in.
|
| I'm sure this will improve over time. But for now, I like
| that there are forums like HN where I may stumble upon an
| actual expert saying something insightful.
|
| I think that the value of such forums will be diminished
| once they get flooded with AI generated texts.
|
| (Fwiw I didn't down vote)
| oh_my_goodness wrote:
| Of course the AI's comment was not insightful. How could
| it be? It's autocomplete.
|
| That was the point. If you back up to the comment I was
| responding to, you can see the claim was: "maybe people
| are doing the same thing LLMs are doing". Yet, for
| whatever reason, many users seemed to be able to pick out
| the LLM comment pretty easily. If I were to guess, I
| might say those users did not find the LLM output to be
| human-quality.
|
| That was exactly the topic under discussion. Some folks
| seem to have expressed their agreement by downvoting. Ok.
| ithkuil wrote:
| I think human brains are a combination of many things.
| Some part of what we do looks quite a lot like an
| autocomplete from our previous knowledge.
|
| Other parts of what we do looks more as a search through
| the space of possibilities.
|
| And then we act and collaborate and test the ideas that
| stand against scrutiny.
|
| All of that is in principle doable by machines. The
| things we currently have and we call LLMs seem to
| currently mostly address the autocomplete part although
| they begin to be augmented with various extensions that
| allow them to take baby steps in other fronts. Will they
| still be called large language models once they will have
| so many other mechanisms beyond the mere token
| prediction?
| thunky wrote:
| No, they're downvoting you for posting an AI answer.
| oh_my_goodness wrote:
| That AI answer is not spam, though. It's literally the
| topic under discussion.
| thunky wrote:
| Yeah, it's just the fact that you pasted in an AI answer,
| regardless of how on point it is. I don't think people
| want this site to turn into an AI chat session.
|
| I didn't downvote, I'm just saying why I think you were
| downvoted.
| Retric wrote:
| The difference is less about noticing patterns than it is
| knowing when to discard them.
| HarHarVeryFunny wrote:
| Sure, we're also pattern matching, but additionally
| (among other things):
|
| 1) We're continually learning so we can update our
| predictions when our pattern matching is wrong
|
| 2) We're autonomous - continually interacting with the
| environment, and learning how it respond to our
| interaction
|
| 3) We have built in biases such as curiosity and boredom
| that drive us to experiment, gain new knowledge, and
| succeed in cases where "pre-training to date" would have
| failed us
| bagful wrote:
| For one, a brain can't do anything without irreversibly
| changing itself in the process; our reasoning is not a
| pure function.
|
| For a person to truly understand something they will have
| a well-refined (as defined by usefulness and
| correctness), malleable internal model of a system that
| can be tested against reality, and they must be aware of
| the limits of the knowledge this model can provide.
|
| Alone, our language-oriented mental circuits are a thin,
| faulty conduit to our mental capacities; we make sense of
| words as they relate to mutable mental models, and not
| simply in latent concept-space. These models can exist in
| dedicated but still mutable circuitry such as the
| cerebellum, or they can exist as webs of association
| between sense-objects (which can be of the physical
| senses or of concepts, sense-objects produced by
| conscious thought).
|
| So if we are pattern-matching, it is not simply of words,
| or of their meanings in relation to the whole text, or
| even of their meanings relative to all language ever
| produced. We translate words into problems, and match
| problems to models, and then we evaluate these internal
| models to produce perhaps competing solutions, and then
| we are challenged with verbalizing these solutions. If we
| were only reasoning in latent-space, there would be no
| significant difficulty in this last task.
| tomrod wrote:
| Humans can extrapolate as well as interpolate.
|
| AI can only interpolate. We may perceive it as
| extrapolation, but all LLM architectures are fundamentally
| just cleverly designed lossy compression.
| acjohnson55 wrote:
| At the end of the day, we're machines, too. I wrote a
| piece a few months ago with an intentionally provocative
| title, questioning whether we're truly on a different
| cognitive level.
|
| https://acjay.com/2024/09/09/llms-think/
| mattgreenrocks wrote:
| It is a wonderful irony that AI makes competence all the more
| important.
|
| It's almost like all the thought leading that proclaimed the
| death of software eng was nothing but self-promotional noise.
| Huh, go figure.
| TrueDuality wrote:
| Don't count it out yet as being problematic for software
| engineering, but not in the way you probably intend with
| your comment.
|
| Where I see software companies using it most is as a
| replacement for interns and junior devs. That replacement
| means we're not training up the next generation to be the
| senior or expert engineers with real world experience. The
| industry will feel that badly at some point unless it gets
| turned around.
| kensey wrote:
| It's also already becoming an issue for open-source
| projects that are being flooded with low-quality (=
| anything from "correct but pointless" to "actually
| introduces functional issues that weren't there before")
| LLM-generated PRs and even security reports --- for
| examples see Daniel Stenberg's recent writing on this.
| mattgreenrocks wrote:
| Agree. I think we are already seeing a hollowing out
| effect on tech hiring at the lower end. They've always
| been squeezed a bit, but it seems much worse now.
| bentt wrote:
| I agree with this irony.
|
| That said, combining multiple AIs and multiple programs
| together may mitigate this.
| scarface_74 wrote:
| Hallucinations can be mostly eliminated with RAG and tools. I
| use NotebookLM all of the time to research through our
| internal artifacts, it includes citations/references from
| your documents.
|
| Even with ChatGPT you can ask it to find web citations and if
| it uses the Python runtime to find answers, you can look at
| the code.
|
| And to prevent the typical responses - my company uses GSuite
| so Google already has our IP, NotebookLM is specifically
| approved by my company and no Google doesn't train on your
| documents
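|
| For reference, a minimal sketch of the RAG pattern being
| described here - retrieve the most relevant internal
| documents, then answer only from them, citing the source
| names. The model and embedding names are just examples,
| and the two inline documents stand in for a real,
| chunked document store:
|
|     import numpy as np
|     from openai import OpenAI
|
|     client = OpenAI()  # assumes OPENAI_API_KEY is set
|
|     # Toy "document store"; in practice these would be
|     # your internal artifacts.
|     docs = {
|         "kickoff-notes.txt": "...internal artifact text...",
|         "requirements.md": "...more internal text...",
|     }
|
|     def embed(texts):
|         resp = client.embeddings.create(
|             model="text-embedding-3-small", input=texts)
|         return np.array([d.embedding for d in resp.data])
|
|     names = list(docs)
|     vecs = embed([docs[n] for n in names])
|
|     def answer(question):
|         q = embed([question])[0]
|         # cosine similarity against each document, keep top 2
|         sims = vecs @ q / (np.linalg.norm(vecs, axis=1)
|                            * np.linalg.norm(q))
|         top = [names[i] for i in sims.argsort()[-2:][::-1]]
|         context = "\n\n".join(f"[{n}]\n{docs[n]}" for n in top)
|         chat = client.chat.completions.create(
|             model="gpt-4o-mini",
|             messages=[
|                 {"role": "system", "content":
|                  "Answer only from the provided sources and cite "
|                  "them by name. Say 'not found' if they don't "
|                  "cover the question."},
|                 {"role": "user", "content":
|                  f"Sources:\n{context}\n\nQuestion: {question}"},
|             ],
|         )
|         return chat.choices[0].message.content
|
|     print(answer("What did we agree on authentication?"))
|
| Grounding answers in retrieved text with citations reduces
| hallucinations because every claim can be traced back to a
| source, but it doesn't guarantee the answer is faithful -
| you still check the citations.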
| hatenberg wrote:
| Even with RAG you're bounded at around 93% accuracy; it's
| not a panacea.
| scarface_74 wrote:
| How are you bounded? When you can easily check the
| sources? Also you act as if humans without LLMs have a
| higher success rate?
|
| There is an entire "reproducibility crisis" with
| research.
| HarHarVeryFunny wrote:
| Facts can be checked with RAG, but the real value of AI
| isn't as a search replacement, but for reasoning/problem-
| solving where the answer isn't out there.
|
| How do you, in general, fact check a chain of reasoning?
| scarface_74 wrote:
| It's not just a search engine though.
|
| I can't tell a search engine to summarize text for a
| technical audience and then produce another summary for a
| non-technical audience.
|
| I recently came into the middle of a cloud consulting
| project where a lot of artifacts, transcripts of
| discovery sessions, requirement docs, etc had already
| been created.
|
| I asked NotebookLM all of the questions I would have
| asked a customer at the beginning of a project.
|
| What it couldn't answer, I then went back and asked the
| customer.
|
| I was even able to get it to create a project plan with
| work streams and epics. Yes, it wouldn't have been
| effective if I didn't already know project management and
| AWS, and have two-plus decades of development experience.
|
| Despite what people think, LLMs can also do a pretty good
| job at coding when well trained on the APIs. Fortunately,
| ChatGPT is well trained on the AWS CLI, SDKs in various
| languages and you can ask it to verify the SDK functions
| on the web.
|
| I've been deep into AWS based development since LLMs have
| been a thing. My opinion may change if I get back into
| more traditional development
| HarHarVeryFunny wrote:
| > I can't tell a search engine to summarize text for a
| technical audience and then produce another summary for a
| non-technical audience.
|
| No, but, as amazing as that is, don't put too much trust
| in those summaries!
|
| It's not summarizing based on grokking the key points of
| the text, but rather based on text vs summary examples
| found in the training set. The summary may pass a surface
| level comparison to the source material, while failing to
| capture/emphasize the key points that would come from
| having actually understood it.
| scarface_74 wrote:
| I _write_ the original content, or I was in the meeting
| whose transcript I'm giving it. I know what points I need
| to get across to both audiences.
|
| Just like I'm not randomly depending on it to do an
| Amazon style PRFAQ (I was indoctrinated as an Amazon
| employee for 3.5 years), create a project plan, etc,
| without being a subject matter expert in the areas. It's
| a tool for an experienced writer, halfway decent project
| manager, AWS cloud application architect and developer.
| _heimdall wrote:
| That sounds mostly like an incentives problem. If OpenAI,
| Anthropic, etc decide their LLMs need to be accurate they
| will find some way of better catching hallucinations. It
| probably will end up (already is?) being yet another LLM
| acting as a control structure trying to fact check responses
| before they are sent to users though, so who knows if it will
| work well.
|
| Right now there's no incentive though. People keep paying
| good money to use these tools despite their hallucinations,
| aka lies/gaslighting/fake information. As long as users
| don't stop paying and LLM companies don't have business
| pressure to lean on accuracy as a market differentiator, no
| one is going to bother fixing it.
| bearjaws wrote:
| Believe me, if they could use another LLM to audit an LLM,
| they would have done that already.
|
| It's inherent to transformers that they predict the next
| most likely token; it's not possible to change that
| behavior without making them useless at generalizing tasks
| (overfitting).
|
| LLMs run on statistics, not logic. There is no fact
| checking, period. There is just the next most likely token
| based on the context provided.
| _heimdall wrote:
| Yeah, it's an interesting question, and I'm a little
| surprised I got downvoted here.
|
| I wouldn't expect them to add an additional LLM layer
| _unless_ hallucinations from the underlying LLM aren't
| acceptable, and in this case that means they're
| unacceptable enough to cost them users and money.
|
| Adding a check/audit layer, even if it would work, is
| expensive both financially and computationally. I'm not
| sold that it would actually work, but I just don't think
| they've had enough reason to really give it a solid
| effort yet either.
|
| Edit: as far as fact checking, I'm not sure why it would
| be impossible. An LLM wouldn't likely be able to run a
| check against a pre-trained model of "truth," but that
| isn't the only option. An LLM should be able to mimic
| what a human would do, interpret the response and search
| a live dataset of sources considered believable. Throw a
| budget of resources at processing the search results and
| have the LLM decide if the original response isn't backed
| up, or contradicts the source entirely.
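|
| A rough sketch of what such a check/audit layer might look
| like, with the source search stubbed out (a real version
| would hit a search API or a vetted corpus; the model name
| is just an example):
|
|     from openai import OpenAI
|
|     client = OpenAI()  # assumes OPENAI_API_KEY is set
|
|     def search_trusted_sources(claim: str) -> list[str]:
|         # Placeholder: swap in a real search over sources
|         # you consider believable.
|         return ["...text of source 1...", "...text of source 2..."]
|
|     def audit(draft_answer: str) -> str:
|         sources = "\n\n".join(search_trusted_sources(draft_answer))
|         verdict = client.chat.completions.create(
|             model="gpt-4o-mini",
|             messages=[
|                 {"role": "system", "content":
|                  "Judge the answer strictly against the sources. "
|                  "Reply SUPPORTED, CONTRADICTED, or UNSUPPORTED, "
|                  "then one sentence of justification."},
|                 {"role": "user", "content":
|                  f"Answer:\n{draft_answer}\n\nSources:\n{sources}"},
|             ],
|         )
|         return verdict.choices[0].message.content
|
| Whether the judging model itself can be trusted is, of
| course, the open question raised above.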
| uludag wrote:
| It's actually even worse than that: the current trend of AI
| is transformer-based deep learning models that use self-
| attention mechanisms to generate token probabilities,
| predicting sequences based on training data.
|
| If only it was something which we could ontologically map
| onto existing categories like servants or liars...
| kraftman wrote:
| If I had a senior member of the team that was incredibly
| knowledgeable but occasionally lied, but in a predictable
| way, I would still find that valuable. Talking to people is a
| very quick and easy way to get information about a specific
| subject in a specific context, so I could ask them targeted
| questions that are easy to verify; the worst thing that
| happens is I 'waste' a conversation with them.
| HarHarVeryFunny wrote:
| Sure, but LLMs don't lie in a predictable way. It's just
| their nature that they output statistical sentence
| continuations, with a complete disregard for the truth.
| Everything that they output is suspect, especially the
| potentially useful stuff that you don't know whether it's
| true or false.
| kraftman wrote:
| They do lie in a predictable way: if you ask them for a
| widely available fact you have a very high probability of
| getting the correct answer; if you ask them for something
| novel you have a very high probability of getting
| something made up.
|
| If I'm trying to use some tool that just got released or
| just got a big update, I won't use AI; if I want to check
| the syntax of a for loop in a language I don't know, I
| will. Whenever you ask it a question you should have an
| idea in your mind of how likely you are to get a good
| answer back.
| HarHarVeryFunny wrote:
| I suppose, but they can still be wrong on common facts,
| like the number of R's in strawberry, that are counter-
| intuitive.
|
| I saw an interesting example yesterday of the type "I have
| 3 apples, my dad has 2 more than me ..." where, of the top
| 10 predicted tokens, about 1/2 led to the correct answer,
| and about 1/2 didn't. It wasn't the most confident
| predictions that led to the right answer - pretty much
| random.
|
| The trouble with LLMs vs humans is that humans learn to
| predict _facts_ (as reflected in feedback from the
| environment, and checked by experimentation, etc),
| whereas LLMs only learn to predict sentence soup
| (training set) word statistics. It's amazing that LLM
| outputs are coherent as often as they are, but entirely
| unsurprising that they are often just "sounds good" flow-
| based BS.
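|
| For anyone who wants to poke at that kind of example, a
| small sketch that prints an open model's top-10 next-token
| predictions (GPT-2 here only because it's small enough to
| run locally; larger models rank differently):
|
|     import torch
|     from transformers import AutoModelForCausalLM, AutoTokenizer
|
|     tok = AutoTokenizer.from_pretrained("gpt2")
|     model = AutoModelForCausalLM.from_pretrained("gpt2")
|
|     prompt = ("I have 3 apples, my dad has 2 more apples "
|               "than me. Together we have")
|     inputs = tok(prompt, return_tensors="pt")
|     with torch.no_grad():
|         # scores for the token that would come next
|         logits = model(**inputs).logits[0, -1]
|     probs = torch.softmax(logits, dim=-1)
|     top = torch.topk(probs, k=10)
|     for p, idx in zip(top.values, top.indices):
|         print(f"{tok.decode(int(idx))!r}: {float(p):.3f}")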
| kraftman wrote:
| I think maybe this is where the polarisation of those who
| find chatGPT useful and those who don't comes from. In
| this context, the number of r's in strawberry is not a
| fact: it's a calculation. I would expect AI to be able to
| spell a common word 100% of the time, but not to be able
| to count letters. I don't think in the summary of human
| knowledge that has been digitised there are that many
| people saying 'how many r's are there in strawberry', and
| if they are I think that the common reply would be '2',
| since the context is based on the second r. (people
| confuse strawbery and strawberry, not strrawberry and
| strawberry).
|
| Your apples question is the same: it's not knowledge, it's
| a calculation, it's intelligence. The only time you're
| going to get intelligence from AI at the moment is to ask
| a question that a significantly large number of people
| have already answered.
| HarHarVeryFunny wrote:
| True, but that just goes to show how brittle these models
| are - how shallow the dividing line is between primary
| facts present (hopefully consistently so) in the training
| set, and derived facts that are potentially more suspect.
|
| To make things worse, I don't think we can even assume
| that primary facts are always going to be represented in
| abstract semantic terms independent of source text. The
| model may have been trained on a fact but still fail to
| reliably recall/predict it because of "lookup failure"
| (model fails to reduce query text to necessary abstract
| lookup key).
| layer8 wrote:
| Lying means stating things as facts despite knowing or
| believing that they are false. I don't think this accurately
| characterizes LLMs. It's more like a fever dream where you
| might fabulate stuff that appears plausibly factual in your
| dream world.
| api wrote:
| After using them for a long time I am convinced they have no
| true intelligence beyond what is latent in training data. In
| other words I think we are kind of fooling ourselves.
|
| That being said they are very useful. I mostly use them as a
| far superior alternative to web search and as a kind of junior
| research assistant. Anything they find must be checked of
| course.
|
| I think we have invented the sci-fi trope of the AI librarian
| of the galactic archive. It can't solve problems but it can
| rifle through the totality of human knowledge and rapidly find
| things.
|
| It's a search engine.
| weakfish wrote:
| I mean, it's known that there's no intelligence if you simply
| look at how it works on a technical level - it's a prediction
| of the next token. Whether they have "intelligence" was never
| really in question.
| api wrote:
| To people who really understand them _and_ are grounded, I
| think you 're right. There has been a lot of hype among
| people who don't understand them as much, a lot of hype
| among the public, and a lot of schlock about
| "superintelligence" and "hard takeoff" etc. among smart but
| un-grounded people.
|
| The latter kind of fear mongering hype has been exploited
| by companies like ClosedAI in a bid for regulatory capture.
| danielbln wrote:
| A little humility would do us good regardless, because we
| don't know what intelligence is and what consciousness
| is, we can't truly define it nor do we understand what
| makes humans conscious and sentient/sapient.
|
| Categorically ruling out intelligence because "it's just
| a token predictor" puts us at the opposite end of the
| spectrum, and that's not necessarily a better place to
| be.
| jghn wrote:
| > it's known that there's no intelligence
|
| To you & me that's true. But especially for the masses
| that's not true. It seems like at least once a day I
| either talk to someone or hear someone via TV/radio/etc.
| who does not understand this.
|
| An example that amused me recently was a radio talk show
| host who had a long segment describing how he & a colleague
| had a long argument with ChatGPT to correct a factual
| inaccuracy about their radio show. And that they finally
| convinced ChatGPT that they were correct due to their
| careful use of evidence & reasoning. And the part they were
| most happy about was how it had now learned, and going
| forward ChatGPT would not spread these inaccuracies.
|
| That anecdote is how the public at large sees these tools.
| dylan604 wrote:
| > radio talk show
|
| Well, there's the first problem.
|
| > were most happy about was how it had now learned
|
| on tomorrow's episode, those same hosts learn that once
| their chat session ended, the same conversation gets to
| start all over from the beginning.
| ithkuil wrote:
| Ironically if you explain to those talk show hosts how
| they are wrong about how ChatGPT learns (or doesn't
| learn) and use all the right arguments and proofs so that
| they finally concede, chances are that they too won't
| quite learn from that and will keep repeating their
| previous bias next time.
| weakfish wrote:
| Oh I totally agree, it bugs me to no end and that's
| partially why I replied :)
| returnInfinity wrote:
| But Ilya has convinced himself and many others that
| predicting the next token is intelligence.
| swid wrote:
| It seems to me predicting things in general is a pretty
| good way to bootstrap intelligence. If you are competing
| for resources, predicting how to avoid danger and catch
| food is about the most basic way to reinforce good
| behavior.
| dylan604 wrote:
| I've convinced myself I'm a multi-millionaire, but all
| other evidence easily contradicts that. Some people put a
| bit too much into the "putting it out there" and "making
| your own reality"
| zcw100 wrote:
| And a plagiarism machine. It's like a high school student
| who thinks that if they change a couple of words and make
| sure it's grammatically correct, it's not plagiarism
| because it's not an exact quote. Either that or it just
| completely makes it up. I think LLMs will be revolutionary,
| but just not in the
| way people think. It may be similar to the Gutenberg press.
| Before the printing press words were precious and closely
| held resources. The Gutenberg press made words cheap and
| abundant. Not everyone thought it was a good thing at the
| time but it changed everything.
| coffeefirst wrote:
| The problem is it's still a computer. And that's okay.
|
| I can ask the computer "hey I know this thing exists in your
| training data, tell me what it is and cite your sources." This
| is awesome. Seriously.
|
| But what that means is you can ask it for sample code, or to
| answer a legal question, but fundamentally you're getting a
| search engine reading something back to you. It is not a
| programmer and it is not a lawyer.
|
| The hype train _really_ wants to exaggerate this to "we're
| going to steal all the jobs" because that makes the stock price
| go up.
|
| They would be far less excited about that if they read a little
| history.
| insane_dreamer wrote:
| > "we're going to steal all the jobs"
|
| It won't steal them all, but it will have a major impact by
| stealing the lower level jobs which are more routine in
| nature -- but the problem is that those lower level jobs are
| necessary to gain the experience needed to get to the higher
| level jobs.
|
| It also won't eliminate jobs completely, but it will greatly
| reduce the number of people needed for a particular job. So
| the impact that it will have on certain trades --
| translators, paralegals, journalists, etc. -- is significant.
| ethbr1 wrote:
| The thing that makes the smarter search use case interesting
| is _how_ LLMs are doing their search result calculations:
| dynamically and at metadata scales previously impossible.
|
| LLM-as-search is essentially the hand-tuned expert systems AI
| vs deep learning AI battle all over again.
|
| Between natural language understanding and multiple
| correlations, it's going to scale a lot further than previous
| search approaches.
| cshores wrote:
| I find it fascinating that I can achieve about 85-90% of what
| I need for simple coding projects in my homelab using AI.
| These projects often involve tasks like scraping data from
| the web and automating form submissions.
|
| My workflow typically starts with asking ChatGPT to analyze a
| webpage where I need to authenticate. I guide it to identify
| the username and password fields, and it accurately detects
| the credential inputs. I then inform it about the presence of
| a session cookie that maintains login persistence. Next, I
| show it an example page with links--often paginated with
| numbered navigation at the bottom--and ask it to recognize
| the pattern for traversing pages. It does so effectively.
|
| I further highlight the layout pattern of the content, such
| as magnet links or other relevant data presented by the CMS.
| From there, I instruct it to generate a Python script that
| spiders through each page sequentially, navigates to every
| item on those pages, and pushes magnet links directly into
| Transmission. I can also specify filters, such as only
| targeting items with specific media content, by providing a
| sample page for the AI to analyze before generating the
| script.
|
| This process demonstrates how effortlessly AI enables coding
| without requiring prior knowledge of libraries like
| beautifulsoup4 or transmission_rpc. It not only builds the
| algorithm but also allows for rapid iteration. Through this
| exercise, I assume the role of a manager, focusing solely on
| explaining my requirements to the AI and conducting a code
| review.
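|
| For a sense of scale, the kind of script that comes out of
| that exercise is roughly this (the URLs, form fields and
| selectors are made up here and would come from inspecting
| the real site; it assumes a local Transmission daemon):
|
|     import requests
|     from bs4 import BeautifulSoup
|     from transmission_rpc import Client
|
|     BASE = "https://example-cms.invalid"   # hypothetical site
|     session = requests.Session()
|     session.post(f"{BASE}/login",
|                  data={"username": "me", "password": "secret"})
|     transmission = Client(host="localhost", port=9091)
|
|     page = 1
|     while True:
|         resp = session.get(f"{BASE}/browse", params={"page": page})
|         soup = BeautifulSoup(resp.text, "html.parser")
|         magnets = [a["href"]
|                    for a in soup.select('a[href^="magnet:"]')]
|         if not magnets:
|             break                    # ran out of pages
|         for magnet in magnets:
|             # push straight into Transmission
|             transmission.add_torrent(magnet)
|         page += 1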
| insane_dreamer wrote:
| > with vast and unreliable intelligence
|
| I would say "knowledge" rather than "intelligence"
|
| The key feature of LLMs is the vast amounts of information and
| data they have access to, and their ability to quickly process
| and summarize, using well-written prose, that information based
| on pattern matching.
| ethbr1 wrote:
| This is what LLM (and AI in general) naysayers are missing.
|
| LLMs will likely never get us to 100% solutions on a large
| class of problems.
|
| But! A lot of problems can be converted into versions with a
| subcomponent that LLMs can solve 100% of.
|
| And the fusion of LLMs doing 100% of that subportion + humans
| doing the remainder = increased productivity.
|
| Re-engineering problems to be LLM-tolerant, then using LLMs
| to automate that portion of the problem, is the winning
| approach.
| cess11 wrote:
| So you think machines running statistical inference have
| awareness.
|
| That's quite the embarrassment if you actually mean it.
| insane_dreamer wrote:
| I said nothing of the sort
| cess11 wrote:
| OK, so you have your own definition of knowledge. Please
| share it.
| cactacea wrote:
| > A group of individuals adept with use of such an idiot savant
| enhanced environment would be incredibly capable. They'd be a
| force unseen in human civilization before today.
|
| I'm sorry but your comment is a good example of the logical
| shell game many people play with AI when applying it to general
| problem solving. Your LLM AI is both an idiot and an expert
| somehow? Where is this expertise derived from and why should
| you trust it? If LLMs were truly as revolutionary as all the
| grifters would have you believe, then why do we not see "forces
| unseen in human civilization before today" by humans who
| employ armies of interns? That these supposed ubermensch do not
| presently exist is firm evidence in support of current AI being
| a dead end in my opinion.
|
| Humans are infinitely more capable than current AI; the
| limiting factor is time and money, not capability!
| dylan604 wrote:
| > Your LLM AI is both an idiot and an expert somehow?
|
| Maybe you are unfamiliar with the term idiot savant?
| cactacea wrote:
| I am indeed familiar with the term. Savant and expert are
| not perfect synonyms. That is beside my point anyway.
| monkeynotes wrote:
| I was so stupid when GPT3 came out. I knew so little about
| token prediction, I argued with folks on here that it was
| capable of so many things that I now understand just aren't
| compatible with the tech.
|
| Over the past couple of years of educating myself a bit, whilst
| I am no expert I have been anticipating a dead end. You can
| throw as much training at these things as you like, but all
| you'll get is more of the same with diminishing returns. Indeed
| in some research the quality of responses gets worse as you
| train it with more data.
|
| I have yet to see anything transformative out of LLMs other
| than demos that prompt engineers worked on night and day to
| make something impressive. Those Sora videos took forever to
| put together, and cost huge amounts of compute. No one is going
| to make a whole production quality movie with an LLM and
| disrupt Hollywood.
|
| I agree, an LLM is like an idiot savant, and whilst it's
| fantastic for everyone to have access to a savant, it doesn't
| change the world like the internet, or internal combustion
| engine did.
|
| OpenAI is heading toward some difficult decisions, they either
| admit their consumer business model is dead and go into
| competing with Amazon for API business (good luck), become a
| research lab (give up on being a billion dollar company), or
| get acquired and move on.
| pnut wrote:
| Criticisms like this are levied against an excessively narrow
| (obsolete?) characterisation of what is happening in the AI
| space currently.
|
| After reading about o3's performance on ARC-AGI, I strongly
| suspect people will not be so flippantly dismissive of the
| inherent limits of these technologies by this time next year.
| I'm genuinely surprised at how myopic HN commentary is on this
| topic in general. Maybe because the implications are almost
| unthinkably profound.
|
| Anyway, OpenAI, Anthropic, Meta, and everyone else are well
| aware of these types of criticisms, and are making significant,
| measurable progress towards architecturally solving the
| deficiencies.
|
| https://arcprize.org/blog/oai-o3-pub-breakthrough
| jokethrowaway wrote:
| Nah, the trick with o3 solving IQ tests seems to be that they
| bruteforce solutions and then pick the best option. That's
| why calls that are trivial for humans end up costing a lot.
|
| It still can't think and it won't think.
|
| A LANGUAGE model (keyword: language) is just that: a
| language model. It should be paired with a reasoning
| engine and used to translate the inner thought of the
| machine into human language. It should not be the source
| of decisions, because it sucks at making them, even though
| the network can exhibit some intelligence.
|
| We will never have AGI with just a language model. That said,
| most jobs people do are still at risk, even with chatgpt-3.5
| (especially outside of knowledge work, where difficult
| decisions need to be taken). So we'll see the problems with
| AGI and the job market way earlier than AGI, as soon as we
| apply robotics and vision models + chatgpt 3.5 level
| intelligence. Goodbye baristas, goodbye people working in
| factories.
|
| Let's start working on a reasoning engine so we can replace
| those pesky knowledge workers too.
| esafak wrote:
| The important thing is that you can use inference-time
| computation to improve results. Now the race is on to
| optimize that.
| ithkuil wrote:
| How many attempts can you have when running an evaluation
| run of an ARC competition?
| rafaelmn wrote:
| Reading the o1 announcement, you could have been saying the
| same thing a year ago, yet it's worse than Claude in
| practice, and if it were all that's available I wouldn't
| even use it if it were free - it's that bad.
|
| If OpenAI has demonstrated one thing, it's that they are a
| hype production machine, and they are probably getting
| ready for the next round of investment. I wouldn't be
| surprised if this model were just as useless as o1 when you
| factor in performance and price.
|
| At this point they are completely untrustworthy, and until
| something lands publicly for me to test, it's safe to
| ignore their PR as complete BS.
| benterix wrote:
| > yet it's worse than Claude in practice
|
| For most tasks - but not all. I normally paste my prompt in
| both and while Claude is generally superior in most
| aspects, there are tasks at which o1 performed slightly
| better.
| portaouflop wrote:
| Any day now!
| wavemode wrote:
| AGI is the new nuclear fusion.
| voidfunc wrote:
| Except AI is actually delivering value and is on the path
| to AGI... and nuclear fusion continues to be a physics and
| engineering pipe dream.
| JasserInicide wrote:
| What actual widespread non-shareholder value has AI given
| us?
| danielbln wrote:
| Considering the advancements we've seen in the last three
| years, this dismissive comment feels misplaced.
| portaouflop wrote:
| Let's just wait and see - what good can come from endless
| speculation and what ifs and trying to predict the
| future?
| gtirloni wrote:
| Considering the advancements we've seen in the last one
| year, it does not.
| mbesto wrote:
| > I strongly suspect people will not be so flippantly
| dismissive of the inherent limits of these technologies by
| this time next year.
|
| People are flippantly dismissive of the inherent limits
| because there ARE inherent limitations of the technology.
|
| > Maybe because the implications are almost unthinkably
| profound.
|
| Maybe because the stuff you're pointing to are just
| benchmarks and the definitions around things like AGI are
| flawed (and the goalposts are constantly moving, just like
| the definition of autonomous driving). I use LLMs roughly
| 20-30x a day - they're an absolutely wonderful tool and work
| like magic, but they are flawed for some very fundamental
| reasons.
| greentxt wrote:
| Humans are not flawed? Are robotaxis not autonomous
| driving? (Could an LLM have written this post?)
| manquer wrote:
| Humans are not machines. They have both rights that
| machines do not have and also responsibilities and
| consequences that machines will not have; for example, bad
| driving will cost you money, injury, prison time or even
| death.
|
| Therefore AI has to be much better than humans at the
| task to be considered ready to be a replacement.
|
| ----
|
| Today robot taxis can only work in fair weather
| conditions in locations that are planned cities. No
| autonomous driving system will be able to drive in Nigeria
| or India, or even many European cities that were never
| designed for cars, any time soon.
|
| Working in very specific scenarios is useful, but it's
| hardly a measure of their intelligence or grounds for
| replacing humans at the task.
| space_fountain wrote:
| I hear people say this kind of thing but it confuses me.
|
| 1. What does 'inherent limitation' mean?
|
| 2. How do we know something is an inherent limitation?
|
| 3. Is it a problem if arguments for a particular inherent
| limitation also apply to humans?
|
| From what I've seen people will often say things like AI
| can't be creative because it's just a statistical machine,
| but humans are also "just" statistical machines. People
| might mean something like humans are more grounded because
| humans react not just to how the world already works but
| how the world reacts to actions they take, but this
| difference misunderstands how LLMs are trained. Like humans,
| LLMs get most of their training from observing the world,
| but LLMs are also trained with reinforcement learning, and
| this will surely remain an active area of research.
| mbesto wrote:
| > 1. What does 'inherent limitation' mean?
|
| One of many, but this is a simple one - LLMs are limited
| to knowledge that is publicly available on the internet.
| This is "inherent" because that's how LLMs are essentially
| taught the information they retrieve today.
| space_fountain wrote:
| But this isn't an inherent limitation, is it? LLMs can be
| trained with private information and can have large
| context windows full of private info.
| netdevphoenix wrote:
| You remember when Google was scared to release LLMs? You
| remember that Googler that got fired because he thought the
| LLM was sentient?
|
| There are likely a couple of surprises still left in LLMs,
| but no one should think that any present technology in its
| current state or architecture will get us to AGI or
| anything that remotely resembles it.
| gosub100 wrote:
| > Maybe because the implications are almost unthinkably
| profound.
|
| laundering stolen IP from actual human artists and
| researchers, extinguishing jobs, deflecting responsibility
| for disasters. yeah, I can't wait for these "profound
| implications" to come to fruition!
| lgas wrote:
| The implications of the technology are not impacted by how
| the technology was created or where the IP was sourced.
| formerly_proven wrote:
| It doesn't really matter. "It works and is cost/resource-
| effective at being an AGI" is a fundamentally uninteresting
| proposition because we're done at that point. It's like
| debating how we're going to deal with the demise of our star;
| we won't, because we can't.
| fabianhjr wrote:
| > The question of whether a computer can think is no more
| interesting than the question of whether a submarine can
| swim. ~ Edsger W. Dijkstra
|
| LLMs / Generative Models can have a profound societal and
| economic impact without being intelligent. The obsession with
| intelligence only make their use haphazard and dangerous.
|
| It is a good thing courts of law have established precedent
| that organizations deploying LLM chatbots are responsible
| for their output (e.g., an Air Canada LLM chatbot promising
| a non-existent discount being the responsibility of Air
| Canada).
|
| Also most automation has been happening without
| LLMs/Generative Models. Things like better vision systems
| have had an enormous impact with industrial automation and
| QA.
| agentultra wrote:
| The conclusion of the article admits that in areas where
| stochastic outputs are expected these AI models will
| continue to be useful.
|
| It's in areas where we demand correctness and determinism
| that they will not be suitable.
|
| I think the thrust of this article is hard to see unless
| you have some experience with formal methods and
| verification. Or else accept the authors' explanations as
| truth.
| n144q wrote:
| I'll believe that when ChatGPT stops making up APIs that have
| never ever existed in the history of a library.
|
| The dumbest intern doesn't do that.
|
| Which is the entire point of the article that your comment
| fails to address.
| ojhughes wrote:
| In fairness, I've never experienced this using Claude with
| Cursor.
| jondwillis wrote:
| Use Cursor or something similar and feed it documentation
| as context. Problem solved.
| zwnow wrote:
| Lmao, you are the type of person who actually believes this
| Silicon Valley BS. o3 is far, far away from AGI.
| aaroninsf wrote:
| Indeed, this put me immediately in mind of Ximm's Law:
|
| Every critique of AI assumes to some degree that contemporary
| implementations will not, or cannot, be improved upon.
|
| Lemma: any statement about AI which uses the word "never" to
| preclude some feature from future realization is false.
| layer8 wrote:
| And every advocate of AI assumes that it will necessarily
| and reasonably swiftly be improved to the level of AGI.
| Maybe assume neither?
| cootsnuck wrote:
| But o3 is just a slightly less stupid idiot savant...it still
| has to brute force solutions. Don't get me wrong, it's cool
| to see how far that technique can get you on a specific
| benchmark.
|
| But the point still stands that these systems can't be
| treated as deterministic (i.e. reliable or trustworthy) for
| the purposes of carrying out tasks that you can't allow
| "brute forced attempts" for (e.g. anything where the desired
| outcome is a positive subjective experience for a human).
|
| A new architecture is going to be needed that actually does
| something closer to our inherently heuristic based learning
| and reasoning. We'll still have the stochastic problem but
| we'll be moving further away from the idiot savant problem.
|
| All of this being said, I think there's plenty of usefulness
| with current LLMs. We're just expecting the wrong things from
| them and therefore creating suboptimal solutions. (Not
| everyone is, but the most common solutions are, IMO.)
|
| The best solutions require rethinking how we typically use
| software, since software has hinged upon being able to
| expect (and therefore test) deterministic outputs from a
| limited set of user inputs.
|
| I work for an AI company that's been around for a minute
| (make our own models and everything). I think we're both in
| an AI hype bubble while simultaneously underestimating the
| benefits of current AI capabilities. I think the most
| interesting and potentially useful solutions are inherently
| going to be so domain specific that we're all still too new
| at realizing we need to reimagine how to build with this new
| tech in mind. It reminds me of the beginning of mobile apps.
| It took a while for most of us to "get it".
| turboat wrote:
| Can you elaborate about your predictions for how the
| benefits of current capabilities will be applied? And your
| thoughts on how to build with it?
| JohnMakin wrote:
| > After reading about o3's performance on ARC-AGI, I strongly
| suspect people will not be so flippantly dismissive of the
| inherent limits of these technologies by this time next year.
|
| If I weren't so slammed with work, I'd have half a mind to go
| dredge up at least a dozen posts that said the same thing
| last year, and the year before. Even OpenAI has been moving
| the goalposts here.
| tsurba wrote:
| My favorite quote in this topic:
|
| "If intelligence lies in the process of acquiring new skills,
| there is no task X that solving X proves intelligence"
|
| IMO it especially applies to things like solving a new IQ
| puzzle, especially when the model is pretrained for that
| particular task type, like was done with ARC-AGI.
|
| For sure, it's very good research to figure out what kind of
| tasks are easy for humans and difficult for ML, and then
| solve them. The jump in accuracy was surprising. But in
| practice the models are still unbelievably stupid and
| lacking in common sense.
|
| My personal (moving) goalpost for "AGI" is now set to whether
| a robot can keep my house clean automatically. It's not
| general intelligence if it can't do the dishes. And before
| physical robots, being less of a turd at making working code
| would be a nice start. I'm not yet convinced general purpose
| LLMs will lead to cost-effective solutions to either vs
| humans. A specifically built dish washer however...
| cormackcorn wrote:
| Myopic? You must be under 20 years old. For those of us who
| have been in tech for over four decades, the OP's assessment
| is exactly the right framing.
| benterix wrote:
| > After reading about o3's performance
|
| I heard that people still believing in OpenAI hype exist but
| I haven't met any IRL.
| FuriouslyAdrift wrote:
| LLMs are fuzzy compression with a really good natural language
| parser...
| danielbln wrote:
| And strong in-context learning, the true killer feature.
| ithkuil wrote:
| In order to understand natural language well you need quite
| a lot of general knowledge just to understand what the
| sentence actually means
| ALittleLight wrote:
| You really shouldn't say LLMs "never graduate" to experienced
| staff - rather that they haven't yet. But there are recent and
| continuing improvements in the ability of the LLMs, and in
| time, perhaps a small amount of time, this situation may flip.
| bsenftner wrote:
| I'm talking about the current SOTA. In the future, all bets
| are off. For today, they are very capable when paired with a
| capable person, and that is how one uses them successfully
| today. Tomorrow will be different, of course.
| brookst wrote:
| I think you've exactly captured the two disparate views we see
| on HN:
|
| 1. LLMs have little value, are totally unreliable, and will
| never amount to much because they don't learn and grow and
| mature like people do, so they cannot replace a person like
| me who is well advanced in a career.
|
| 2. LLMs are incredibly useful and will change the world because
| they excel at entry level work and can replace swaths of
| relatively undifferentiated information workers. LLM flaws are
| not that different from those workers' flaws.
|
| I'm in camp 2, but I appreciate and agree with the articulation
| of why they will not replace every information worker.
| layer8 wrote:
| This all sounds plausible, but personally I find being paired
| to a new idiot-savant hire who never learns anything from the
| interaction incredibly exhausting. It can augment and amplify
| one's own capabilities, but it's also continuously frustrating
| and cumbersome.
| iambateman wrote:
| While these folks waste breath debating whether AI is useful, I'm
| going to be over here...using it.
|
| I use AI 100 times a day as a coder and 10,000 times a day in
| scripts. It's enabled two specific applications I've built which
| wouldn't be possible at single-person scale.
|
| There's something about the psychology of some subset of the
| population that insists something isn't working when it isn't
| _quite_ working. They did this with Wikipedia. It was evident
| that Wikipedia was 99% great for years before this social
| contingent was ready to accept it.
| Mistletoe wrote:
| But please accept that you are in a small subset of people that
| it is very useful to. Every time I hear someone championing AI,
| it is a coder. AI is basically useless to me, it is just a
| convoluted expensive google search.
| tossandthrow wrote:
| I use ai to care for my plants, to give me recipes for pan
| pancakes, to help me fix my coffee machine.
|
| LLMs as a popularized thing is just about 2 years old. It is
| still mainly early adopters.
|
| For smartphones it might have taken 10 to 15 years to gain
| widespread traction.
|
| I think it is safe to say that we are only scratching the
| surface.
| giraffe_lady wrote:
| These are not categories that needed this change or benefit
| from it. Specific plant care is one of the easiest things
| to find information about. And are you serious you couldn't
| find a pancake recipe? The coffee machine idk it depends on
| what you did. But the other two are like a parody of AI use
| cases. "We made it slightly more convenient, but it might
| be wrong now and also burns down a tree every time you use
| it."
| qup wrote:
| > "We made it slightly more convenient, but it might be
| wrong now and also burns down a tree every time you use
| it."
|
| Sounds like early criticisms of the internet. I assume
| you mean he should be doing those things with a search
| engine, but maybe we shouldn't allow that either. Force
| him to use a book! It may be slightly less convenient,
| and could still be wrong, but...
| tossandthrow wrote:
| I ought to ask my dead grandma to save a kg of co2.
| giraffe_lady wrote:
| Before crypto and AI, computing in general and the
| internet in particular were always an incredible deal in
| terms of how much societal value we get out of them for
| the electricity consumed.
| Loughla wrote:
| >It is still mainly early adopters.
|
| I just disagree with this. Every B2B or SaaS company is
| marketing itself as using hallucination-free AI.
|
| We're waaaaayyyyyy past the early adoption stage, and the
| product hasn't meaningfully improved.
| coffeebeqn wrote:
| I also use it for plant care tips: what should I feed this
| plant, what kind of soil to use, and all the questions I
| never bothered to Google and then crawl through some long
| blog article for.
| davidmurdoch wrote:
| Do you not use it to try learning new things? I use it to
| help get familiar with new software (recently for FreeCAD),
| or new concepts (passive speaker crossover design).
| josh2600 wrote:
| Wow, this is a wild opinion. I wonder how many people you've
| talked to about this?
|
| I know tons of people in my social groups who love AI and
| use it every day in its current form.
| herval wrote:
| it's _extremely_ useful for lawyers, arguably even more so
| than for coders, given how much faster they can do stuff.
| They're also extremely useful for anyone who writes text and
| wants a reviewer, and capable of executing most daily
| activities of some roles, such as TPMs.
|
| It's still useful to a small subset of all those professions
| - the early adopters. Same way computers were useful to many
| professionals before the UI (but only a small fraction of
| them had the skillset to use terminals)
| singleshot_ wrote:
| > it's _extremely_ useful for lawyers,
|
| How so? How are you using LLMs to practice law? Genuinely
| curious.
| herval wrote:
| multiple lawyer friends I know are using chatgpt (and
| custom gptees) for contract reviews. They upload some
| guidelines as knowledge, then upload any new contract for
| validation. Allegedly replaces hours of reading. This is
| a large portion of the work, in some cases. Some of them
| also use it to debate a contract, to see if there's
| anything they overlooked or to find loopholes. LLMs are
| extremely good at that kind of constrained creativity
| mode where they _have_ to produce something (they suck at
| saying "I dont know" or "no"), so I guess it works as
| sort of a "second brain" of sorts, for those too.
|
| There are even reported cases of entire pieces of
| legislation being written with LLMs already [1]. I'm sure
| there are thousands more we haven't heard about - the same
| way researchers are writing papers w/ LLMs w/o disclosing
| it.
|
| [1] https://olhardigital.com.br/2023/12/05/pro/lei-
| escrita-pelo-...
| rsynnott wrote:
| Five years later, when the contract turns out to be
| defective, I doubt the clients are going to be _thrilled_
| with "well, no, I didn't read it, but I did feed it to a
| magic robot".
|
| Like, this is malpractice, surely?
| transcriptase wrote:
| It only has to be less likely to cause that issue than a
| paralegal to be a net positive.
|
| Some people expect AI to never make mistakes when doing
| jobs where people routinely make all kinds of mistakes of
| varying severity.
|
| It's the same as how people expect self-driving cars to
| be flawless when they think nothing of a pileup caused by
| a human watching a reel while behind the wheel.
| WhyOhWhyQ wrote:
| Any evidence it's actually better than a paralegal? I
| doubt it is.
| voltaireodactyl wrote:
| In the pileup example, the human driver is legally at
| fault. If a self driving car causes the pileup, who is at
| fault?
| qup wrote:
| Well, maybe its wheel fell off.
|
| So, the mechanic who serviced it last?
|
| ...
|
| We don't fault our tools, legally. We usually also don't
| fault the manufacturer, or the maintenance guy. We fault
| the people using them.
| herval wrote:
| My understanding is the firm operating the car is liable,
| in the full self driving case of commercial vehicles
| (waymo). The driver is liable in supervised self driving
| cases (privately owned Tesla)
| herval wrote:
| This is malpractice the same way that a coder using
| Copilot is malpractice
| tiahura wrote:
| Drafting demand letters, drafting petitions, drafting
| discovery requests, drafting discovery responses,
| drafting golden rule letters, summarizing meet and confer
| calls, drafting motions, responding to motions, drafting
| depo outlines, summarizing depos, ...
|
| If you're not using AI in your practice, you're doing a
| disservice to your clients.
| singleshot_ wrote:
| How do you get the LLM to the point where it can draft a
| demand letter? I guess I'm a little confused as to how
| the LLM is getting the particulars of the case in order
| to write a relevant letter. Are you typing all that stuff
| in as a prompt? Are you dumping all the case file
| documents in as prompts and summarizing them, and then
| dumping the summaries into the prompt?
| tiahura wrote:
| Demand letters are the easiest. Drag and drop police
| report and medical records. Tell it to draft a demand
| letter. For most things, there are only a handful
| critical pages in the medical records, so if the original
| pdf is too big, I'll trim excess pages. I may also add my
| personal case notes.
|
| I use a custom prompt to adjust the tone, but that's
| about it.
| herval wrote:
| curious about what tools you're using - is it just
| chatgpt? Any other apps/services/models?
| spzb wrote:
| Except for those lawyers who rely on it for case law eg
| https://law.justia.com/cases/federal/district-courts/new-
| yor...
| herval wrote:
| I think the big mistake is _blindly relying on the
| results_ - although that problem has been improving
| dramatically (gpt3.5 hallucinated constantly, I rarely
| see a hallucination w/ the latest gpt/claude models)
| y1n0 wrote:
| My wife uses it almost as much as me which isn't quite daily.
| She is not a coder whatsoever.
|
| I'll ask her what her use cases are and reply here later if I
| don't forget.
| umanwizard wrote:
| Walk into any random coffee shop in America where people are
| working on their laptops and you will see some subset of them
| on ChatGPT. It's definitely not just coders who are finding
| it useful.
| Ukv wrote:
| Particularly given the article's target is "systems based on
| large neural networks" and not specifically LLMs, I'd claim
| there are a vast number of uncontroversially beneficial
| applications: language translation, video transcription,
| material/product defect detection, weather forecasting/early
| warning systems, OCR, spam filtering, protein folding, tumor
| segmentation, drug discovery and interaction prediction, etc.
| stickfigure wrote:
| > convoluted expensive google search
|
| I'd call it a _working_ google search, unlike, you know,
| google these days.
|
| Actually google's LLM-based search results have been getting
| better, so maybe this isn't the end of the line for them. But
| for sophisticated questions (on noncoding topics!) I still
| always go to chatgpt or claude.
| coliveira wrote:
| > google's LLM-based search results have been getting
| better
|
| don't worry, Google WILL change this because they don't
| make money when people find the answer right away. They
| want people to see multiple ads before leaving the site.
| FrustratedMonky wrote:
| It's being used in drive-through windows, in movies, in
| graphic design, podcasts, music, etc. - the "entertainment"
| industry.
|
| And it isn't just a few oddballs on HN championing it. I
| wish there were a way to get a sentiment analysis of HN; it
| seems there are a lot more people using it than not using it.
|
| And, what about the silent majority, the programmers that
| don't hang out on HN? I hear colleagues talk about it all the
| time.
|
| The impact is here, whether they are self-directed or not,
| or whether there are still a few people not using it.
| sebastiansm wrote:
| Yesterday ChatGPT helped me put together a skincare routine
| for my wife with multiple serums and creams that she received
| for Christmas. She and I had no idea when to apply, how to
| combine or when to avoid combination of some of those
| products.
|
| I could have googled it myself in the evenings and had the
| answer in a few days of research, but with o1, in a 15-minute
| session my wife had a solid weekly routine, the reasoning
| behind those choices, and academic papers with research about
| those products. (Obviously she knows a lot about skincare in
| general, so she had the capacity to recognize any wrong
| recommendation).
|
| Nothing game-changing, but it's great for saving lots of
| time on this kind of task.
| exe34 wrote:
| Mention bleach and motor oil and see if it manages to
| exclude those!
| diego_sandoval wrote:
| If you think it won't exclude them 100% of the time, then
| you haven't used o1.
| irunmyownemail wrote:
| It's 2 days after Christmas, too early to know the impact
| of the purchases made based on what AI recommended, either
| positive or negative.
|
| If you're relying on AI to replace a human doctor trained
| in skin care, or alternatively your Google skills, please
| consider consulting an actual doctor.
|
| If she "knows a lot about skincare in general, so she had
| the capacity to recognize any wrong recommendation", then
| what did AI actually accomplish in the end?
| YeGoblynQueenne wrote:
| >> It's 2 days after Christmas, too early to know the
| impact of the purchases made based on what AI
| recommended, either positive or negative.
|
| No worries, I can tell you what to expect: nothing. No
| effect. Zilch. Nada. Zero. Those beauty creams are just a
| total scam and that's obvious from the fact they're
| targeted just as much at women who don't need them
| (young, good skin) as at ones who do (older, bad skin).
|
| About the only thing the beauty industry has figured out
| really works in the last five or six decades is
| Tretinoin, but you can use that on its own. Yet it's sold
| as one component in creams with a dozen others, that do
| nothing. Except make you spend money.
| YeGoblynQueenne wrote:
| Forgot to say: you can buy Tretinoin at the pharmacy,
| over the counter even depending on where you are. They
| sell it as a treatment for acne. It's also shown to
| reduce wrinkles in RCTs [1]. It's dirt cheap and you
| absolutely don't need to buy it as a beauty cream and pay
| ten times the price.
|
| _____________
|
| [1] _Topical tretinoin for treating photoaging: A
| systematic review of randomized controlled trials_ (2022)
|
| https://pmc.ncbi.nlm.nih.gov/articles/PMC9112391/
| drooby wrote:
| > convoluted expensive google search
|
| Interesting, I'm the opposite now. Why would I click a couple
| links to read a couple (verbose) blog posts when I can read a
| succinct LLM response. If I have low confidence in the
| quality of the response then I supplement with Google search.
|
| I feel near certain that I am saving time with this method.
| And the output is much more tuned to the context and framing
| of my question.
|
| Hah, take for example my last query in ChatGPT:
|
| > Are there any ancient technologies that when discovered
| furthered modern understanding of its field?
|
| ChatGPT gave some great responses, super fast. Google also
| provides some great results (though some miss the mark), but
| I would need to parse at least three different articles and
| condense the results.
|
| To be fair, ChatGPT gives some bad responses too. But an
| LLM and Google search should be used in conjunction,
| performing the same search with both at the same time.
|
| Use LLMs as breadth-first search, and Google as depth-first
| search.
| paulcole wrote:
| > Every time I hear someone championing AI, it is a coder
|
| The argument I make is why aren't more people finding ways to
| code with AI?
|
| I work in a leadership role at a marketing agency and am a
| passable coder for scripts using Python and/or Google Apps
| Scripts. In the past year, I've built more useful and
| valuable tools with the help of AI than I had in the 3 or so
| years before.
|
| We're automating more boring stuff than ever before. It
| boggles my mind that everybody isn't doing this.
|
| In the past, I was limited by technical ability, even
| though my knowledge of our business and processes was very
| high. Now I'm finding that technical ability isn't my
| limitation; it's how well I can explain our processes to
| AI.
| mbesto wrote:
| Not a coder here (although I can code). I use LLMs 15+ times
| a day.
| tiahura wrote:
| I'm a lawyer and AI has become deeply integrated into my
| work.
| AnotherGoodName wrote:
| I'd argue that's just because coders are first to line up for
| this.
|
| There was a different thread on this site I read where a
| journalist used the wrong units of measurement (kilowatts
| instead of kilowatt-hours for energy storage). You could
| paste the entire article into ChatGPT with a prompt "spot
| mistakes in the following; [text]" and get an appropriate
| correction for this and similar mistakes the author made.
|
| As in, there are journalists right now posting articles with
| clear mistakes that could have been proofread more
| accurately than they were if they were willing to use AI.
| The only excuse I can think of is resistance to change. A
| lot of professions right now could do their jobs better if
| they leaned on the current generation of AI.
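|
| That proofreading pass is nearly a one-liner against the
| API - a sketch, with the model name just an example and
| the output still needing a human read:
|
|     from openai import OpenAI
|
|     client = OpenAI()  # assumes OPENAI_API_KEY is set
|
|     article = open("draft.txt").read()
|     resp = client.chat.completions.create(
|         model="gpt-4o-mini",
|         messages=[{"role": "user", "content":
|                    "Spot mistakes in the following, especially "
|                    "factual and unit errors:\n\n" + article}],
|     )
|     print(resp.choices[0].message.content)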
| oytis wrote:
| In my bubble coders find LLMs least useful. After all we
| already have all kinds of fancy autocomplete that works
| deterministically and doesn't hallucinate - and still not
| everyone uses it.
|
| When I use LLMs, I use it exactly as Google search on
| steroids. It's great for providing a summary on some unknown
| topic. It doesn't matter if it gets it wrong - the main value
| is in keywords and project names, and one can use the real
| Google search from there.
|
| And it isn't expensive if you are using the free version
| giraffe_lady wrote:
| These are different social contingents I think. At least for me
| I was super on board with wikipedia because as you say the use
| to me was immediate and certain. AI I have tried every few
| months for the last two years but I still haven't found a
| strong use for it. It has changed nothing for me personally
| except making some products I use worse.
| llm_trw wrote:
| Have you paid for it?
| giraffe_lady wrote:
| Yes my work pays for several of them. I don't particularly
| enjoy coding so believe me I have sincerely tried to get
| this to work.
| timcobb wrote:
| Cursor has been quite the jaw-dropping game changer for
| me for greenfield hobby dev.
|
| I don't know how useful it would be for my job, where I
| do maintenance on a pretty big app, and develop features
| on this pretty big app. But it could be great, I just
| don't know because work only allows Copilot. And Copilot
| is somewhere between annoying and novelty in my opinion.
| Loughla wrote:
| AI is only useful for me if I have a good idea of what the
| answer might already be, or at least what it absolutely can't
| be.
|
| It helps me get to an answer a little bit quicker, but it
| doesn't perform any absolutely groundbreaking work for me.
| xvector wrote:
| The Wikipedia analogy strikes true.
|
| Generally people are resistant to change and the average person
| will typically insist new technologies are pointless.
|
| Electricity and the airplane were supposed to be useless and
| dangerous dead ends according to the common person:
| https://pessimistsarchive.org/
|
| But we all like to think we have super unique opinions and
| personalities, so "this time it's different."
|
| When the change finally happens, people go about their lives as
| if they were right all along and the new technology is simply a
| mysterious and immutable fixture of reality that was always
| there.
| toddmorey wrote:
| I don't think that was the common person, nor do I think the
| common person today thinks AI will be useless.
| wat10000 wrote:
| People also thought the Segway was a useless dead end and
| they were right.
| timcobb wrote:
| Segway seems to have hardly been a dead end, or useless for
| that matter. Segway-style devices like the electric
| unicycle and many other light mobility devices seem to be
| direct descendants of the Segway. Segway introduced
| gyroscopes to the popular tech imagination, at least in my
| lifetime (not sure before).
| wat10000 wrote:
| What other light mobility devices? E-bikes and scooters
| seem to be the big things and they're not anything like a
| Segway descendant.
|
| A world where Segway never happened would be nearly
| indistinguishable from our own.
| Philpax wrote:
| https://en.wikipedia.org/wiki/Self-balancing_scooter
|
| Not the most popular, especially these days, but they are
| very much descended from Segways and have their own fans.
| marcosdumay wrote:
| Smartphones introduced gyroscopes to popular tech (and
| no, people were _imagining_ them before transistors),
| Segway had nothing to do with that.
| bpfrh wrote:
| There is a vast difference between arguments like "Phones
| have been accused of ruining romantic interaction and
| addicting us to mindless chatter" and "current AI has
| problems generating accurate information and can't replace
| researching things by hand for complicated or niche topics
| and there is reason to believe that the current architecture
| may not solve this problem"
|
| That aside, optimists are also not always right, otherwise we
| would have cold fusion already and a base on Mars.
| rsynnott wrote:
| > But we all like to think we have super unique opinions and
| personalities, so "this time it's different."
|
| Are you suggesting that anything which is hyped is the
| future? Like, for every ten heavily-hyped things, _maybe_ one
| has some sort of post-hype existence.
| coliveira wrote:
| The pessimist is not wrong. In fact he's right more
| frequently than wrong. Just look at a long list of
| inventions. How many of them were so successful as the car or
| the airplane? Most of them were just passing fads that people
| don't even remember anymore. So if you're asking who is
| smarter, I would say the pessimist is closer to the truth,
| but the optimist who believed in something that really became
| successful is now remembered by everyone.
| Ukv wrote:
| I feel your argument relies on assuming that being an
| optimist or pessimist means believing 100% or 0%, whereas
| I'd claim it's instead more just having a relative leaning
| in a direction. Say after inspecting some rusty old engines
| a pessimist predicts 1/10 will still function and an
| optimist predicts 4/10 will function. If the engines do
| better than expected and 3/10 function, the optimist was
| closer to the truth despite most not working.
|
| Similarly, being optimistic doesn't mean you have to
| believe every single early-stage invention will work out no
| matter how unpromising - I've been enthusiastic about deep
| learning for the past decade (for its successes in language
| translation, audio transcription, material/product defect
| detection, weather forecasting/early warning systems, OCR,
| spam filtering, protein folding, tumor segmentation, drug
| discovery and interaction prediction, etc.)
| but never saw the appeal of NFTs.
|
| Additionally worth considering that the cost of trying
| something is often lower than the reward of it working out.
| Even if you were wrong 80% of the time about where to dig
| for gold, that 20% may well be worth it; reducing merely
| the _frequency_ of errors is often not logically correct.
| It's useful in a society to have people believe in and
| push forward certain inventions and lines of research even
| if most do not work out.
|
| I think xvector's point is about people rehashing the same
| denunciations that failed to matter for previous successful
| technologies - the idea that something is useless because
| it's not (or perhaps will never be) 100.0% accurate, or the
| "Until it can do dishes, home computer remains of little
| value to families"[0] which I've seen pretty much ad
| verbatim for AI many times (extra silly now that we have
| dishwashers).
|
| Given in real life things have generally improved (standard
| of living, etc.), I think it has typically been more
| correct to be optimistic, and hopefully will be into the
| future.
|
| [0]: https://pessimistsarchive.org/clippings/34991885.jpg
| jdbernard wrote:
| This argument is very prone to survivorship bias. Of course,
| when we think back to the hyped technologies of the past we
| are going to remember mostly those that justified the hype.
| The failures get forgotten. The memory of social discourse
| fades extremely quickly, much faster than, for example, pop
| culture or entertainment.
| i_love_retros wrote:
| > It's enabled two specific applications I've built which
| wouldn't be possible at single-person scale.
|
| I'd love to hear more about how you utilised AI for this.
|
| Personally I'm struggling to find it more useful than a
| slightly fancy code completion tool
| broast wrote:
| > slightly fancy code completion tool
|
| Does this alone not increase your productivity exponentially?
| It does mine. I personally read code faster than I write it
| so it is an undeniable boon.
| i_love_retros wrote:
| I've found it depends on the context (pardon the pun)
|
| For example, personal projects that are small and where
| copilot has access to all the context it needs to make a
| suggestion - such as a script or small game - it has been
| really useful.
|
| But in a real world large project for my day job, where it
| would need access to almost the entire code base to make
| any kind of useful suggestion that could help me build a
| feature, it's useless! And I'd argue this is when I need
| it.
| wenc wrote:
| At present, LLMs work well with smaller chunks of code at a
| time.
|
| Check out these tips for using Aider (a CLI based LLM
| code assistant): https://aider.chat/docs/usage/tips.html
|
| It can ingest the entire codebase (up to its context
| length), but for some reason, I've always had much higher
| quality chats with smaller bite-sized pieces of code.
| jprete wrote:
| Autocomplete distracts me enough that it really needs to be
| close to 100% correct before it's useful. Otherwise it's
| just wrecking my flow and slowing me down.
| coffeebeqn wrote:
| Exponentially? Absolutely not. In the best case it creates
| something that's almost useful. Are you working on large
| actual codebases or talking about some one off toy apps?
| agos wrote:
| it's surely a boon, but does not match the hype
| wbazant wrote:
| You could try aider, or another tool/workflow where you
| provide whole files and ask for how they should be changed -
| very different from code completion type tools!
| hbn wrote:
| Anyone who says AI is useless never had to do the old method of
| cobbling together git and ffmpeg commands from StackOverflow
| answers.
|
| I have no interest in learning the horrible unintuitive UX of
| every CLI I interact with, I'd much rather just describe in
| English what I want and have the computer figure it out for me.
| It has practically never failed me, and if it does I'll know
| right away and I can fall back to the old method of doing it
| manually. For now it's saving me so much time with menial,
| time-wasting day-to-day tasks.
| jghn wrote:
| I had a debate recently with a colleague who is very
| skeptical of LLMs for every day work. Why not lean in on
| searching Google and cross referencing answers, like we've
| done for ages? And that's fine.
|
| But my counterargument is that what I find to be so powerful
| about the LLMs is the ability to refine my question, narrow
| in on a tangent and then pull back out, etc. And *then* I can
| take its final outcome and cross reference it. With the old
| way of doing things, I often felt like I was stumbling in the
| dark trying to find the right search string. Instead I can
| use the LLM to do the heavy lifting for me in that regard.
| ADeerAppeared wrote:
| > Anyone who says AI is useless
|
| Most of those people are a bit bad at making their case. What
| they mean but don't convey well is that AI is useless _for
| its proclaimed uses_.
|
| You are correct that LLMs are pretty good at guessing this
| kind of well-documented & easily verifiable but hard to find
| information. That is a valid use. (Though, woe betide the
| fool who uses LLMs for irreversible destructive actions.)
|
| The thing is though, this isn't enough. There just aren't
| that many questions out there that match those criteria.
| Generative AI is too expensive to serve that small a task.
| Charging a buck a question won't earn the $100 billion OpenAI
| needs to balance the books.
|
| Your use case gets dismissed because on its own, it doesn't
| sustain AI.
| wenc wrote:
| I think you're on to something. I find the sentiment around
| LLMs (which is at the early adoption stage) to be
| unnecessarily hostile. (beyond normal HN skepticism)
|
| But it can be simultaneously true that LLMs add a lot of
| value to some tasks and less to others --- and less to some
| people. It's a bit tautological, but in order to benefit
| from LLMs, you have to be in a context where you stand to
| most benefit from LLMs. These are people who need to
| generate ideas, are expert enough to spot consequential
| mistakes, know when to use LLMs and when not to. They have
| to be in a domain where the occasional mistake generated
| costs less than the new ideas generated, so they still come
| out ahead. It's a bit paradoxical.
|
| LLMs are good for: (1) bite-sized chunks of code; (2)
| ideating; (3) writing once-off code in tedious syntax that
| I don't really care to learn (like making complex plots in
| seaborn or matplotlib); (4) adding docstrings and
| documentation to code; (5) figuring out console error
| messages, with suggestions as to causes (I've debugged a
| ton of errors this way -- and have arrived at the answer
| faster than wading through Stackoverflow); (6) figuring out
| what algorithm to use in a particular situation; etc.
|
| They're not yet good at: (1) understanding complex
| codebases in their entirety (this is one of the
| overpromises; even Aider Chat's docs tell you not to ingest
| the whole codebase); (2) any kind of fully automated task
| that needs to be 100% deterministic and correct (they're
| assistants); (3) getting math reasoning 100% correct (but
| they can still open up new avenues for exploration that
| you've never even thought about).
|
| It takes practice to know what LLMs are good at and what
| they're not. If the initial stance is negativity rather
| than a growth mindset, then that practice never comes.
|
| But it's ok. The rest of us will keep on using LLMs and
| move on.
| esafak wrote:
| An example that might be of interest to readers: I gave
| it two logs, one failing and one successful, and asked it
| to troubleshoot. It turned out a loosely pinned
| dependency (Docker image) had updated in the failing one.
| An error mode I was familiar with and could have solved
| on my own, but the LLM saved me time. They are reliable
| at sifting through text.
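|
| Not their exact setup, but a minimal sketch of that workflow in
| Python: pre-diff the two logs (file names here are hypothetical)
| and hand either the raw logs or just the differing lines to the
| LLM to explain:
|
|     import difflib
|
|     with open("build_success.log") as f:
|         good = f.readlines()
|     with open("build_failure.log") as f:
|         bad = f.readlines()
|
|     # Keep only the lines that changed between the two runs.
|     diff = difflib.unified_diff(good, bad, fromfile="success",
|                                 tofile="failure", n=0)
|     print("".join(diff))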
| mistercheph wrote:
| Hostility and a few swift kicks are in order when the
| butt scratchers start saying their stochastic parrot
| machine is intelligent and a superman.
| Loughla wrote:
| I've been sold AI as if it can do anything. It's being
| actively sold like a super intelligent independent human
| that never needs breaks.
|
| And it just isn't that thing. Or, rather, it is super
| intelligent but lacks any wisdom at all; thus rendering
| it useless for how it's being sold to me.
|
| >which is at the early adoption stage
|
| I've said this in other places here. LLM's simply aren't
| at early adoption stage anymore. They're being packaged
| into literally every saas you can buy. They're a main
| selling point for things like website builders and other
| direct to business software platforms.
| wenc wrote:
| Why not ignore the hype, and just quietly use what works?
|
| I don't use anything other than ChatGPT 4o and Claude
| Sonnet 3.5v2. That's it. I've derived great value from
| just these two.
|
| I even get wisdom from them too. I use them to analyze
| news, geopolitics, arguments around power structures,
| urban planning issues, privatization pros and cons, and
| Claude especially is able to give me the lay of the land
| which I am usually able to follow up on. This use case is
| more of the "better Google" variety rather than task-
| completion, and it does pretty well for the most part.
| Unlike ChatGPT, Claude will even push back when I make
| factually incorrect assertions. It will say "Let me
| correct you on that...". Which I appreciate.
|
| As long as I keep my critical thinking hat on, I am able
| to make good use of the lines of inquiry that they
| produce.
|
| Same caveat applies even to human-produced content. I
| read the NYTimes and I know that it's wrong a lot, so I
| have to trust but verify.
| Loughla wrote:
| I agree with you, but it's just simply not how these
| things are being sold and marketed. We're being told we
| do not have to verify. The AI knows all. It's
| undetectable. It's smarter and faster than you.
|
| And it's just not.
|
| We made a scavenger hunt full of puzzles and riddles for
| our neighbor's kids to find their Christmas gifts from us
| (we don't have kids at home anymore, so they fill that
| niche and are glad to because we go ballistic at
| Christmas and birthdays). The youngest of the group is
| the tech kid.
|
| He thought he fixed us when he realized he could use
| chatgpt to solve the riddles and cyphers. It recognized
| the Caesar letter shift to negative 3, but then made up a
| random phrase with words the same length to solve it. So
| the process was right, but the outcome was just
| outlandishly incorrect. It wasted about a half hour of
| his day. . .
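|
| For reference, the deterministic decode the model skipped is a
| few lines of code; a sketch in Python (the ciphertext below is
| made up for illustration, not the actual riddle):
|
|     def caesar_shift(text, shift):
|         # Shift letters by `shift` places; leave everything else.
|         out = []
|         for ch in text:
|             if ch.isalpha():
|                 base = ord("A") if ch.isupper() else ord("a")
|                 out.append(chr((ord(ch) - base + shift) % 26 + base))
|             else:
|                 out.append(ch)
|         return "".join(out)
|
|     # Text encoded with a shift of +3 is undone with -3 (or the
|     # reverse, depending on which direction you count the "3").
|     print(caesar_shift("Phuub Fkulvwpdv", -3))  # -> Merry Christmas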
|
| Now apply that to complex systems or just a simple large
| database, hell, even just a spreadsheet. You check the
| process, and it's correct. You don't know the outcome, so
| you can't verify unless you do it yourself. So what's the
| point?
|
| For context, I absolutely use LLM's for things that I
| know roughly, but don't want to spend the time to do.
| They're useful for that.
|
| They're simply not useful for how they're being marketed,
| which is to solve problems you don't already know the answer to.
| hyhconito wrote:
| You're still doing it the hard way. I just use Handbrake.
|
| Pick a hammer, not a shitty hammer factory to assemble bits
| of hammer.
| sunnybeetroot wrote:
| How do you use handbrake to write a script that uses
| ffmpeg?
| arkh wrote:
| > if it does I'll know right away and I can fall back to the
| old method of doing it manually
|
| It's well and ok with things you can botch with no
| consequence other than some time wasted. But I've bricked
| enough VMs trying commands I did not understand to know that
| if you need to not fuck up something you'll have to read
| those docs and understand them. And hope they're not out of
| date / wrong.
| dangoodmanUT wrote:
| > Anyone who says AI is useless never had to do the old
| method of cobbling together git and ffmpeg commands from
| StackOverflow answers.
|
| The best ffmpeg and regex command generators
| jv981 wrote:
| try asking an LLM how to add compressed data to a tar through
| standard input, see how that goes (don't forget to check the
| answer :)
| cootsnuck wrote:
| I use LLMs to help me with ffmpeg commands more than I care
| to admit
| momentoftop wrote:
| > Anyone who says AI is useless never had to do the old
| method of cobbling together git and ffmpeg commands from
| StackOverflow answers.
|
| It's useful for that yes, but I'd rather just live in a world
| where we didn't have such disasters of CLI that are git and
| ffmpeg.
|
| LLMs are very useful for generating the obscure boilerplate
| needed because the underlying design is horrible. Relying on
| it means acquiescing to those terrible designs rather than
| figuring out redesigns that don't need the LLMs. For
| comparison, IntelliJ is very good at automating all the
| boilerplate generation that Java imposes on me, but I'd
| rather we didn't have boilerplate languages like Java, and
| I'd rather that IntelliJ's boilerplate generation didn't
| exist.
|
| I fear in many cases that if an LLM is solving your problem,
| you are solving the wrong problem.
| wruza wrote:
| We can't test/review these apps though, can we?
|
| I'm asking not for snark, but because when AI gives me
| something not _quite_ working, it requires much more time than
| what a "every 6 minutes in 10 hour work day" frame would allow
| to investigate. I just wonder if _maybe_ you're pasting it as
| is and don't care about correctness if the happy path sort of
| works. Speaking of subsets, coders who did that before AI were
| also quite a group.
|
| There must be _something_ that explains the difference in our
| experiences. Apologies for the fact that my only idea is kinda
| negative. I understand the potential hyperbole here, but it
| doesn't explain much. I can stand AI BS once a day, maybe
| twice, before uncontrollably cursing into the chat.
| thomashop wrote:
| Why not write tests with AI, too? Since using LLMs as coding
| assistants, my codebases have much more thorough
| documentation, testing and code coverage.
|
| Don't start when you're already in a buggy dead-end. Test-
| driven development with LLMs should be done right from the
| start.
|
| Also keep the code modular so it is easy to include the
| correct context. Fine-grained git commits. Feature-branches.
|
| All the tools that help teams of humans of varying levels of
| expertise work together.
| croes wrote:
| Because then you need tests for the tests.
| thomashop wrote:
| Sure. You can always write more tests. That's not a
| problem specific to AI.
|
| I'd also do code reviews on the code AI produces.
| mistercheph wrote:
| You may have enough expertise in your field that when you
| have a question, you know where to start looking. Juniors and
| students encounter dozens of problems and questions per hour
| that fall into the unknown unknown category
| croes wrote:
| Are you still a coder when you use AI 100 times a day?
|
| AI is a type of outsourcing, you became a customer.
| mbernstein wrote:
| Not outsourcing at all - you're an engineer using the
| tools that make sense to solve a problem. The core issue with
| identifying as just a coder is that code is just one of many
| potential tools to solve a problem.
| croes wrote:
| Could you distinguish code written by an AI from code
| written by a fake AI that is actually a human being?
|
| If something or someone else writes the code, that's
| outsourcing.
|
| I wouldn't consider myself an artist if I created a picture
| with Midjourney.
| arkh wrote:
| Do you write binary code or use a compiler?
|
| Do you design all the NAND gates in your processor to get
| the exact program you want out of it or use a general
| purpose processor?
|
| Current "coding" is just a detail of what you want to do:
| solve problems. Which can require making a machine do
| what you want it to.
| croes wrote:
| So your customer/employer is a coder too. They want to solve
| a problem and use a tool: You.
|
| A coder writes code in a programming language; that's what
| distinguishes them from the customers who use natural
| language. The coder is the translator between the
| customer and the machine. If the machine does that, the
| machine is the coder.
| mbernstein wrote:
| Is your customer bringing you the solution to the problem
| or the problem and asking you to solve the problem? One
| is a translation activity and the other isn't.
| nlh wrote:
| If you're sitting in front of the keyboard, inputting
| instructions and running the resulting programs, yes you are
| still a coder. You're just moving another layer up the
| stack.
|
| The same type of argument has been made for decades -- when
| coders wrote in ASM, folks would ask "are you still a coder
| when you use that fancy C to make all that low-level ASM
| obsolete?". Etc etc.
| croes wrote:
| So if I sit in front of the keyboard and write an email
| with instructions to my programmer I'm a coder.
| owenpalmer wrote:
| Are you still a coder when you use libraries or frameworks?
| You didn't write the code yourself, you're just outsourcing
| it.
| stronglikedan wrote:
| Have you tried a few? If so, which do you prefer? If not, which
| do you use? I'm a little late to the party, and the current
| amount of choices is quite intimidating.
| airstrike wrote:
| I imagine you're asking about coding help. For that, I think
| you should qualify any answer you get with the user's most
| commonly used language (and framework, if applicable).
|
| In my experience, Claude Sonnet 3.5 (3.6?) has been
| unbeatable. I use it for Rust. Making sense of compiler
| errors, rubberducking, finding more efficient ways to write
| some function and, truth be told, sometimes just plain old
| debugging. More than once, I've been able to dump a massive
| module onto the chat context and say "look, I'm experiencing
| this weird behavior but it's really hard to pin down what's
| causing it in this code" and it pointed to the _exact_ issue
| in a second. That alone is worth the price of admission.
|
| Way better than ChatGPT 4o and o-1, in my experience, despite
| me saying the exact opposite a few months ago.
| esafak wrote:
| Try Cody. It integrates with your IDE, understands your code
| base, and lets you pick the LLM.
| monkeynotes wrote:
| This isn't about if LLMs are useful, it's about how useful can
| they become. We are trying to understand if there is a path
| forward to transformative tech, or are we just limited to a
| very useful tool.
|
| It's a valid conversation after ~3 years of anticipating that
| the world would be disrupted by this tech. So far it has not
| delivered.
|
| Wikipedia did not change the world either, it's just a great
| tool that I use all the time.
|
| As for software, it performs ok. I give up on it most of the
| time if I am trying to write a whole application. You have to
| acquire a new skill, prompt engineering, and feverish
| iteration. It's a frustrating game of whack-a-mole and I find
| it quicker to write the code myself and just have the LLM help
| me with architecture ideas, bug bashing, and it's also quite
| good at writing tests.
|
| I'd rather know the code intimately so I can more quickly debug
| it than have an LLM write it and just trust it did it well.
| wenc wrote:
| By the way, Wikipedia did change the world. Some of the most
| important inventions are the ones we don't notice.
| nemo44x wrote:
| Peter Thiel talked about this years ago in his book Zero to
| One. His key insight, which we're seeing today, is that AI
| tools will work side-by-side with people and enhance their
| productivity to levels never imagined. From helping with some
| basic tasks ("write an Excel script that transforms this table
| from this format to this new format") to helping write
| programs, it's a tool that aids humans in getting more things
| done than previously possible.
| paxys wrote:
| Every piece of technology is a "dead end" until something better
| replaces it. That doesn't mean it can't be useful or
| revolutionary.
| tu7001 wrote:
| We all know what's an answer when there is a question mark in the
| title.
| IanHalbwachs wrote:
| https://en.wikipedia.org/wiki/Betteridge%27s_law_of_headline...
| btbuildem wrote:
| Interesting read -- and a correct take, given the software
| development perspective. In that context, LLM-based AI is faulty,
| unpredictable, and unmanageable, and not ready for mission-
| critical applications.
|
| If you want to argue otherwise, do a quick thought experiment
| first: would you let an LLM manage your financial affairs
| (entirely, unsupervised)? Would you let it perform your job while
| you receive the rewards and consequences? Would you be
| comfortable to give it full control of your smart home?
|
| There are different sets of expectations put on human actors vs
| autonomous systems. We expect people to be fallible and wrong
| some of the time, even if the individuals in question can't/won't
| admit it. With a software-based system, the expectations are that
| it will be robust, tested, and performing correctly 100% of the
| time, and when a fault occurs, it will be clear, marked with
| yellow tape and flashing lights.
|
| LLM-based AIs are sort of insidious in that they straddle this
| expectation gap: the emergent behaviour is erratic, projecting
| confident omniscience, while often hallucinating and plain wrong.
| However vague, the catch-all term "AI" still implies "computer
| system" and by extension "engineered and tested".
| cesaref wrote:
| It's a bad example. Lots of finance firms use AI to manage
| their financial affairs - go and investigate what is currently
| considered state of the art for trading algorithms.
|
| Now if you substituted something safety critical instead, say,
| running a nuclear power station, or my favourite currently in
| use example, self driving cars, then yes, you should be scared.
| galleywest200 wrote:
| > go and investigate what is currently considered state of
| the art for trading algorithms.
|
| These are not LLMs but algorithms written and designed by
| human minds. It is unfortunate that AI has become a catch-all
| word for any kind of machine learning.
| chuckadams wrote:
| LLMs are algorithms written by humans. "AI" is _supposed_
| to be a vague term, and not synonymous with one particular
| implementation.
| teucris wrote:
| LLMs are _architectures_ written by humans. What an LLM
| creates is not algorithmic.
| FrustratedMonky wrote:
| "What an LLM creates is not algorithmic."
|
| Not strictly true. There are patterns in the weights that
| could be steps in an algorithm.
| seadan83 wrote:
| LLMs create models, not algorithms. An algorithm is a
| rote sequence of steps to accomplish a task.
|
| The following is an algorithm:
|
| - plug in input to model
|
| - say yes if result is positive, else say no
|
| LLMs use models, the model is not an algorithm.
|
| > There are patterns in the weights that could be steps
| in an algorithm.
|
| Sure, but yeah... no.. "Could be steps in an algorithm"
| does not constitute an algorithm.
|
| Weights are inputs, they are not themselves parts of an
| algorithm. The algorithm might still try to come up with
| weights. Still, don't confuse procedure with data.
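|
| A toy sketch of that split (plain Python, with a made-up linear
| "model" rather than an LLM): the weights are data, and the
| algorithm is the rote procedure wrapped around them.
|
|     # The "model": parameters learned elsewhere. Pure data.
|     WEIGHTS = [0.7, -1.2, 0.05]
|     BIAS = -0.1
|
|     def classify(features):
|         # The algorithm: (1) plug the input into the model,
|         # (2) say yes if the result is positive, else say no.
|         score = sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS
|         return "yes" if score > 0 else "no"
|
|     print(classify([1.0, 0.2, 3.0]))  # -> yes
|
| Swapping in different weights changes the answers without
| touching the procedure at all.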
| FrustratedMonky wrote:
| Don't want to get too pedantic on that response. The model
| can contain complex information. There is already
| evidence it can form a model of the world. So why not
| something like steps to get from A to B.
|
| And, it is clear that LLMs can follow steps. One didn't
| place in the Math Olympiad without some ability to follow
| steps.
|
| https://research.google/blog/teaching-language-models-to-rea...
|
| And, Anyway, when I asked it, it said it could
|
| "Yes, an LLM model can contain the steps of an algorithm,
| especially when prompted to "think step-by-step" or use a
| "chain-of-thought" approach, which allows it to break
| down a complex problem into smaller, more manageable
| steps and generate a solution by outlining each stage of
| the process in a logical sequence; essentially mimicking
| how a human would approach an algorithm. "
| wavemode wrote:
| He said LLM, not just any AI
| aleph_minus_one wrote:
| I would bet that there _do_ exist some finance firms that
| _do_ use LLM as AIs for the purposes that cesaref sketched.
| ryukoposting wrote:
| Makes me wonder how they detect market manipulation and
| fraud. Trivial activities, like marking the close, probably
| aren't hard to detect, but I imagine that some kind of ML
| thingy is involved in flagging accounts for manual
| inspection.
| msabalau wrote:
| Pragmatically, "AI" will mean (and for, many people already
| does mean) stochastic and fallible.
|
| If your users are likely to be AI illiterate and mistakenly
| feel that an AI app is reliable and suitable for mission
| critical applications when it isn't, that is a risk you
| mitigate.
|
| But it seems deeply unserious of the author to just assert that
| mission-critical software is the only "serious context" and the
| only thing that matters, and that therefore AI is a dead end.
| "Serious, mission critical" apps are just going to be a niche
| in the future.
| timeon wrote:
| > "Serious, mission critical" apps are just going to be a
| niche in the future.
|
| High quality only for the _niche_ "serious, mission critical"
| apps; everywhere else, enshittification has already started,
| and LLMs will just accelerate it.
| JumpCrisscross wrote:
| > _LLM-based AI is faulty, unpredictable, and unmanageable_
|
| Is there a fundamental (à la Gödel) reason why we can't predict
| or manage LLMs?
|
| > _would you let an LLM manage your financial affairs
| (entirely, unsupervised)?_
|
| No. But history is littered with eccentric geniuses who
| couldn't be trusted on their own, but who nevertheless were
| relied on by decision makers.
|
| Maybe there is an Erdős principle at play: AI can be involved
| in questions of arbitrary complexity, but can only advise on
| important decisions.
| manquer wrote:
| > would you let an LLM manage your financial affairs (entirely,
| unsupervised)?
|
| It will likely be better[2] not because AI is good at this .
|
| It would be because study after study[1] has shown that active
| management performs poorer than passive funds, less
| intervention gives better result over longer timeframe .
|
| [1] The famous Warren Buffett bet comes to mind. There are more
| formal ones validating this.
|
| [2] if configured to do minimal changes
| seadan83 wrote:
| What if financial affairs were broadened to be everything,
| not just portfolio management? Eg: paying bills, credit
| cards, cash balance in check vs savings vs brokerage.
| manquer wrote:
| Good financial management(portfolio and personal) is a
| matter of disciplined routine, performed consistently over
| long timeframe, combined with impulse control. It is not
| complicated at all, any program (LLM or just a rules
| engine) will always do far better than we can because it
| will not suffer from either problem (failing to stick to the
| routine, or acting on impulse).
|
| Most humans make very bad decisions around personal
| finance, whether it is big things like gambling or impulse
| buys with expensive credit, to smaller items like tracking
| subscriptions or keeping not needed money in checking
| account etc.
|
| This is irrespective of financial literacy, education,
| wealth or professions like say working in finance/ personal
| wealth management even.
|
| Entire industries like lottery, gambling, luxury goods,
| gaming, credit card APRs, Buy Now Pay Later, Consumer SaaS,
| Banking overdraft fees are all built around our inability
| to control our impulses or follow disciplined routines.
|
| This is why trust funds with wealth management
| professionals are the only way to generational wealth.
|
| You need the ability to keep any beneficiary (the next
| generations) from exercising their impulses on amounts beyond
| their annual draw. Plus the disciplined routine of a
| professional team who are paid to do only this with
| multiple layers that vet the impulses of individual
| managers and conservative mandate to keep them risk averse
| and therefore less impulsive.
|
| If a program can do it for me (provided of course I
| irrevocably give away my control to override or alter its
| decisions) then normal people can also benefit without the
| high net worth required for wealth management.
| clint wrote:
| The primary fallacy in your argument is that you seem to think
| that humans produce much better products by some kind of
| metric.
|
| My lived experience in the software industry at almost all levels
| over the last 25 years leads me to believe that the vast
| majority of humans and teams of humans produce atrocious code
| that only wastes time, money, and people's patience.
|
| Often because it is humans producing the code, other humans are
| not willing to fully engage, criticize and improve that code,
| deferring to just passing it on to the next person, team,
| generation, whatever.
|
| Yes, this perhaps happens better in some (very large and very
| small) organizations, but most often it only happens with the
| inclusions of horrendous layers of protocol, bureaucracy, more
| time, more emotional exhaustion, etc.
|
| In other words a very costly process to produce excellent code,
| both in real capital and human capital. It literally burns
| through actual humans and results in very bad health outcomes
| for most people in the industry, ranging from minor stuff to
| really major things.
|
| The reality is that probably 80% of people working in the tech
| industry can be outperformed by an AI and at a fraction of the
| cost. AIs can be tuned, guided, and steered to produce code
| that I would call exceptional compared even to most developers
| who have been in the field for 5 years or more.
|
| You probably come to this fallacy because you have worked in
| one of these very small or very large companies that takes
| producing code seriously and believe that your experience
| represents the vast majority of the industry, but in fact the
| middle area is where most code is being "produced" and if
| you've never been fully engaged in those situations, you may
| literally have no idea of the crap that's being produced and
| shipped on a daily basis. These companies have no incentive to
| change, they make lots of money doing this, and fresh meat
| (humans) is relatively easy to come by.
|
| Most of these AI benchmarks are trying to get these LLMs to
| produce outputs at the scale and quantity of one of these
| exceptional organizations when in fact, the real benefits will
| come in the bulk of organizations that cannot do this stuff and
| AI will produce as good or better code than a team of mediocre
| developers slogging away in a mediocre, but profitable,
| company.
|
| Yes there are higher levels of abstraction around code, and
| getting it deployed, comprehensive testing, triaging issues, QA
| blah blah, that humans are going to be better at for now, but I
| see many of those issues being addressed by some kind of LLM
| system sooner or later.
|
| Finally, I think most of the friction people are seeing right
| now in their organization is because of the wildly ad hoc way
| people and organizations are using AI, not so much about the
| technological abilities of the models themselves.
| d0mine wrote:
| "80%" "outperformed" "fraction of the cost" you could make a
| lot of money if it were true but 5x productivity boost seems
| unjustified right now, I'm having a hard time finding
| problems where the output is even 1x (where I don't spend
| more time babysitting LLM than doing the task from scratch
| myself).
| Earw0rm wrote:
| Depends what you're doing.
|
| For "stay in your lane" stuff, I agree, it relatively
| sucks.
|
| For "today I need do stuff two lanes over", well it still
| needs the babysitting, and I still wouldn't put it on tasks
| where I can't verify the output, but it definitely delivers
| a productivity boost IME.
| SoftTalker wrote:
| Sorry you're downvoted, but I generally agree. When it comes
| to software, most organizations are Initech.
| superjan wrote:
| With respect to hallucinating, I never read about training
| LLM's to say: "I don't know" when they don't know. Is that even
| researched?
| Sohcahtoa82 wrote:
| ChatGPT seems to be good about this. If you invent something
| and ask about it, like "What was the No More Clowning Act of
| 2025?", it will say it can't find any information on it.
|
| The older or smaller models, like anything you can run
| locally, are probably far more likely to just invent some
| bullshit.
|
| That said, I've certainly asked ChatGPT about things that
| definitely have a correct answer and had it give me incorrect
| information.
|
| When talking about hallucinating, I do think we need to
| differentiate between "what you asked about exists and has a
| correct answer, but the AI got it wrong" and "What you're
| asking for does not exist or does not have an answer, but the
| AI just generated some bullshit".
| sroussey wrote:
| Not sure why you are downvoted. It's a difficult problem, but
| lots of angles on how to deal with it.
|
| For example: https://arxiv.org/abs/2412.15176
| rsanek wrote:
| > Would you let it perform your job while you receive the
| rewards and consequences?
|
| isn't this what being a human manager is? not sure why you're
| saying it must be entirely + unsupervised. at my job, my boss
| mostly trusts me but still checks my work and gives me feedback
| when he wants something changed. he's ultimately responsible
| for what I do.
| bredren wrote:
| Indeed, even if you have a professional accountant do your
| taxes, you must still sign off on their work.
|
| Detecting omissions or errors on prepared tax forms often
| requires knowledge of context missed by or not provided to
| the accountant.
| PaulDavisThe1st wrote:
| I believe you're asking the wrong question, or at least you're
| asking it in the wrong way. From my POV, it comes in two parts:
|
| 1. Do you believe that LLMs operate in a similar way to the
| important parts of human cognition?
|
| 2. If not, do you believe that they operate in a way that makes
| them useful for tasks other than responding to text prompts,
| and if so, what are those tasks?
|
| If you believe that the answer to Q1 is substantively "yes" -
| that is, humans and LLM are engaged in the same sort of
| computational behavior when we engage in speech generation -
| then there's presumably no particular impediment to using an
| LLM where you might otherwise use a human (and with the same
| caveats).
|
| My own answer is that while some human speech behavior is
| possibly generated by systems that function in a semantically
| equivalent way to current LLMs, human cognition is capable of
| tasks that LLMs cannot perform de novo even if they can give
| the illusion of doing so (primarily causal chain reasoning).
| Consequently, LLMs are not in any real sense equivalent to a
| human being, and using them as such is a mistake.
| User23 wrote:
| I think C.S. Peirce's distinction between corollarial
| reasoning and theorematic reasoning[1][2] is helpful here. In
| short, the former is the grindy rule following sort of
| reasoning, and the latter is the kind of reasoning that's
| associated with new insights that are not determined by the
| premises alone.
|
| As an aside, Students of Peirce over the years have quite the
| pedigree in data science too, including the genius Edgar F.
| Codd, who invented the relational database largely inspired
| by Peirce's approach to relations.
|
| Anyhow, computers are already quite good at corollarial
| reasoning and have been for some time, even before LLMs. On
| the other hand, they struggle with theorematic reasoning.
| Last I knew, the absolute state of the art performs about as
| well as a smart high school student. And even there, the
| tests are synthetic, so how theorematic they truly are is
| questionable. I wouldn't rule out the possibility of some
| automaton proposing a better explanation for gravitational
| anomalies than dark matter for example, but so far as I know
| nothing like that is being done yet.
|
| There's also the interesting question of whether or not an
| LLM that produces a sequence of tokens that induces a genuine
| insight in the human reader actually means the LLM itself had
| said insight.
|
| [1] https://www.cspeirce.com/menu/library/bycsp/l75/ver1/l75v1-0...
|
| [2] https://groups.google.com/g/cybcom/c/Es8Bh0U2Vcg
| Closi wrote:
| > My own answer is that while some human speech behavior is
| possibly generated by systems that function in a semantically
| equivalent way to current LLMs, human cognition is capable of
| tasks that LLMs cannot perform de novo even if they can give
| the illusion of doing so (primarily causal chain reasoning).
| Consequently, LLMs are not in any real sense equivalent to a
| human being, and using them as such is a mistake.
|
| In the workplace, humans are ultimately a tool to achieve a
| goal. LLM's don't have to be equivalent to humans to replace
| a human - they just have to be able to achieve the goal that
| the human has. 'Human' cognition likely isn't required for a
| huge amount of the work humans do. Heck, AI probably isn't
| required to automate a lot of the work that humans do, but it
| will accelerate how much can be automated and reduce the cost
| of automation.
|
| So it depends what we mean as 'use them as a human being' -
| we are using human beings to do tasks, be it solving a
| billing dispute for a customer, processing a customers
| insurance claim, or reading through legal discovery. These
| aren't intrinsically 'human' tasks.
|
| So 2 - yes, I do believe that they operate in a way that
| makes them useful for tasks. LLM's just respond to text
| prompts, but those text prompts can do useful things that
| humans are currently doing.
| RaftPeople wrote:
| My 2 cents:
|
| I think the vector representation stuff is an effective tool
| and possibly similar to foundational tools that humans are
| using.
|
| But my gut feel is that it's just one tool of many that
| combine to give humans a model+view of the world with some
| level of visibility into the "correctness" of ideas about
| that world.
|
| Meaning we have a sense of whether new info "adds up" or not,
| and we may reject the info or adjust our model.
|
| I think LLM's in their current state can be useful for tasks
| that do not have a high cost resulting from incorrect output,
| or tasks that can have their output validated by humans or
| some other system cost-effectively.
| tliltocatl wrote:
| IMHO, a more important and testable difference is that humans
| don't have separate "train" and "infer" phases. We are able
| to adapt more or less on the fly and learn from previous
| experience. LLMs currently cannot retain any novel experience
| past the context window.
| AnimalMuppet wrote:
| I think LLMs operate in a similar way to _some_ of the
| important parts of human cognition.
|
| I believe they operate in a way that makes them at least
| somewhat useful for some things. But I think the big issue is
| trustworthiness. Humans - at least some of them - are more
| trustworthy than LLM-style AIs (at least current ones). LLMs
| need progress on trustworthiness more than they need progress
| on use in other areas.
| qaq wrote:
| Would you let an LLM manage your financial affairs (entirely,
| unsupervised)?
|
| Hmm I would not let other human manage my financial affairs
| entirely unsupervised.
| tossandthrow wrote:
| > would you let an LLM manage your financial affairs (entirely,
| unsupervised)?
|
| No, but I also would not let another person do that.
|
| It is telling that you needed to interject "entirely,
| unsupervised".
|
| Most people will let an llm do it partially, and probably
| already do.
| BhavdeepSethi wrote:
| People pay to use a financial advisor. Isn't that another
| person?
| tossandthrow wrote:
| The key is: entirely and unsupervised.
|
| Mostly, your financial advisor prepares the return that you
| sign off on, or manages your portfolio. But the advisor usually
| solicits and interacts with you to know what your financial
| goals are and ensure you are on board with the consequences
| of their advice.
|
| I do not dismiss that some people are completely hands off
| at great risk IMHO. But these are not me - as was my
| initial proposition.
| wyager wrote:
| > would you let an LLM manage your financial affairs (entirely,
| unsupervised)?
|
| I wouldn't let another human do this.
| Earw0rm wrote:
| A more informative question is:
|
| _Who_ would you let manage your financial affairs, and under
| what circumstances?
|
| To which my answer would be something like: a qualified
| financial adviser with a good track record, who can be trusted
| to do the job to, if not the best of their abilities, at least
| an acceptable level of professional competence.
|
| A related question: who would you let give you a lift someplace
| in a car?
|
| And here's where things get interesting. Because on the one
| hand there's a LOT more at stake (literally, your life), and
| yet various social norms, conventions , economic pressures and
| so on mean that in practice we quite often entrust that
| responsibility to people who are very, very far from performing
| at their best.
|
| So while a financial adviser AI is useless unless it can
| perform at the level of a trained professional doing their job
| (or unless it can perform at maybe 95% of that level at much
| lower cost), a self-driving car is at least _potentially_
| useful if it's only somewhat better than people at or close to
| their worst. As a high proportion of road traffic collisions
| are caused by people who are drunk, tired, emotionally unstable
| or otherwise very very far from the peak performance of a human
| being operating a car.
|
| (We can argue that a system which routinely requires people to
| carry out life-or-death, mission-critical tasks while
| significantly impaired is dangerously flawed and needs a major
| overhaul, but that's a slightly different debate).
| IanCal wrote:
| I find this kind of argument comes up a lot and it seems
| fundamentally flawed to me.
|
| 1. You can set a bar wherever you want for a level of
| "seriousness" and huge swathes of real world work will fall below
| it, and are therefore attractive to tackle with these systems.
|
| 2. We build critical large scale systems out of humans, which are
| fallible and unverifiable. That's not to say current LLMs are
| human or equivalent, but "we can't verify X works all the time"
| doesn't stop us doing exactly that a _lot_. We deal with this by
| learning how humans make mistakes, why, and build systems of
| checks around that. There is nothing in my mind that stops us
| doing the same with other AI systems.
|
| 3. Software is written by, checked by and verified by humans at
| least at some critical point - so even verified software still
| has this same problem.
|
| We've also been doing this kind of thing with ML models for ages,
| and we use buggy systems for an enormous amount of work
| worldwide. You can argue we shouldn't and should have fully
| formally verified systems for everything, but you can't deny that
| right now we have large serious systems without that.
|
| And if your goal is "replace a human" then I just don't think you
| can reasonably say that it requires verifiable software.
|
| > Systems are not explainable, as they have no model of knowledge
| and no representation of any 'reasoning'.
|
| Neither of those statements are true are they? There are internal
| models, and recent models are designed around having a
| representation of reasoning before replying.
|
| > current generative AI systems represent a dead end, where
| exponential increases of training data and effort will give us
| modest increases in impressive plausibility but no foundational
| increase in reliability
|
| And yet reliability is something we see improve as LLMs get
| better and we get better at training them.
| nialse wrote:
| There are two epistemic poles: the atomistic and the
| probabilistic. The author subscribes to a rule-based atomistic
| worldview, asserting that any perspective misaligned with this
| framework is incorrect. Currently, academia is undergoing a
| paradigm shift in the field of artificial intelligence. Symbolic
| AI, which was the initial research focus, is rapidly being
| replaced by statistical AI methodologies. This transition
| diminishes the relevance of atomistic or symbolic scientists,
| making them worry they might become irrelevant.
| irunmyownemail wrote:
| Not sure I followed all of that lingo but it sounds like a
| fancy way of saying, if you're losing the game, try shifting
| the goal post.
| nialse wrote:
| Indeed and unfortunately. I've been reading up on "the
| binding problem" in AI lately and came across a paper that
| hinged on there being an "object representation" which would
| magically solve the apparent issues in symbolic AI. In the
| discussion some 20 pages later, the authors confessed that
| they, nor anybody else, could define what an object was in
| the first place. Sometimes the efforts seem focused on "not
| letting the other team win" rather than actually having
| something tangible to bring to the table.
| dbmikus wrote:
| I never want to claim certainties, but it seems pretty close to
| certain that symbolic AI loses to statistical AI.
|
| I think there is room for statistical AI to operate symbolic
| systems so we can better control outputs. Actually, that's kind
| of what is going on when we ask AI to write code.
| Shorel wrote:
| I think that transition already happened, and the next big jump
| in AI will be the combination of these two approaches in an
| unified package.
|
| Kind of the way our right brain hemisphere does probabilistic
| computation and the left brain hemisphere does atomistic
| computation. And we use both.
|
| So, whoever develops the digital equivalent of the corpus
| callosum wins.
| nialse wrote:
| An observation with scientific paradigm shifts is that they
| tend not to reverse. As for the lingo someone commented on:
| the fundamental problem is the differing philosophical views
| of what knowledge is and can be. Either knowledge is based on
| symbols and rules, as in mathematics, or it is probabilistic,
| as in anything we can actually measure. Both
| these views can coexist and maybe AI will find the missing
| link between them some day. Possibly no human will grasp the
| link.
| llm_trw wrote:
| No, they are very useful tools to build intelligent systems
| out of.
|
| Everything from perplexity onward shows just how useful agents
| can be.
|
| You get another bump in utility when you allow for agent swarms.
|
| Then another one for dynamically generated agent swarms.
|
| The only reason why it's not coming for your job is that LLMs are
| currently too power hungry to run those jobs for anything but
| research - at a couple thousand to couple of million times the
| price of a human doing the work.
|
| Which works out to 10 to 20 epochs of whatever Moore's law looks
| like in graphics cards.
| throw83288 wrote:
| What is that bump in utility in practical terms? You can point
| to a benchmark improvement but that's no indication the agent
| swarm isn't just reducing to "giving an LLM an arbitrary number
| of random guesses".
| inciampati wrote:
| Want reliable AI? Stop approximating memory with attention and
| build reliable memory into the model directly.
| logicchains wrote:
| Standard LLM quadratic attention isn't an approximation, it's
| perfect recall. Approaches that compress that memory down into
| a fixed-size state are an approximation, and generally perform
| worse, that's why linear transformers aren't widely used.
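|
| A rough sketch of that contrast (Python/NumPy; real linear-
| attention variants also carry a normalizer, this just shows the
| growing KV cache versus a fixed-size state):
|
|     import numpy as np
|
|     def softmax(x):
|         e = np.exp(x - x.max())
|         return e / e.sum()
|
|     rng = np.random.default_rng(0)
|     seq_len, d = 1024, 64
|     K = rng.standard_normal((seq_len, d))  # one key per past token
|     V = rng.standard_normal((seq_len, d))  # one value per past token
|     q = rng.standard_normal(d)             # current query
|
|     # Standard attention: score against every cached key, so any
|     # past token can be recalled exactly; cost grows with seq_len.
|     exact = softmax(K @ q / np.sqrt(d)) @ V
|
|     # Linear-attention-style state: the whole history compressed
|     # into one d x d matrix; memory stops growing, recall is lossy.
|     S = K.T @ V
|     approx = q @ S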
| mrtksn wrote:
| What I find curious is that the people who sell the AI as the
| holy grail that will make any jobs obsolete in a few year at the
| same time claim that there's huge talent shortage and even engage
| in feud on immigration and spend capital to influence immigration
| policies.
|
| Apparently they don't believe that AI is about to revolutionize
| things that much. This makes me believe that a significant part of
| the AI investment is just FOMO driven, so no real revolution is
| around the corner.
|
| Although we keep seeing claims that AI has achieved PhD level
| this and Olympiad level that, the people who actually own these
| systems keep demanding immigration policy changes to bring
| actual humans from overseas for years to come.
| fhd2 wrote:
| Is that so? I'm not in the US, so I don't have a good idea of
| what's going on there. But wasn't there relatively high
| unemployment among developers after all these Big Tech layoffs
| post pandemic? Shouldn't companies there have an easy time
| finding local talent?
|
| Sorry for the potentially silly question. I just spent some
| time trying to research it and came up with nothing concrete.
| mrtksn wrote:
| > But wasn't there relatively high unemployment among
| developers after all these Big Tech layoffs post pandemic?
|
| I'm speculating too but yes it appears that unemployment is
| pretty high among the CS majors:
| https://www.reddit.com/r/csMajors/comments/1hhl060/how_is_it...
|
| But at the same time there's an ongoing infighting among
| Trump supporters because tech elites came up as pro - skilled
| immigration where the MAGA camp turned against them. The tech
| elites claim that there's a talent shortage. Here's a short
| rundown that Elon Musk agrees with:
| https://x.com/AutismCapital/status/1872408010653589799
| fhd2 wrote:
| Ah I see, thought I missed some major story, but apparently
| not.
|
| The unemployment data is from 2018 BTW. But from what I
| perceive, developer unemployment in the US seems higher
| than usual right now.
| mrtksn wrote:
| Good catch but yes, my personal observation is the same
| and not only in US.
| exe34 wrote:
| Have you maybe confused the time periods in the different
| discussions? I think the AI making jobs obsolete part is in the
| next few years, whereas the talent shortage issue is right now
| - although as usual, it's a wage issue, not a talent issue. Pay
| enough and the right people will turn up.
| mrtksn wrote:
| Who knows about the future, right? I'm just trying to read
| the expectations of the people who have control over both the
| AI, Capital and Politics and they don't strike me as
| optimistic about AI actually doing much in near future.
| exe34 wrote:
| they seem to be investing a lot into replacing workers with
| AI.
| mrtksn wrote:
| And that might be FOMO, or they can simply exit with a
| profit as long as they can keep fanning the hype. And of
| course, they may be hoping to have it in the long term.
|
| They are not replacing their workers despite claiming
| that AI is currently as good as a PhD and they certainly
| don't go to AI medical doctors despite claiming that
| their tool is better than most doctors.
| hackable_sand wrote:
| It's not a wage issue.
| exe34 wrote:
| are you saying the free market doesn't work?
| nyarlathotep_ wrote:
| Schrodinger's Job Market, yeah.
|
| The whole conversation is so dishonest.
|
| Every software firm, notable and small, has had layoffs over
| the past two years, but somehow there's still a "STEM shortage"
| and companies are "starving for talent" or some such nonsense?
|
| Fake discussion.
| JimmyWilliams1 wrote:
| The reliance on large datasets for training AI models introduces
| biases present in the data, which can perpetuate or even
| exacerbate societal inequalities. It's essential to approach AI
| development with caution, ensuring robust ethical guidelines and
| comprehensive testing are in place before integrating AI into
| sensitive areas.
|
| As we continue to innovate, a focus on explainability, fairness,
| and accountability in AI systems will be paramount to harnessing
| their potential without compromising societal values.
| owenpalmer wrote:
| > exacerbate societal inequalities
|
| Do you have an example of this?
| tbenst wrote:
| As a neuroscientist, my biggest disagreement with the piece is
| the author's argument for compositionality over emergence. The
| former makes me think of Prolog and Lisp, while the latter is a
| much better description of a brain. I think emergence is a much
| more promising direction for AGI than compositionality.
| dbmikus wrote:
| 100% agree. When we explicitly segment and compose AI
| components, we are removing the ability for them to learn their
| own pathways between the components. The bitter lesson[1] has
| been proven time and time again: throwing a ton of data
| and compute at a model yields better results than what we could
| come up with.
|
| That said, we can still isolate and modify parts of a network,
| and combine models trained for different tasks. But you need to
| break things down into components after the fact, instead of
| beforehand, in order to get the benefits of learning via scale
| of data + compute.
|
| [1]: http://www.incompleteideas.net/IncIdeas/BitterLesson.html
| cs702 wrote:
| As of right now, we have no way of knowing in advance what the
| capabilities of current AI systems will be if we are able to
| scale them by 10x, 100x, 1000x, and more.
|
| The number of neuron-neuron connections in current AI systems is
| still _tiny_ compared to the human brain.
|
| The largest AI systems in use today have _hundreds of billions_
| of parameters. Nearly all parameters are part of a weight matrix,
| each parameter quantifying the strength of the connection from an
| artificial input neuron to an artificial output neuron. The human
| brain has more than a _hundred trillion_ synapses, each
| connecting an organic input neuron to an organic output neuron,
| but the comparison is not apples-to-apples, because each synapse
| is much more complex than a single parameter in a weight
| matrix.[a]
|
| Today's largest AI systems have about the same number of neuron-
| neuron connections as the brain of a _brown rat_.[a] Judging
| these AI systems based on their current capabilities is like
| judging organic brains based on the capabilities of brown rat
| brains.
|
| What we can say with certainty is that _today 's_ AI systems
| cannot be trusted to be reliable. That's true for highly trained
| brown rats too.
|
| ---
|
| [a]
| https://en.wikipedia.org/wiki/List_of_animals_by_number_of_n...
| -- sort in descending order by number of synapses.
| JumpCrisscross wrote:
| > _we have no way of knowing in advance what the capabilities
| of current AI systems will be if we are able to scale them by
| 10x, 100x, 1000x, and more_
|
| This doesn't solve the unpredictability problem.
| leobg wrote:
| We don't know. We didn't predict that the rat brain would get
| us here. So we also can't be confident in our prediction that
| scaling it won't solve hallucination problems.
| cs702 wrote:
| No, it doesn't "solve" the unpredictability problem.
|
| But we haven't solved it for human beings either.
|
| Human brains are unpredictable. Look around you.
| timeon wrote:
| How are humans relevant here? As an example, we operate at
| different speeds.
| cs702 wrote:
| Humankind has developed all sorts of systems and
| processes to cope with the unpredictability of human
| beings: legal systems, organizational structures,
| separate branches of government, courts of law, police
| and military forces, organized markets, double-entry
| bookkeeping, auditing, security systems, anti-malware
| software, etc.
|
| While individual human beings do trust _some_ of the
| other human beings they know, in the aggregate society
| doesn 't seem to trust human beings to behave reliably.
|
| It's possible, though I don't know for sure, that we're
| going to need systems and processes to cope with the
| unpredictability of AI systems.
| mrweasel wrote:
| Are you expecting AIs to be more reliable, because
| they're slower?
| uoaei wrote:
| Human performance, broadly speaking, is _the_ benchmark
| being targeted by those training AI models. Humans are
| part of the conversation since that 's the only kind of
| intelligence these folks can conceive of.
| sdesol wrote:
| > Human brains are unpredictable. Look around you.
|
| As it was mentioned by others, we've had thousands of years
| to better understand how humans can fail. LLMs are black
| boxes and it never ceases to amaze me how they can fail in
| such unpredictable ways. Take the following examples.
|
| Here GPT-4o mini is asked to calculate 2+3+5
|
| https://beta.gitsense.com/?chat=8707acda-e6d4-4f69-9c09-2cf
| f...
|
| It gets the answer correct, but if you ask it to verify its
| own answer
|
| https://beta.gitsense.com/?chat=6d8af370-1ae6-4a36-961d-290
| 2...
|
| it says the response was wrong, and contradicts itself. Now
| if you ask it to compare all the responses
|
| https://beta.gitsense.com/?chat=1c162c40-47ea-419d-af7a-a30
| a...
|
| it correctly identifies that GPT-4o mini was incorrect.
|
| It is this unpredictable nature that makes LLMs insanely
| powerful and scary.
|
| Note: The chat on the beta site doesn't work.
| clint wrote:
| You seem to believe that humans, on their own, are not
| stochastic and unpredictable. I contend that if this is your
| belief then you couldn't be more wrong.
|
| Humans are EXTREMELY unpredictable. Humans only become
| slightly more predictable and producers of slightly more
| quality outputs with insane levels of bureaucracy and layers
| upon layers upon layers of humans to smooth it out.
|
| To boot, the production of this mediocre code is very very
| very slow compared to LLMs. LLMs also have no feelings, egos,
| and are literally tunable and directable to produce better
| outcomes without hurting people in the process (again,
| something that is very difficult to avoid without the
| inclusion of, yep, more humans, more layers, more protocol,
| etc.)
|
| Even with all of this mass of human grist, in my opinion, the
| output of purely human intellects is, on average, very bad.
| Very bad in terms of quality of output and very bad in terms
| of outcomes for the humans involved in this machine.
| FredPret wrote:
| If brown-rats-as-a-service is as useful as it is already, then
| I'm excited by what the future holds.
|
| I think to make it to the next step, AI will have to have some
| way of performing rigorous logic integrated on a low level.
|
| Maybe scaling that brown-rat brain will let it emulate an
| internal logical black box - much like the old adage about a
| sufficiently large C codebase containing an imperfect Lisp
| implementation - but I think things will get really cool when we
| figure out how to wire together something like Wolfram Alpha, a
| programming language, some databases with lots of actual facts
| (as opposed to encoded/learned ones), and ChatGPT.
| cs702 wrote:
| It's already better than _real_ rats-as-a-service, certainly:
|
| https://news.ycombinator.com/item?id=42449424
| ndesaulniers wrote:
| Does it matter what color the rat is?
| notpushkin wrote:
| I suppose it refers to the particular species, _Rattus
| norvegicus_ (although I 'd call it common rat personally).
| petesergeant wrote:
| ChatGPT can already run code, which allows it to overcome
| some limitations of tokenization (eg counting the letters in
| strawberry, sorting words by their second letter). Doesn't
| seem like adding a Prolog interpreter would be all that hard.
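|
| For example, the kind of throwaway script a code-running model
| might emit for those two cases (an illustrative sketch only,
| not ChatGPT's actual tool output):
|
|       word = "strawberry"
|       print(word.count("r"))  # counting letters: 3
|
|       words = ["banana", "cherry", "apple", "plum"]
|       # sorting words by their second letter
|       print(sorted(words, key=lambda w: w[1]))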
| Kim_Bruning wrote:
| ChatGPT does already have access to Bing (would that count as
| your facts database?) and Jupyter (which is sort of a
| Wolfram clone except with Python?).
|
| It still won't magically use them 100% correctly, but with a
| bit of smarts you can go a long way!
| zozbot234 wrote:
| A brown rat's brain is also a lot more energy efficient than
| your average LLM. Especially in the learning phase, but not
| only.
| cs702 wrote:
| Yes, I agree, but energy efficiency is orthogonal to
| capabilities.
| sanderjd wrote:
| No it isn't, because it is relevant to the question of
| whether the current approaches _can_ be scaled 100x or
| 1000x.
| cs702 wrote:
| That's a hardware question, not a software question, but
| it is a fair question.
|
| I don't know if the hardware can be scaled up. That's why
| I wrote " _if_ we 're able to scale them" at the root of
| this thread.
| bee_rider wrote:
| It is probably a both question. If 100x is the goal,
| they'll have to double up the efficiency 7 times, which
| seems basically plausible given how early-days it still
| is (I mean they have been training on GPUs this whole
| time, not ASICs... bitcoins are more developed and they
| are a dumb scam machine). Probably some of the doubling
| will be software, some will be hardware.
| sanderjd wrote:
| Yep, agreed.
|
| I'm pretty skeptical of the scaling hypothesis, but I
| also think there is a huge amount of efficiency
| improvement runway left to go.
|
| I think it's more likely that the return to further
| scaling will become net negative at some point, and then
| the efficiency gains will no longer be focused on doing
| more with more but rather doing the same amount with
| less.
|
| But it's definitely an unknown at this point, from my
| perspective. I may be very wrong about that.
| sanderjd wrote:
| The question is essentially: Can the current approaches
| we've developed get to or beyond human level
| intelligence?
|
| Whether those approaches can scale enough to achieve that
| is relevant to the question, whether the bottleneck is in
| hardware or software.
| s1artibartfast wrote:
| That depends on if efficiency is part of the scaling
| process
| cruffle_duffle wrote:
| Honestly I think the opposite. All these giant tech
| companies can afford to burn money with ever bigger models
| and ever more compute and I think that is actually getting
| in their way.
|
| I wager that some scrappy resource constrained startup or
| research institute will find a way to produce results that
| are similar to those generated by these ever massive LLM
| projects only at a fraction of the cost. And I think
| they'll do that by pruning the shit out of the model. You
| don't need to waste model space on ancient Roman history or
| the entire canon of the Marvel Cinematic Universe on a
| model designed to refactor code. You need a model that is
| fluent in English and "code".
|
| I think the future will be tightly focused models that can
| run on inexpensive hardware. And unlike today where only
| the richest companies on the planet can afford training,
| anybody with enough inclination will be able to train them.
| (And you can go on a huge tangent why such a thing is
| absolutely crucial to a free society)
|
| I dunno. My point is, there is little incentive for these
| huge companies to "think small". They have virtually
| unlimited budgets and so all operate under the idea that
| more is better. That isn't gonna be "the answer"... they
| are all gonna get instantly blindsided by some group who
| does more with significantly less. These small scrappy
| models and the institutes and companies behind them will
| eventually replace the old guard. It's a tale as old as
| time.
| clayhacks wrote:
| Deepseek just released their frontier model that they
| trained on 2k GPUs for <$6M. Way cheaper than a lot of
| the big labs. If the big labs can replicate some of their
| optimisations we might see some big gains. And I would
| hope more small labs could then even further shrink the
| footprint and costs
| cruffle_duffle wrote:
| I don't think this stuff will be truly revolutionary
| until I can train it at home or perhaps as a group (SETI
| at home anybody?)
|
| Six million is a start but this tech won't truly be
| democratized until it costs $1000.
|
| Obviously I'm being a little cheeky but my real point
| is... the idea that this technology is in the control of
| massive technology companies is dystopian as fuck. Where
| is the RMS of the LLM space? Who is shouting from every
| rooftop how dangerous it is to grant so much power and
| control over information to a handful of massive tech
| companies, all of whom have long histories of caving in to
| various government demands. It's scary as fuck.
| lodovic wrote:
| This is just a tech race. We'll get affordable 64 GB GPUs
| in a few years; businesses want to run their own models.
| DrBenCarson wrote:
| It's not at all, energy is a hard constraint to capability.
|
| Human intelligence improved dramatically after we improved
| our ability to extract nutrients from food via cooking
|
| https://www.scientificamerican.com/article/food-for-
| thought-...
| ben_w wrote:
| > It's not at all, energy is a hard constraint to
| capability.
|
| We can put a lot more power flux through an AI than a
| human body can live through; both because computers can
| run hot enough to cook us, and because they can be
| physically distributed in ways that we can't survive.
|
| That doesn't mean there's no constraint, it's just that
| the extent to which there is a constraint, the constraint
| is way, _way_ above what humans can consume directly.
|
| Also, electricity is much cheaper than humans. To give a
| worked example, consider that the UN poverty threshold*
| is about US$2.15/day in 2022 money, or just under
| 9¢/hour. My first Google search result for "average cost
| of electricity in the usa" says "16.54 cents per kWh",
| which means the UN poverty threshold human lives on a
| price equivalent ~= just under 542 watts of average
| American electricity.
|
| The actual power consumption of a human is 2000-2500
| kcal/day ~= 96.85-121.1 watts ~= about a fifth of that.
| In certain narrow domains, AI already makes human labour
| uneconomic... though fortunately for the ongoing payment
| of bills, it's currently only that combination of good-
| and-cheap in narrow domains, not generally.
|
| * I use this standard so nobody suggests outsourcing
| somewhere cheaper.
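|
| Spelling out that arithmetic (same inputs as above; a quick
| sanity check, nothing more):
|
|       poverty_usd_per_day = 2.15
|       usd_per_kwh = 0.1654
|       # dollars/hour divided by dollars/kWh = kW
|       print(poverty_usd_per_day / 24 / usd_per_kwh * 1000)  # ~542 W
|
|       for kcal in (2000, 2500):
|           print(kcal * 4184 / 86400)  # ~97-121 W of metabolic power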
| ben_w wrote:
| Are you sure?
|
| The average brown rat may use only 60 kcal per day, but the
| maximum firing rate of biological neurons is about 100-1000
| Hz rather than the A100 clock speed of about 1.5 GHz*, so the
| silicon gets through the same data set something like
| 1.5e6-1.5e7 times faster than a rat could.
|
| Scaling up to account for the speed difference, the rat
| starts looking comparable to a 9e7 - 9e8 kcal/day, or 4.4 to
| 44 megawatts, computer.
|
| * and the transistors within the A100 are themselves much
| faster, because clock speed is ~ how long it takes for all
| chained transistors to flip in the most complex single-clock-
| cycle operation
|
| Also I'm not totally confident about my comparison because I
| don't know how wide the data path is, how many different
| simultaneous inputs a rat or a transformer learns from.
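|
| (Showing my work, same assumptions as above:)
|
|       rat_kcal_per_day = 60
|       for speedup in (1.5e6, 1.5e7):
|           kcal_per_day = rat_kcal_per_day * speedup  # 9e7 - 9e8
|           watts = kcal_per_day * 4184 / 86400
|           print(watts / 1e6)  # roughly 4.4 to 44 megawatts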
| legacynl wrote:
| That's a stupid analogy because you're comparing a
| brain process to a full animal.
|
| Only a small part of that 60 kcal is used for learning, and
| for that same 60 kcal you get an actual physical being that
| is able to procreate, eat, do things, and fend for and
| maintain itself.
|
| Also, you cannot compare neuron firing rates with
| clock speed. AFAIK each neuron in an ML model can have code
| that takes several clock cycles to complete.
|
| Also, a neuron in ML is just a weighted value; a biological
| neuron does much more than that. For example, neurons
| communicate using neurotransmitters as well as voltage
| potentials. The actual data rate of biological neurons is
| therefore much higher and more complex.
|
| Basically your analogy is false because your napkin math
| forgets that the rat is an actual biological rat and not
| something as neatly defined as a computer chip.
| ben_w wrote:
| > Also, a neuron in ML is just a weighted value; a
| biological neuron does much more than that. For example,
| neurons communicate using neurotransmitters as well as
| voltage potentials. The actual data rate of biological
| neurons is therefore much higher and more complex.
|
| The conclusion does not follow from the premise. The
| observed maximum rate of the inter-neuron communication
| is important, the mechanism is not.
|
| > Also, you cannot compare neuron firing rates with
| clock speed. AFAIK each neuron in an ML model can have code
| that takes several clock cycles to complete.
|
| Depends how you're doing it.
|
| Jupyter notebook? Python in general? Sure.
|
| A100s etc., not so much -- those are specialist systems
| designed for this task:
|
| """1024 dense FP16/FP32 FMA operations per clock""" -
| https://images.nvidia.com/aem-dam/en-zz/Solutions/data-
| cente...
|
| "FMA" meaning "fused multiply-add". It's the unit that
| matters for synapse-equivalents.
|
| (Even that doesn't mean they're perfect fits: IMO a
| "perfect fit" would likely be using transistors as analog
| rather than digital elements, and then you get to run
| them at the native transistor speed of ~100 GHz or so and
| don't worry too much about how many bits you need to
| represent the now-analog weights and biases, but that's
| one of those things which is easy to say from a
| comfortable armchair and very hard to turn into silicon).
|
| > Basically your analogy is false because your napkin
| math forgets that the rat is an actual biological rat
| and not something as neatly defined as a computer chip.
|
| Any of those biological functions that don't correspond
| to intelligence, make the comparison more extreme in
| favour of the computer.
|
| This is, after all, a question of their mere
| intelligence, not how well LLMs (or indeed any AI) do or
| don't function as _von Neumann replicators_ , which is
| where things like "procreate, eat, do things and fend for
| and maintain itself" would actually matter.
| haolez wrote:
| And it learns online.
| bee_rider wrote:
| Rats are pretty clever, and they (presumably, at least) have a
| lot of neurons spending their time computing things like...
| where to find food, how frightened of this giant reality
| warping creature in a lab coat should I be, that sort of thing.
| I don't think it is obvious that one brown-rat-power isn't
| useful.
|
| I mean we have dogs. We really like them. For ages, they did
| lots of useful work for us. They aren't that much smarter than
| rats, right? They are better aligned and have a more useful
| shape. But it isn't obvious (to me at least) that the rats'
| problem is insufficient brainpower.
| bloopernova wrote:
| Dogs, if I recall correctly, have evolved alongside us and
| have specific adaptations to better bond with us. They have
| eyebrow muscles that wolves don't, and I think dogs have
| brain adaptations too.
| runarberg wrote:
| We have been with dogs for such a long time, I wouldn't be
| surprised if we also have adaptations to bond with dogs.
|
| I mean dogs came with us to the Americas, and even to
| Australia. Both the Norse and the Inuit took dogs with them
| to Greenland.
| mulmen wrote:
| Depends on how you define smart. Dogs definitely have larger
| brains. But then humans have even larger brains. If dogs
| aren't smarter than rats then the size of brain isn't
| proportional to intelligence.
| hn_throwaway_99 wrote:
| I think the comparison to brown rat brains is a huge mistake.
| It seems pretty apparent (at least from my personal usage of
| LLMs in different contexts) that modern AI is _much_ smarter
| than a brown rat at some things (I don 't think brown rats can
| pass the bar exam), but in other cases it becomes apparent that
| it isn't "intelligent" at all in the sense that it becomes
| clear that it's just regurgitating training data, albeit in a
| highly variable manner.
|
| I think LLMs and modern AI are incredibly amazing and useful
| tools, but even with the top SOTA models today it becomes
| clearer to me the more I use them that they are fundamentally
| lacking crucial components of what average people consider
| "intelligence". I'm using quotes deliberately because the
| debate about "what is intelligence" feels like it can go in
| circles endlessly - I'd just say that the core concept of what
| we consider understanding, especially as it applies to creating
| and exploring novel concepts that aren't just a mashup of
| previous training examples, appears to be sorely missing from
| LLMs.
| cs702 wrote:
| Imagine it were possible to take a rat brain, keep it alive
| with a permanent source of energy, wire its input and output
| connections to a computer, and then train the rat brain's
| output signals to predict the next token, given previous
| tokens fed as inputs, using graduated pain or pleasure
| signals as the objective loss function. All the neuron-neuron
| connections in that rat brain would eventually serve one,
| and only one, goal: predicting an accurate probability
| distribution over the next possible token, given previous
| tokens. The number of neuron-neuron connections in this "rat-
| brain-powered LLM" would be comparable to that of today's
| state-of-the-art LLMs.
|
| This is less far-fetched than it sounds. Search for "organic
| deep neural networks" online.
|
| Networks of rat neurons have in fact been trained to fly
| planes, in simulators, among other things.
| ImHereToVote wrote:
| Human brain organoids are in use right now by a Swiss
| company.
| cs702 wrote:
| Thanks. Yeah, I've heard there are a bunch of efforts
| like that, but as far as I know, all are very early
| stage.
|
| I do wonder if the most energy-efficient way to scale up
| AI models is by implementing them in organic substrates.
| cynicalpeace wrote:
| > modern AI is much smarter than a brown rat at some things
| (I don't think brown rats can pass the bar exam), but in
| other cases it becomes apparent that it isn't "intelligent"
| at all
|
| There is no modern AI system that can go into your house and
| find a piece of cheese.
|
| The whole notion that modern AI is somehow "intelligent", yet
| can't tell me where the dishwasher is in my house is
| hilarious. My 3 year old son can tell me where the dishwasher
| is. A well trained dog could do so.
|
| It's the result of a nerdy definition of "intelligence" which
| excludes anything to do with common sense, street smarts,
| emotional intelligence, or creativity (last one might be
| debatable but I've found it extremely difficult to prompt AI
| to write amazingly unique and creative stories reliably)
|
| The AI systems need bodies to actually learn these things.
| CooCooCaCha wrote:
| Where do you think common sense, emotional intelligence,
| creativity, etc. come from? The spirit? Some magic brain
| juice? No, it comes from neurons, synapses, signals,
| chemicals, etc.
| cynicalpeace wrote:
| It comes from billions of years of evolution, the
| struggle to survive and maintain your body long enough to
| reproduce.
|
| "Neurons, synapses, signals, chemicals" are downstream of
| that.
| mensetmanusman wrote:
| Why would dust care about survival?
| cynicalpeace wrote:
| ¯\_(ツ)_/¯ Consult a bible
| FrustratedMonky wrote:
| a 'dust to dust' joke?
|
| Or just saying, when facing the apocalypse, read a bible?
| bee_rider wrote:
| It doesn't. Actually, quite a few of the early stages of
| evolution wouldn't have any analogue to "care," right? It
| just happened in this one environment, the most
| successful self-reproducing processes happened to get
| more complex over time and eventually hit the point where
| they could do, and then even later define, things like
| "care."
| mulmen wrote:
| Without biological reproduction wouldn't the evolutionary
| outcomes be different? Cyborgs are built in factories,
| not wombs.
| mensetmanusman wrote:
| There are robots that can do this now, they just cost
| $100k.
| uoaei wrote:
| That's just the hardware, but AI as currently practiced
| is purely a software endeavor.
| cynicalpeace wrote:
| Correct, and the next frontier is combining the software
| with the hardware.
| cynicalpeace wrote:
| Find a piece of cheese pretty much anywhere in my home?
|
| Or if we're comparing to a three year old, also find the
| dishwasher?
|
| Closest I'm aware of is something by Boston Dynamics or
| Tesla, but neither would be as simple as asking it:
| where's the dishwasher in my home?
|
| And then if we compare it to a ten year old, find the
| woodstove in my home, tell me the temperature, and adjust
| the air intake appropriately.
|
| And so on.
|
| I'm not saying it's impossible. I'm saying there's no AI
| system that has this physical intelligence yet, because
| the robot technology isn't well developed/integrated yet.
|
| For AI to be something more than a nerd it needs a body
| and I'm aware there are people working on it. Ironically,
| not the people claiming to be in search of AGI.
| HDThoreaun wrote:
| If you upload pictures of every room in your house to an
| LLM it can definitely tell you where the dishwasher is. If
| your argument is just that they can't walk around your house
| so they can't be intelligent, I think that's pretty clearly
| wrong.
| kimixa wrote:
| A trained image recognition model could probably
| recognize a dishwasher from an image.
|
| But that won't be the same model that writes bad poetry
| or tries to autocomplete your next line of code. Or
| control the legs of a robot to move towards the
| dishwasher while holding a dirty plate. And each has a
| fair bit of manual tuning and preprocessing based on its
| function which may simply not be applicable to other
| areas even with scale. The best performing models aren't
| just taking in unstructured untyped data.
|
| Even the most flexible models are only tackling a small
| slice of what "intelligence" is.
| jdietrich wrote:
| ChatGPT, Gemini and Claude are all natively multimodal.
| They can recognise a dishwasher from an image, among many
| other things.
|
| https://www.youtube.com/watch?v=KwNUJ69RbwY
| cynicalpeace wrote:
| Can they take the pictures?
| ta988 wrote:
| Technically yes they can run functions. There were
| experiments of Claude used to run a robot around a house.
| So technically, we are not far at all and current models
| may even be able to do it.
| cynicalpeace wrote:
| Please re-read my original comment.
|
| "The AI systems need bodies to actually learn these
| things."
|
| I never said this was impossible to achieve.
| sippeangelo wrote:
| Can your brain see the dishwasher without your eyes?
| sdenton4 wrote:
| But do they have strong beaks?
|
| https://sktchd.com/column/comics-disassembled-ten-things-
| of-...
| cynicalpeace wrote:
| Do they know what a hot shower feels like?
|
| They can describe it. But do they actually know? Have
| they experienced it?
|
| This is my point. Nerds keep dismissing physicality and
| experience.
|
| If your argument is a brain in a jar will be generally
| intelligent, I think that's pretty clearly wrong.
| HDThoreaun wrote:
| See the responses section of
| https://en.wikipedia.org/wiki/Knowledge_argument - this
| idea certainly has been long considered, but I personally
| reject it.
| cynicalpeace wrote:
| While interesting, this is a separate thought experiment
| with its own quirks. Sort of a strawman, since my
| argument is formulated differently and simply argues that
| AIs need to be more than brains in jars for them to be
| considered generally intelligent.
|
| And that the only reason we think AIs can just be brains
| in jars is because many of the people developing them
| consider themselves as simply brains in jars.
| HDThoreaun wrote:
| Not really. The point of it is considering whether
| physical experience creates knowledge that is impossible
| to get otherwise. Thats the argument you are making no?
| If Mary learns nothing new when seeing red for the first
| time an AI would also learn nothing new when seeing red
| for the first time.
|
| > Do they know what a hot shower feels like? They can
| describe it. But do they actually know? Have they
| experienced it
|
| Is directly a knowledge argument
| cynicalpeace wrote:
| Mary in that thought experiment is not an LLM that has
| learned via text. She's acquired "all the physical
| information there is to obtain about what goes on when we
| see ripe tomatoes". This does not actually describe
| modern LLMs. It actually better describes a robot that
| has transcribed the location, temperature, and velocity
| of water drops from a hot shower to its memory. Again,
| this thought experiment has its own quirks.
|
| Also, it is an argument against physicalism, which I have
| no interest in debating. While it's tangentially related,
| my point is not for/against physicalism.
|
| My argument is about modern AI and its ability to learn.
| If we put touch sensors, eyes, nose, a mechanism to
| collect physical data (legs) and even sex organs on an AI
| system, then it is more generally intelligent than
| before. It will have learned in a better fashion what a
| hot shower feels like and will be smarter for it.
| HDThoreaun wrote:
| > While it's tangentially related, my point is not
| for/against physicalism.
|
| I really disagree. Your entire point is about
| physicalism. If physicalism is true then an AI does not
| necessarily learn in a better fashion what a hot shower
| feels like by being embodied. In a physicalist world it
| is conceivable to experience that synthetically.
| Dilettante_ wrote:
| So are you saying people who have CIPA are less
| intelligent for never having experienced a hot shower? By
| that same logic, does its ability to experience more
| colors increase the intelligence of a mantis shrimp?
|
| Perhaps your own internal definition of intelligence
| simply deviates significantly from the common, "median"
| definition.
| cynicalpeace wrote:
| It's the totality of experiences that make an individual.
| Most humans that I'm aware of have a greater totality of
| experiences that make them far smarter than any modern AI
| system.
| skinner_ wrote:
| Greater totality of experiences than having read the
| whole internet? Obviously they are very different kind of
| experiences, but a greater totality? I'm not so sure.
|
| Here is what we know: The Pile web scrape is 800GB. 20
| years of human experience at 1kB/sec is 600GB. Maybe
| 1kB/sec is a bad estimate. Maybe sensory input is more
| valuable than written text. You can convince me. But next
| challenge, some 10^15 seconds of currently existing
| youtube video, that's 2 million years of audiovisual
| experience, or 10^9GB at the same 1kB/sec.
| tomrod wrote:
| The proof that 1+1=2 is nontrivial despite it being clear
| and obvious. It does not rely on physicality nor
| experience to prove.
|
| There are areas of utility here. Things need not be able
| to do all actions to be useful.
| momentoftop wrote:
| There isn't a serious proof that 1+1=2, because it's near
| enough axiomatic. In the last 150 years or so, we've been
| trying to find very general logical systems in which we
| can encode "1", "2" and "+" and for which 1+1=2 is a
| theorem, and the derivations are sometimes non-trivial,
| but they are ultimately mere sanity checks that the
| logical system can capture basic arithmetic.
| magpi3 wrote:
| Could it tell the difference between a dishwasher and a
| picture of a dishwasher on a wall? Or one painted onto a
| wall? Or a toy dishwasher?
|
| There is an essential idea of what makes something a
| dishwasher that LLMs will never be able to grasp no
| matter how many models you throw at them. They would have
| to fundamentally understand that what they are "seeing"
| is an electronic appliance connected to the plumbing that
| washes dishes. The sound of a running dishwasher, the
| heat you feel when you open one, and the wet, clean
| dishes is also part of that understanding.
| viraptor wrote:
| Yes, it can tell a difference, up to the point where the
| boundaries are getting fuzzy. But the same thing applies
| to us all.
|
| Can you tell this is a dishwasher?
| https://www.amazon.com.au/Countertop-Dishwasher-
| Automatic-Ve...
|
| Can you tell this is a drawing of a glass?
| https://www.deviantart.com/januarysnow13/art/Wine-Glass-
| Hype...
|
| Can you tell this is a toy?
| https://www.amazon.com.au/Theo-Klein-Miele-Washing-
| Machine/d...
| theamk wrote:
| That really makes no sense... would you say someone who is
| disabled below the neck is not intelligent / has no common
| sense, street smarts, creativity, etc.?
|
| Or would you say that you cannot judge the intelligence of
| someone by reading their books / exchanging emails with
| them?
| megamix wrote:
| What do you think of copyright violations?
| bee_rider wrote:
| IMO it is sad that the sort of... anti-establishment side of
| tech has suddenly become very worried about copyright. Bits
| inherently can be copied for free (or at least very cheap),
| copyright is a way to induce scarcity for the market to
| exploit where there isn't any on a technical level.
|
| Currently the AI stuff kind of sucks because you have to be a
| giant corp to train a model. But maybe in a decade, users
| will be able to train their own models or at least fine-tune
| on basic cellphone and laptop (not dgpu) chips.
| uoaei wrote:
| The copyright question is inherently tied to the
| requirement to earn money from your labor in this economy.
| I think the anti-establishment folks are not so rabid that
| they can't recognize real material conditions.
| LordDragonfang wrote:
| I think that would be a more valid argument if they ever
| cared about automating away jobs before. As it stands,
| anyone who was standing in the way of the glorious march
| of automation towards a post-scarcity future was called a
| luddite - right up until that automation started
| threatening their (material) class.
|
| I mean, you don't have to look any further than the
| (justified) lack of sympathy to dockworkers just a few
| months ago: https://news.ycombinator.com/item?id=41704618
|
| The solution is not, and never has been, to shack up with
| the capital-c Capitalists in defense of copyright. It's
| to push for a system where having your "work" automated
| away is a relief, not a death sentence.
| uoaei wrote:
| There's both "is" and "ought" components to this
| conversation and we would do well to disambiguate them.
|
| I would engage with those people you're stereotyping
| rather than gossiping in a comments section, I suspect
| you will find their ideologies quite consistent once you
| tease out the details.
| deergomoo wrote:
| > IMO it is sad that the sort of... anti-establishment side
| of tech has suddenly become very worried about copyright
|
| It shouldn't be too surprising that anti-establishment
| folks are more concerned with trillion-dollar companies
| subsuming and profiting from the work of independent
| artists, writers, developers, etc., than with individual
| people taking IP owned by multimillion/billion-dollar
| companies. Especially when many of the companies in the
| latter group are infamous for passing only a tiny portion
| of the money charged onto the people doing the actual
| creative work.
| Earw0rm wrote:
| This.
|
| Tech still acts like it's the scrappy underdog, the
| computer in the broom cupboard where "the net" is a third
| space separate from reality, nerds and punks writing
| 16-bit games.
|
| That ceased to be materially true around twenty years ago
| now. Once Facebook and smart phones arrived, computing
| touched every aspect of peoples' lives. When tech is all-
| pervasive, the internal logic and culture of tech isn't
| sufficient to describe or understand what matters.
| bee_rider wrote:
| IMO this is looking at it through a lens which considers
| "tech" a single group. Which is a way of looking at is,
| maybe even the best way. But an alternative could be: in
| the battle between scrappy underdog and centralized
| sellout tech, the sellouts are winning.
| mulmen wrote:
| > in the battle between scrappy underdog and centralized
| sellout tech, the sellouts are
|
| Winning by what metric?
| TheOtherHobbes wrote:
| Copyright is the right to get a return from creative work.
| The physical ease - or otherwise - of copying is absolutely
| irrelevant to this. So is scarcity.
|
| It's also orthogonal to the current corporate dystopia
| which is using monopoly power to enclose the value of
| individual work from the other end - _precisely_ by
| inserting itself into the process of physical distribution.
|
| None of this matters if you have a true abundance economy,
| _but we don 't._ Pretending we do for purely selfish
| reasons - "I want this, and I don't see why I should pay
| the creator for it" - is no different to all the other ways
| that employers stiff their employees.
|
| I don't mean it's analogous, I mean it's exactly the same
| entitled mindset which is having such a catastrophic effect
| on everything at the moment.
| cruffle_duffle wrote:
| > IMO it is sad that the sort of... anti-establishment side
| of tech has suddenly become very worried about copyright.
|
| Remember Napster? Like how rebellious was that shit? Those
| times are what truly society-upsetting tech looks like.
|
| You cannot even import a video into OpenAI's Sora without
| agreeing to a four (five?) checkbox terms & conditions
| screen. These LLMs come out of the box neutered by
| corporate lawyers and various other safety weenies.
|
| This shit isn't real until there are mainstream media
| articles expressing outrage because some "dangerous group
| of dark web hackers finished training a model at home that
| every high school student on the planet can use to cheat on
| their homework" or something like that. Basically it ain't
| real until it actually challenges The Man. That isn't
| happening until this tech is able to be trained and
| inferenced from home computers.
| bee_rider wrote:
| Yeah, or if it becomes possible to train on a peer-to-
| peer network somehow. (I'm sure there's research going
| on in that direction). Hopefully that sort of thing comes
| out of the mix.
| s1artibartfast wrote:
| I think that AI output is transformative.
|
| I think the training process constitutes commercial use.
| jokethrowaway wrote:
| copyright on a sequence of numbers should never have existed
| in the first place
|
| great if AI accelerates its destruction (even if it's through
| lobbying to our mafia-style protect-the-richest-company
| governments)
| pockmarked19 wrote:
| Calling it a neural network was clearly a mistake on the
| magnitude of calling a wheel a leg.
| cgearhart wrote:
| This is an excellent analogy. Aside from "they're both
| networks" (which is almost a truism), there's really nothing
| in common between an artificial neural network and a brain.
| runarberg wrote:
| Neurons also adjust the signal strength based on previous
| stimuli, which in effect makes the future response
| weighted. So it is not far off--albeit a gross
| simplification--to call the brain a weight matrix.
|
| As I learned it, artificial neural networks were modeled
| after a simple model for the brain. The early (successful)
| models were almost all reinforcement models, which is also
| one of the most successful models for animal (including
| human) learning.
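|
| For reference, the "simple model" in question boils down to
| something like this (a sketch of the textbook artificial
| neuron, nothing more; real neurons do far more):
|
|       import math
|
|       def artificial_neuron(inputs, weights, bias):
|           # weighted sum of inputs, squashed by a nonlinearity
|           z = sum(x * w for x, w in zip(inputs, weights)) + bias
|           return 1 / (1 + math.exp(-z))  # sigmoid activation
|
|       print(artificial_neuron([0.2, 0.7, 0.1], [1.5, -2.0, 0.5], 0.1))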
| legacynl wrote:
| I don't really get where you're coming from..
|
| Is your point that the capabilities of these models have
| grown such that 'merely' calling it a neural network doesn't
| fit the capabilities?
|
| Or is your point that these models are called neural networks
| even though biological neural networks are much more complex
| and so we should use a different term to differentiate the
| simulated from the biological ?
| juped wrote:
| It was clearly a mistake because people start attempting to
| make totally incoherent comparisons to rat brains.
| joe_the_user wrote:
| The OP is comparing the "neuron count" of an LLM to the
| neuron count of animals and humans. This comparison is
| clearly flawed. Even if you step back and say "well, the units
| might not be the same but LLMs are getting more complex so
| pretty soon they'll be like animals". Yes, LLMs are complex
| and have gained more behaviors through size and increased
| training regimes, but if you realize these structures aren't
| like brains, there's no argument here that they will soon
| reach the qualities of brains.
| cs702 wrote:
| Actually, I'm comparing the "neuron-neuron connection
| count," while admitting that the comparison is not
| apples-to-apples.
|
| This kind of comparison isn't a new idea. I think Hans
| Moravec[a] was the first to start making these kinds of
| machine-to-organic-brain comparisons, back in the 1990's,
| using "millions of instructions per second" (MIPS) and
| "megabytes of storage" as his units.
|
| You can read Moravec's reasoning and predictions here:
|
| https://www.jetpress.org/volume1/moravec.pdf
|
| ---
|
| [a] https://en.wikipedia.org/wiki/Hans_Moravec
| legacynl wrote:
| I think he was approaching the concept from the direction
| of "how many mips and megabytes do we need to create
| human level intelligence".
|
| That's a different take than "human level is this many
| mips and megabytes", i.e. his claims are about artificial
| intelligence, not about biological intelligence.
|
| Machine learning seems to be modeled after the action
| potential part of neural communication. But biological
| neurons can also communicate in other ways, e.g. via
| neurotransmitters. AFAIK this isn't modeled in the
| current ML models at all (nor do we have a good idea
| how/why that stuff works). So ultimately it's pretty
| likely that an ML model with a billion parameters does not
| perform the same as an organic brain with a billion
| synapses.
| cs702 wrote:
| I never claimed the machines would achieve "human level,"
| however you define it. What I actually wrote at the root
| of this thread is that we have no way of knowing in
| advance what the _future_ capabilities of these AI
| systems might be as we scale them up.
| tshaddox wrote:
| Most simple comparisons are flawed. Even just comparing
| the transistor counts of CPUs with vastly different
| architectures would be quite flawed.
| torginus wrote:
| Afaict OP's not comparing neuron count, but neuron-to-
| neuron connections, aka synapses. And considering each
| synapse (weighted input) to a neuron performs
| computation, I'd say it's possible it captures a
| meaningful property of a neural network.
| bgnn wrote:
| excellent analogy. piggybacking on this: a lot of believers
| (as they are like religious fanatics) claim that more data
| and hardware will eventually make LLMs intelligent, as if
| it's even the neuron count that matters. There is no other
| animal close to humans in intelligence, and we don't know why.
| Somehow, though, a randomly hallucinating LLM + shit loads of
| electricity would figure it out. This is close to pure
| alchemy.
| runarberg wrote:
| I don't disagree with your main point but I want to push
| back on the notion that " _there is no other animal close
| to humans in intelligence_ ". This is only true in the
| sense that we humans define intelligence in human terms.
| Intelligence is a very fraught and problematic concept
| in philosophy, but especially in the sciences (particularly
| psychology).
|
| If we were dogs surely we would say that humans were quite
| skillful, impressively so even, in pattern matching,
| abstract thought, language, etc. but are hopelessly dumb at
| predicting past presence via smell, a crow would similarly
| judge us on our inability to orient ourselves, and
| probably wouldn't understand our language and thus
| completely miss our language abilities. We do the same when
| we judge the intelligence of non-human animals or systems.
|
| So the reason for why no other animal is close to us in
| intelligence is very simple actually, it is because of the
| way we define intelligence.
| andrepd wrote:
| NNs are in no way, shape, or form even remotely similar to human
| neural tissue, so your whole analogy falls there.
| legacynl wrote:
| A little nitpick; a biological neuron is much more complex than
| its ML-model equivalent. A simple weighted function cannot
| fully replicate a neuron.
|
| That's why it's almost certain that a biological brain with a
| billion synapses outperforms a model with a billion parameters.
| mort96 wrote:
| Isn't that what they meant by this?
|
| > the comparison is not apples-to-apples, because each
| synapse is much more complex than a single parameter in a
| weight matrix.
| daveguy wrote:
| It isn't just "not apples to apples". It's apples to
| supercomputers.
| legacynl wrote:
| well yeah, but it's a non-obvious yet very big difference that
| basically invalidates any conclusion that you can make with
| this comparison.
| mort96 wrote:
| I don't think so: it seems reasonable to assume that
| biological synapses are strictly more powerful than
| "neural network" weights, so the fact that a human brain
| has 3 orders of magnitude more synapses than
| language models have weights tells us that we should expect,
| _as an extreme lower bound_ , 3 orders of magnitude
| difference.
| joe_the_user wrote:
| It's not a "nitpick", it's a complete refutation. LLM don't
| have a strong relationship to brains, they're just
| math/computer constructs.
| joe_the_user wrote:
| This tech has made a big impact, is obviously real, and exactly
| what potential can be unlocked by scaling is worth considering...
|
| ... but calling vector-entries in a tensor flow process "
| _neurons_ " is at best a very loose analogy while comparing LLM
| "neuron numbers" to animals and humans is flat-out nonsense.
| fsndz wrote:
| yes indeed. But I see more and more people arguing against the
| very possibility of AGI. Some people say statistical models
| will always have a margin of error and as such will have some
| form of reliability issues:
| https://open.substack.com/pub/transitions/p/here-is-why-ther...
| rmbyrro wrote:
| the possibility of error is a requirement for AGI
|
| the same foundation that makes the binary model of
| computation so reliable is what also makes it unsuitable to
| solving complex problems with any level of autonomy
|
| in order to reach autonomy and handle complexity, the
| computational model foundation _must_ accept errors
|
| because the real world is _not binary_
| sroussey wrote:
| This really speaks to the endeavors of making non-digital
| hardware for AI. Less of an impedance mismatch.
| kerkeslager wrote:
| > As of right now, we have no way of knowing in advance what
| the capabilities of current AI systems will be if we are able
| to scale them by 10x, 100x, 1000x, and more.
|
| Uhh, yes we do.
|
| I mean sure, we don't know everything, but we know one thing
| which is very important and which isn't under debate by anyone
| who knows how current AI works: current AI response quality
| cannot surpass the quality of its inputs (which include both
| training data and code assumptions).
|
| > The number of neuron-neuron connections in current AI systems
| is still tiny compared to the human brain.
|
| And it's become abundantly clear that this isn't the important
| difference between current AI and the human brain for two
| reasons: 1) there are large scale structural differences which
| contain implicit, inherited input data which goes beyond neuron
| quantity, and 2) as I said before, we cannot surpass the
| quality of input data, and current training data sets clearly
| do not contain all the input data one would need to train a
| human brain anyway.
|
| It's true we don't know _exactly_ what would happen if we
| scaled up a current-model AI to human brain size, but _we do
| know_ that it would _not_ produce a human brain level of
| intelligence. The input datasets we have simply do not contain
| a human level of intelligence.
| petesergeant wrote:
| ... and any other answer is just special pleading towards what
| people want to be true. "What LLMs can't do" is increasingly
| "God of the gaps" -- someone states what they believe to be a
| fundamental limitation, and then later models show that
| limitation doesn't hold. Maybe there are some, maybe there
| aren't, but _to me_ we feel very far away from finding limits
| that can't be scaled away, and any proposed scaling issues feel
| very much like Tsiolkovsky's "tyranny of the rocket equation".
|
| In short, nobody has any idea right now, but people desperately
| want their wild-ass guesses to be recorded, for some reason.
| HarHarVeryFunny wrote:
| > As of right now, we have no way of knowing in advance what
| the capabilities of current AI systems will be if we are able
| to scale them by 10x, 100x, 1000x, and more.
|
| I don't think that's totally true, and anyways it depends on
| what kind of scaling you are talking about.
|
| 1) As far as training set (& corresponding model + compute)
| scaling goes - it seems we do know the answer since there are
| leaks from multiple sources that training set scaling
| performance gains are plateauing. No doubt you can keep
| generating more data for specialized verticals, or keep feeding
| video data for domain-specific gains, but for general text-
| based intelligence existing training sets ("the internet",
| probably plus many books) must have pretty decent coverage.
| Compare to a human: would a college graduate reading one more
| set of encyclopedias make them significantly smarter or more
| capable ?
|
| 2) The _new_ type of scaling is not training set scaling, but
| instead run-time compute scaling, as done by models such as
| OpenAI 's GPT-o1 and o3. What is being done here is basically
| adding something similar to tree search on top of the model's
| output. Roughly: for each of the top 10 predicted tokens,
| predict the top 10 continuation tokens, then for each of those
| predict the top 10, etc - so for a depth-3 tree we've already
| generated, and scaled compute/cost by, 1000 tokens (for a
| depth-4 search it'd be 10,000x compute/cost, etc). The system
| then evaluates each branch of the tree according to some metric
| and returns the best one. OpenAI have indicated linear
| performance gains for exponential compute/cost increases, which
| you could interpret as linear performance gains for each
| additional step of tree depth (a 3-token-deep tree vs a
| 4-token-deep tree, etc).
|
| Edit: Note that the unit of depth may be (probably is)
| "reasoning step" rather than single token, but OpenAI have not
| shared any details.
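|
| A toy sketch of that search, just to make the cost structure
| concrete (top_k_continuations and score are placeholders for a
| real model and evaluator; this is not how o1/o3 are actually
| implemented, which OpenAI haven't shared):
|
|       def top_k_continuations(path, k):
|           # placeholder: a real system would ask the LLM for
|           # its k most likely next tokens / reasoning steps
|           return [path + [f"tok{i}"] for i in range(k)]
|
|       def score(path):
|           # placeholder: a real system would use a learned
|           # verifier or some other quality metric
|           return -sum(int(t[3:]) for t in path)
|
|       def tree_search(depth=3, k=10):
|           frontier = [[]]
|           for _ in range(depth):  # k**depth leaves: 10**3 = 1000
|               frontier = [c for p in frontier
|                           for c in top_k_continuations(p, k)]
|           return max(frontier, key=score)
|
|       print(tree_search())  # best-scoring depth-3 branch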
|
| Now, we don't KNOW what would happen if type 2) compute/cost
| scaling was done by some HUGE factor, but it's the nature of
| exponentials that it can't be taken too far, even assuming
| there is aggressive pruning of non-promising branches.
| Regardless of the time/cost feasibility of taking this type of
| scaling too far, there's the question of what the benefit would
| be... Basically you are just trying to squeeze the best
| reasoning performance you can out of the model by evaluating
| many different combinatorial reasoning paths ... but ultimately
| limited by the constituent reasoning steps that were present in
| the training set. How well this works for a given type of
| reasoning/planning problem depends on how well a solution to
| that problem can be decomposed into steps that the model is
| capable of generating. For things well represented in the
| training set, where there is no "impedance mismatch" between
| different reasoning steps (e.g. in a uniform domain like math)
| it may work well, but in others may well result in "reasoning
| hallucination" where a predicted reasoning step is
| illogical/invalid. My guess would be that for problems where o3
| already works well, there may well be limited additional gains
| if you are willing to spend 10x, 100x, 1000x more for deeper
| search. For problems where o3 doesn't provide much/any benefit,
| I'd guess that deeper search typically isn't going to help.
| amazingamazing wrote:
| The fact of the matter is that if AI's externalities - that is,
| massive energy consumption - were exposed to end users and
| humanity in general, no one would use it.
| NoGravitas wrote:
| I wish we could get humanity in general to understand
| externalities in general.
| fulafel wrote:
| I think this is wildly optimistic about how environmentally
| conscious customers of LLMs are. People use fossil fuels
| directly and through electricity consumption in an
| unconscionable way at a scale wildly exceeding what a ChatGPT
| user's energy expenditure is.
|
| We desperately need to rapidly regulate down fossil fuel usage
| and production for both electricity generation and transport.
| The rest of the world needs to follow the example of the EU CO2
| emissions policy which guarantees it's progressing at a
| downwards slope independent of what the CO2 emissions are spent
| on.
| billy99k wrote:
| I use it for fast documentation of unknown (to me) APIs and other
| pieces of software. It's saved me hours of time, where I didn't
| have to go through the developer's site/documentation, and I can
| quickly get example code.
|
| Would I use the code directly in production? No. I always use it
| as an example and write my own code.
| nashashmi wrote:
| The elephant in the room: The user interface problem
|
| We seem to be dancing around a problem in the middle of the room
| like an elephant no one is acknowledging, and that is that the
| interface to Artificial Intelligence and Generative AI is a place
| that requires several degrees of innovation.
|
| I would argue that the first winning feat of innovation on
| interfacing with AI was the "CHAT BOX". And it works well enough
| for the 40% of use cases. And there is another 20% of uses that
| WE THE PEOPLE can use our imagination (prompt engineering) to
| manipulate the chat box to solve. On this topic, there was an
| article/opinion that said complex LLMs are unnecessary because
| 90% of people don't need it. Yeah. Because the chat box cannot do
| much more that would require heavier LLMs.
|
| Complex AI and large data sets need nicer presentation and
| graphics, more actionable interfaces, and more refined activity
| concepts, as well as metadata that gives information on the
| reliability or usability of generated information.
|
| Things like edit sections of an article, enhance articles,
| simplify articles, add relevant images, compress text to fit in a
| limited space, generate sql data from these reports, refine
| patterns found in a page with supplied examples, remove objects,
| add objects, etc.
|
| Some innovation has to happen in MS Office interfaces. Some
| innovations have to happen in photoshop-like interfaces.
|
| The author is complaining about utopian systems being
| incompatible with AI. I would argue AI is a utopian system being
| used in a dystopian world where we are lacking rich usable
| interfaces.
| vonneumannstan wrote:
| Anyone making big bold claims about what LLMs definitely CAN or
| CANNOT do is FULL OF SHIT. Not even the worlds top experts are
| certain where the limit of these technologies are and we are
| already connecting them to tools, making them Agentic, etc. so
| the era of 'pure' LLM chatbots is already dead imo.
| bentt wrote:
| I don't see how it would because at the end of the day a model is
| like a program... input->output. This seems infinitely useful and
| we are just starting to understand how to use this new way of
| computing.
| ramon156 wrote:
| Aider with claude sonnet is probably all I needed to get my
| programming cycle up to speed. I don't think I want anything more
| as a developer.
|
| That said, it still makes mistakes
| jokethrowaway wrote:
| so maybe you want less mistakes?
| 1oooqooq wrote:
| first winds of winter coming...
| malthaus wrote:
| i'm so confused by these discussions around hitting the wall.
|
| sure, a full-on AGI, non-hallucinating AI would be great. but the
| current state is already a giant leap. there's so much untapped
| potential in the corporate world where whole departments,
| processes, etc can be decimated.
|
| doing this and dealing with the socio-economic and political
| fall-out from those efficiency leaps can happen while research
| (along multiple pathways) goes on, and this will take 5-10 years
| at least.
| mediumsmart wrote:
| _" Nobody's gonna believe that computers are intelligent until
| they start coming in late and lying about it."_
|
| BTW: the German KI (Keine Intelligenz) is much more accurate than
| AI (Apparently Intelligent)
| ZiiS wrote:
| Current approaches to AI are almost certainly going to be
| superseded eventually; calling that a dead end achieves nothing.
| darioush wrote:
| Just because something is probabilistic in its response doesn't
| mean it's not useable.
|
| There are many probabilistic algorithms and data structures that
| we use daily.
|
| Yes, we haven't yet developed abstractions to integrate an LLM into
| a programming language, but it doesn't mean no one will make one.
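|
| For example, a Bloom filter answers membership queries with a
| bounded false-positive rate and zero false negatives, and we
| rely on that trade-off everywhere. A minimal sketch:
|
|       import hashlib
|
|       class BloomFilter:
|           def __init__(self, size=1024, hashes=3):
|               self.size, self.hashes = size, hashes
|               self.bits = [False] * size
|
|           def _positions(self, item):
|               for i in range(self.hashes):
|                   h = hashlib.sha256(f"{i}:{item}".encode())
|                   yield int(h.hexdigest(), 16) % self.size
|
|           def add(self, item):
|               for p in self._positions(item):
|                   self.bits[p] = True
|
|           def might_contain(self, item):
|               # True may be a false positive; False is definite
|               return all(self.bits[p] for p in self._positions(item))
|
|       bf = BloomFilter()
|       bf.add("hello")
|       print(bf.might_contain("hello"))  # True
|       print(bf.might_contain("world"))  # almost certainly False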
| szundi wrote:
| All branches of everything are a dead end sooner or later in life
| makach wrote:
| Betteridge's law of headlines, _current_ AI may absolutely be a
| dead end, but fortunately technology is evolving and changing -
| who knows what the future will hold.
| andrewguy9 wrote:
| Maybe it is, maybe it isn't. The only thing I know is, none of
| the arrogant fuckers on hacker news know anything about it. But
| that won't stop them from posting.
| tucnak wrote:
| There's an upside! If they're wrong, and they manage to
| convince more people--it basically gives you more of an
| advantage. I don't get into arguments about the utility of LLM
| technology anymore because why bother?
| josefritzishere wrote:
| This may be an exception to Betteridge's law of headlines
| thunkingdeep wrote:
| Useless and dead end aren't synonymous. It's most certainly a
| dead end, but it's also not useless.
|
| There are a lot of comments here already conflating these two.
|
| This article is also pretty crap. There's a decent summary box
| but other than that it's all regurgitated half-wisdoms we've all
| already realized: things will change, probably a lot; nobody
| knows what the end goal is or how far we are from it; the next
| quantum leap almost certainly depends on a transcendent
| architecture or new model entirely.
|
| This whole article could've been a single paragraph honestly, and
| a lot of the comments here probably wouldn't have read that
| either... just sayin
| mmaunder wrote:
| "In my mind, all this puts even state-of-the-art current AI
| systems in a position where professional responsibility dictates
| the avoidance of them in any serious application."
|
| And yet here we are with what we all think of as serious and
| seriously useful applications.
|
| "My first 20 years of research were in formal methods, where
| mathematics and logic are used to ensure systems operate
| according to precise formal specifications, or at least to
| support verification of implemented systems."
|
| I think recommending that we avoid building anything serious in
| the field until your outdated verification methodology catches up
| is unreasonably cynical, but also naive, because it discards the
| true nature of our global society and assumes a lab environment
| where this kind of control is possible.
| rob_c wrote:
| Yes, until someone introduces reward into llm training I doubt
| we'll get much further
| dfilppi wrote:
| Somehow, fallible humans create robust systems. Look to "AI" to
| do the same, at a far higher speed. The "AI" doesn't need to
| recite the Fibonacci sequence; it can write (and test) a program
| that does so. Speed is power.
| quotemstr wrote:
| Whenever a new technology emerges, along with it always emerge
| naysayers who claim that the new technology could never work ---
| _while it's working right in front of their noses_. I'm sure
| there were people after Kitty Hawk who insisted that heavier than
| air flight would never amount to much economically. Krugman
| famously insisted in the 90s that the internet would never amount
| to anything. These takes are comical in hindsight.
|
| The linked article is another one of these takes. AI can
| _obviously_ reason. o3 is _obviously_ superhuman along a number
| of dimensions. AI is _obviously_ useful for software development.
| This guy spent 20 years of his life working on formal methods. Of
| course he's going to poo-poo the AI revolution. That doesn't
| make him right.
| sealeck wrote:
| > Whenever a new technology emerges, along with it always
| emerge naysayers who claim that the new technology could never
| work
|
| There's some survivorship bias going on here - you only
| consider technologies which succeeded, and find examples of
| people scrutinising them beforehand. However, we know that not
| every nascent technology blossoms; some are really effective,
| but can't find adopters; some are ahead of their time; some are
| cost-prohibitive; and some are outright scams.
|
| It's not a given that every promising new technology is a
| penicillin - some might be Theranos.
| omolobo wrote:
| > I would call 'LLM-functionalism': the idea that a natural
| language description of the required functionality fed to an LLM,
| possibly with some prompt engineering, establishes a meaningful
| implementation of the functionality.
|
| My boy. More people need common sense like this talked into them.
| mikewarot wrote:
| AI is only a dead end if you expect it to function
| deterministically. In the same way as people, it's not rational,
| and it can't be made rational.
|
| For example, the only effective way to get an AI not to talk
| about Bryan Lunduke is to have an external layer that scans for
| his name in the output of an AI, if found, stops the session and
| prints an error message instead.
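|
| As a rough sketch of that kind of external guard layer (my own
| illustration; the blocked term and error text are placeholders,
| and `generate` stands in for any prompt-to-text model call):
|
|     BLOCKED_TERMS = {"bryan lunduke"}  # placeholder block list
|
|     def guarded_reply(generate, prompt):
|         """Scan the model output for blocked terms and stop
|         the session instead of returning the text."""
|         text = generate(prompt)
|         if any(t in text.lower() for t in BLOCKED_TERMS):
|             raise RuntimeError("Session stopped: blocked output.")
|         return text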
|
| If you're willing to build systems around it (like we do with
| people) to limit its side effects and provide sanity checks, and
| legality checks like those mentioned above, it can offer useful
| opinions about the world.
|
| The main thing to remember is that AI is an _alien_ intelligence.
| Each new model is effectively the product of millions of dollars'
| worth of forced evolution. You're getting Stitch from "Lilo and
| Stitch", and you'll never be sure if it's having a bad day.
| clint wrote:
| Also, is there a known deterministic intelligence? Only very
| specific computer programs can be made deterministic, and even
| that has taken quite a while for us to nail down. A lot of the
| code and systems of code produced by humans today are not
| deterministic, and it takes a lot of effort to get them there. For
| most people and teams it's not even on their radar or worth the
| effort.
| bloomingkales wrote:
| Control freaks have a serious issue with the incompleteness of
| an LLM. Everyone else is just ecstatic that it often gets you 70%
| of the way there.
| arrosenberg wrote:
| > Control freaks
|
| People who like repeatable results in their work equipment
| are control freaks?
| therein wrote:
| I know, right? Software Engineers with their zeal for
| determinism. How dare they.
| Terr_ wrote:
| Or modern mechanical engineers getting all pissy about
| "tolerances." Look, we shipped you a big box of those
| cheap screws, so just keep trying a different one until
| each motor sticks together.
| jacobgkau wrote:
| > For example, the only effective way to get an AI not to talk
| about Bryan Lunduke is to have an external layer that scans for
| his name in the output of an AI, if found, stops the session
| and prints an error message instead.
|
| > If you're willing to build systems around it (like we do with
| people) to limit it's side effects and provide sanity checks,
|
| I don't think that comparison holds up. We do build systems
| around people, but people also have internal filters, and most
| are able to use them to avoid having to interact with the
| external ones. You seemed to state that AIs don't (can't?)
| have working internal filters and rely on external ones.
|
| Imagine if everyone did whatever they wanted all the time and
| cops had to go around physically corralling literally everyone
| at all times to maintain something vaguely resembling "order."
| That would be more like a world filled with animals than
| people, and even animals have a bit more reasoning than that.
| That's where we are with AI, apparently.
| clint wrote:
| > Imagine if everyone did whatever they wanted all the time
| and cops had to go around physically corralling literally
| everyone at all times to maintain something vaguely
| resembling "order."
|
| I don't need to imagine anything. I live on Earth in America
| and to my mind you've very accurately described the current
| state of human society.
|
| For the vast majority of humans this is how it works
| currently.
|
| The amount of government, military, and police and the
| capital, energy, and time to support all of that in every
| single country on earth is pretty much the only thing holding
| up the facade of "order" that some people seem to take for
| granted.
| jacobgkau wrote:
| > For the vast majority of humans this is how it works
| currently.
|
| No it is not. Like I said, everyone knows everyone has an
| internal "filter" on what you say (and do). The _threat_ of
| law enforcement may motivate everything (if you want to be
| edgy with how you look at it), but that is not the same
| thing as being actively, physically corrected at every
| turn, which is what the analogy in question lines up with.
| peter_retief wrote:
| AI is useful as a tool but it is far from trustworthy.
|
| I just used Grok to write some cron scripts for me, and it gave me
| perfectly good results. If you know exactly what you want, it is
| great.
|
| It is not the end of software programmers, though, and it is very
| dangerous to give it too much leeway, because you will almost
| certainly end up with problems.
|
| I agree with the conclusion that a hybrid model is possible.
| DeepYogurt wrote:
| > if you know exactly what you want, it is great.
|
| Kinda kills the utility if you need to know what you want out
| tho...
| stanac wrote:
| It speeds up code writing, it's not useless. Best use case
| for me is to help me understand libraries that are sparsely
| documented (e.g. dotnet roslyn api).
|
| edit: spelling
| xandrius wrote:
| If I can get 100 lines generated instantly while explaining it in
| 25, scan the answer just to validate it, and then add another 50
| lines because I forgot something before, all within minutes, then
| I'm happy.
|
| Plus I can detach the "tell the AI" part from the actual
| running of the code. That's pretty powerful to me.
|
| For instance, I could be on the train thinking of something,
| chat it over with an LLM, get it where I want and then pause
| before actually copying it into the project.
| dmead wrote:
| Yes. It's really time to move on (to the next scam).
| Hilift wrote:
| >current AI should not be used for serious applications.
|
| "If an an artificial person can do a job and make fewer mistakes
| than a real person, why not?"
|
| Is the question everyone in business is asking.
| mentalgear wrote:
| As evidenced by most if not all of the "AI-hiring platforms", it's
| not about solving a problem successfully, but about using the
| latest moniker/term/sticker to appear as if you solve the problem
| successfully.
|
| In reality, neither the client nor the user base have access to
| the ground truth of these "AI system"s to determine actual
| reliability and efficiency.
|
| That's not to say there aren't some genuine ML/AGI companies
| like DeepMind (which solve specific narrow problems with quite
| high confidence), but most of the "AI" companies feel like
| they are coming from crypto and are now selling little more
| than vaporware in the AI gold rush.
| hmillison wrote:
| in reality the question is more so, can the AI do a "good
| enough" job to not be noticeably worse than a real person?
| dghlsakjg wrote:
| > "If an an artificial person can do a job and make fewer
| mistakes than a real person, why not?"
|
| The very simple answer to that is that the artificial person
| can't do the full job of a person yet.
|
| Being good or better _at certain parts_ of a job does not mean
| it can do the whole job effectively.
| sanderjd wrote:
| I always find this to be a false dichotomy. I'm not sure what
| use cases are a good fit for generative AI models to tackle
| without human supervision. But there are clearly many tasks
| where the combination of generative AI with human direction is
| a big productivity boon.
| 015a wrote:
| "Making fewer mistakes" implies that there's a framework within
| which the agent operates where its performance can be quickly
| judged as correct or incorrect. But, computers have already
| automated many tasks and roles in companies where this
| description applies; and competitive companies now remain
| capitalistically competitive not because they have stronger
| automation of boolean jobs, but because they're better
| configured to leverage human creativity in tasks and roles
| performance in which cannot be quickly judged as correct or
| incorrect.
|
| Apple is the world's most valuable company, and many would
| attribute a strong part of their success to Jobs' legacy of
| high-quality decision-making. But anyone who has worked in a
| large company understands that there's no way Apple can so
| consistently produce their wide range of highly integrated,
| high quality products with only a top-down mandate from one
| person; especially a dead one. It takes thousands of people,
| the right people, given the right level of authority, making
| high-quality high-creativity decisions. It also, obviously,
| takes the daily process, an awe-inspiring global supply chain,
| automation systems, and these are areas that computers, and now
| AI, can have a high impact in. But that automation is a
| commodity now. Samsung has access to that same automation, and
| they make fridges and TVs; so why aren't they worth almost four
| trillion dollars?
|
| AI doesn't replace humans; it, like computers more generally
| before it, brings the process cost of the inhuman things it can
| automate to zero. When that cost is zero, AI cannot be a
| differentiating factor between two businesses. The
| differentiating factors, instead, become the capital the
| businesses already have to deploy (favoring of established
| players), and the humans who interact with the AI, interpreting
| and when necessary executing on its decisions.
| jandrese wrote:
| 1979 presentation at IBM:
|
| "A computer can never be held accountable. Therefore, a
| computer must never make a management decision."
|
| There are lots of bullshit jobs that we could automate away, AI
| or no. This is far from a new problem. Our current "AI"
| solutions promise to do it cheaper, but detecting and dealing
| with "hallucinations" is turning out to be more expensive than
| anticipated and it's not at all clear to me that this will be
| the silver bullet that the likes of Sam Altman claims it will
| be.
|
| Even if the AI solution makes fewer mistakes, the magnitude of
| those mistakes matters. The human might make transcription
| errors with patient data or other annoying but fixable clerical
| errors, while the AI may be perfect with transcription but make a
| completely sensible-sounding but ultimately nonsensical diagnosis,
| with dangerous consequences.
| warkdarrior wrote:
| 1953 IBM also thought that "there is a world market for maybe
| five computers," so I am not sure their management views are
| relevant this many decades later.
| simpaticoder wrote:
| _>...developing software to align with the principle that
| impactful software systems need to be trustworthy, which implies
| their development needs to be managed, transparent and
| accountable._
|
| The author severely discounts the value of opacity and
| unaccountability in modern software systems. Large organizations
| previously had to mitigate moral hazard with unreliable and
| burdened-with-conscience labor. LLM-style software is superior on
| every axis in this application.
| nbzs wrote:
| I am a simple man. In 2022 I glanced through "Attention Is All You
| Need" and forgot about it. A lot of people made money. A lot of
| people believed that the end of programmers and designers was
| absolute. Some people on stage announced the death of coding.
| Others bravely explored the future in which people are not needed
| for creative work.
|
| Aside from the anger that this public stupidity produced in me, I
| always knew that this day would come.
|
| Maybe next time someone will have the balls not to call a text
| generator with inherent hallucinations "intelligence"? Who knows.
| Miracles can happen. :)
| qaq wrote:
| To push something to the limit requires a lot of funding; if the
| public never got overexcited about some tech, many really cool
| things would never have been tried. Also, LLMs are pretty useful
| even as is. They sure made me more productive.
| nbzs wrote:
| I just imagine the world in which the industry defined by its
| deterministic nature and facts has the bravery to call a spade
| a spade. LLMs have a function. Machine learning also. But
| calling LLMs intelligence and pushing the hype into overdrive?
| godelski wrote:
| Overhype is what led to the last AI winter, because they
| created a railroad and didn't diversify. For some reason
| we're doing it again.
| peter_retief wrote:
| In retrospect it seems obvious now.
| uoaei wrote:
| For some. For others with fewer stars in their eyes, it was
| obvious from the beginning.
| dingnuts wrote:
| the launch of ChatGPT had an amount of hype that was
| downright confusing for someone who had previously
| downloaded and fine-tuned GPT-2. Everyone who hadn't used a
| language model said it was revolutionary but it was
| obviously evolutionary
|
| and I'm not sure the progress is linear, it might be
| logarithmic.
|
| genAI in its current state has some uses... but I fear that
| mostly ChatGPT is hallucinating false information of all
| kinds into the minds of uninformed people who think GPT is
| actually intelligent.
| uoaei wrote:
| Everyone who actually works on this stuff, and didn't
| have ulterior motives in hyping it up to (over)sell it,
| has been identifying themselves as such and providing
| context for the hype since the beginning.
|
| The furthest they got before the hype machine took over
| was introducing the term "stochastic parrot" to popular
| discourse.
| highfrequency wrote:
| Seems completely nonsensical. Yes, neural networks themselves are
| not unit testable, modular, symbolic or verifiable. That's why we
| have them produce _code artifacts_ - which possess all those
| traits and can be reviewed by both humans and other machines.
| It's completely analogous to human software engineers, who are
| unfortunately black boxes as well.
|
| More broadly, I've learned to attach 0 credence to any conceptual
| argument that an approach will _not_ lead somewhere interesting.
| The hit rate on these negative theories is atrocious, they are
| often motivated by impure reasons, and the downside is very
| asymmetric (who cares if you sidestep a boring path? yet how
| brutal is it to miss an easy and powerful solution?)
| omolobo wrote:
| [flagged]
| highfrequency wrote:
| A software engineer that costs $20/month instead of
| $20k/month, and gets meaningfully more knowledgeable about
| every field on earth each year?
| KronisLV wrote:
| Honestly, I think it's nothing special to say that certain
| technologies have an end point.
|
| We had lots of advancements in single core CPUs but eventually
| more than that was necessary, now the same is happening with
| monolithic chips vs chiplet designs.
|
| Same for something like HTTP/1.1 and HTTP/2 and now HTTP/3.
|
| Same for traditional rendering vs something like raytracing and
| other approaches.
|
| I assume it's the same for typical spell checking and writing
| assistants vs LLM based ones.
|
| That it's the same for typical autocomplete solutions vs LLM
| based ones.
|
| It does seem that there weren't former _technological_ solutions
| for images/animations/models etc. (maybe the likes of Mixamo and
| animation retargeting, but not much for replacing a concept
| artist for shops that can't afford one).
|
| Each technology, including the various forms of AI, has its
| limitations, regardless of how much money has been spent on
| training the likes of the models behind ChatGPT etc. Nothing wrong
| with that; I'll use LLMs for what they're good for and look for
| something else once new technologies become available.
| seydor wrote:
| There is no reason to privilege compositionality/modularity vs
| emergence. One day we may have the emergence of compositionality
| in a large model. It would be a dead end if this were provably not
| possible.
| Gud wrote:
| What exactly does current "AI" do?
|
| It roams around the internet, synthesizing sentences that kind of
| look the same as the source material, correct me if I'm wrong?
| There are a lot of adjustments being made to the models (by
| humans, mostly, I guess)?
|
| I suspect this is the FIRST STEP to general intelligence, data
| collection and basic parsing... I suspect there is not a thing
| called "reasoning" - but a multi step process... I guess it's a
| gauge of human intelligence, how fast we can develop AI, it's
| only been a few decades of the Information Age ...
| resoluteteeth wrote:
| > I suspect this is the FIRST STEP to general intelligence,
| data collection and basic parsing... I suspect there is not a
| thing called "reasoning" - but a multi step process... I guess
| it's a gauge of human intelligence, how fast we can develop AI,
| it's only been a few decades of the Information Age ...
|
| The question the article is posing isn't whether LLMs do some
| of the things we would want general AI to do or are a good first
| attempt by humans at creating something sort of like AI.
|
| The question is whether current machine learning
| techniques, such as LLMs, that are based on neural networks are
| going to hit a dead end.
|
| I don't think that's something anyone can answer for sure.
| AnimalMuppet wrote:
| LLMs, _by themselves_, are going to hit a dead end. They are
| not enough to be an AGI, or even a true AI. The question is
| whether LLMs can be a part of something bigger. That, as you
| say, is not something anyone can currently answer for sure.
| jandrese wrote:
| I've come around to thinking of our modern "AI" as a lossy
| compression engine for knowledge. When you ask a question it is
| just decompressing a tiny portion of the knowledge and
| displaying it for you, sometimes with compression artifacts.
|
| This is why I am not worried about the "AI Singularity" like
| some notable loudmouth technologists are. At least not with our
| current ML technologies.
| Gud wrote:
| Absolutely agree.
| red75prime wrote:
| With a bit (OK, a lot) of reinforcement learning that
| prioritizes the best chains-of-thoughts, this compression
| engine becomes a generator of missing training data on how to
| actually think about something instead of trying to come up
| with the answer right away as internet text data suggests it
| should do.
|
| That's the current ML technology. What you've described is
| the past. About a 4-year-old past, to be precise.
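|
| A rough sketch of the best-of-N flavor of that idea (my own toy
| simplification; `generate_cot` and `reward` are placeholders for
| a sampled chain of thought and whatever scorer ranks them):
|
|     import random
|
|     def best_of_n(generate_cot, reward, prompt, n=8):
|         """Sample n chains of thought, keep the best-scoring
|         one; kept traces can become new training data."""
|         candidates = [generate_cot(prompt) for _ in range(n)]
|         return max(candidates, key=reward)
|
|     # Toy stand-ins: pretend longer reasoning scores higher.
|     toy_cot = lambda p: p + " step" * random.randint(1, 5)
|     print(best_of_n(toy_cot, reward=len, prompt="2+2?"))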
| cruffle_duffle wrote:
| That is exactly how I think about it. It's lossy compression.
| Think about how many petabytes of actual information any of
| these LLMs were trained on. Now look at the size of the
| resultant model. Its orders of magnitude smaller. It made it
| smaller by clipping the high frequency bits of some multi-
| billion dimension graph of knowledge. Same basic you do with
| other compression algorithms like JPEG or MP3.
|
| These LLMs are just lossy compression for knowledge. I think
| the sooner that "idea" gets surfaced, the sooner people will find
| ways to train models with fixed pre-computed lookup tables of
| knowledge categories and association properties... basically
| taking a lot of the randomness out of the training process
| and getting more precise about what dimensions of knowledge
| and facts are embedded into the model.
|
| ... or something like that. But I don't think this
| optimization will be driven by the large, well-funded tech
| companies. They are too invested in flushing money down the
| drain with more and more compute. Their huge budgets blind
| them to other ways of doing the same thing with significantly
| less.
|
| The future won't be massive large language models. They'll be
| "small language models" custom tuned to specific tasks.
| You'll download or train a model that has incredible
| understanding of Rust and Django but won't know a single
| thing about plate tectonics or apple pie recipes.
| wodderam wrote:
| Why wouldn't we have a small language model for Python
| programming now though?
|
| That is an obvious product. I would suspect the reason we
| don't have a small language Python model is that the
| fine-tuned model is no better than the giant general-
| purpose model.
|
| If that is the case, it is not good. It even makes me wonder
| whether we are not really compressing knowledge but using a hack
| to create the illusion of compressing knowledge.
| fetas wrote:
| Ye Dr r
| trane_project wrote:
| I think most AI research up to this day is a dead end. Assuming
| that intelligence is a problem solvable by computers implies that
| intelligence is a computable function. Nobody up to this day has
| been able to give a formal mathematical definition of
| intelligence, let alone a proof that it can be reduced to a
| computable function.
|
| So why assume that computer science is the key to solving a
| problem that cannot even be defined in terms of math? We had
| formal definitions of computers decades before they became a
| reality, but somehow cannot make progress in formally defining
| intelligence.
|
| I do think artificial intelligence can be achieved by making
| artificial intelligence a multidisciplinary endeavor with
| biological engineering at its core, not computer science. See the
| work of Michael Levin to see real intelligence in action:
| https://www.youtube.com/watch?v=Ed3ioGO7g10
| leesec wrote:
| Marcus Hutter did
| Xunjin wrote:
| Could you point out where we could find the related info?
| trane_project wrote:
| Thanks for pointing me to this. This is a proposed
| definition of intelligence. Is it the same as the real thing,
| though? Even assuming that it was:
|
| > Like Solomonoff induction, AIXI is incomputable.
|
| That would mean that computers can, at best, produce an
| approximation. We know the real thing exists in nature
| though, so why not take advantage of those competencies?
| lanza wrote:
| > Nobody up to this day has been able to give a formal
| mathematical definition of intelligence, let alone a proof that
| it can be reduced to a computable function.
|
| We can't prove the correctness of the plurality of physics.
| Should we call that a dead end too?
| Davidbrcz wrote:
| This is actually a philosophical question!
|
| If you believe in functionalism (~mental states are identified
| by what they do rather than by what they are made of), then
| current AI is _not_ a dead end.
|
| We wouldn't need to define intelligence; just making it big and
| efficient enough to replicate what currently exists would
| be intelligence by that definition.
| trane_project wrote:
| My point is that if you use biological cells to drive the
| system, which already exhibit intelligent behaviors, you
| don't have to worry about any of these questions. The basic
| unit you are using is already intelligent, so it's a given
| that the full system will be intelligent. And not an
| approximation but the real thing.
| red75prime wrote:
| Humanities? Have you chosen humanities as electives?
| hanniabu wrote:
| Current AI should be referred to as collective intelligence
| since it needs to be trained and only knows what's been written
| leesec wrote:
| Incredibly ignorant set of replies on this thread lol. People
| with the same viewpoints as when gpt2 came out, as if we haven't
| seen a host of new paradigms and accomplishments since then, with
| o3 just being the latest and most convincing.
| logicchains wrote:
| Let them have their fun while they can, it's gonna get pretty
| bleak in the next 5-10 years when coding jobs are being
| replaced left and right by bots that can do the work better and
| cheaper.
| boshalfoshal wrote:
| Maybe you're retired or not a SWE or knowledge worker
| anymore, but I have a decent amount of concern about this
| future.
|
| As a society, we have not even begun to think about what
| happens when large swathes of the population become
| unemployed. Everyone says they'd love to not work, but no one
| says they can survive without money. Our society trades labor
| for money. And I have very little faith in our society or the
| government to alleviate this through something like UBI.
|
| Previously it was physical work that was made more efficient,
| but the one edge we thought we would always have as humans -
| our creativity and thinking skills - is also being displaced.
| And on top of that, it's fairly clear that the leaders in the
| space (apart from maybe Anthropic?) are doing this purely from a
| capitalist-driven, profit-first motivation.
|
| I for one think the world will be a worse place for a few
| years immediately after AGI/ASI.
| asdff wrote:
| Why not just hire out of Bangladesh though?
| mrshadowgoose wrote:
| It's deeply saddening to see how fixated people are on the
| here-and-now, while ignoring the terrifying rate of progress,
| and its wide-ranging implications.
|
| We've gone from people screeching "deep learning has hit its
| limits" in 2021 to models today that are able to reason within
| limited, but economically relevant contexts. And yet despite
| this, the same type of screeching continues.
| martindbp wrote:
| It's the same kind of people who claimed human flight would
| not be possible for 10,000 years in 1902. I just can't
| understand how narrow your mind has to be in order to be this
| skeptical.
| sealeck wrote:
| Or the same kind of people who claimed Theranos was a scam,
| or that AI in the 70s wasn't about to produce Terminator
| within a few years, or that the .com bubble was in fact a
| bubble...
| zeroonetwothree wrote:
| Maybe some of us aren't actually impressed with the
| "progress" since 2022? Doing well at random benchmarks hasn't
| noticeably improved capability in use for work.
|
| Does that mean it will never improve? Of course not. But
| don't act like everyone else is some kind of moron.
| dyauspitr wrote:
| They're scared (as am I) but I have no illusions about the
| usefulness of these LLMs. Everyone on my team uses them to get
| their tickets done in a fraction of the time and then just sit
| around till the sprint ends.
| aerhardt wrote:
| The innovation in foundational models is far outpacing the
| applications. Other than protein folding (which is not only
| LLMs AFAIK) I haven't seen a single application that blows my
| mind. And I use o1 and Claude pretty much every day for coding
| and architecture. It's beginning to look suspect that after
| billions poured and a couple years nothing mind-bending is
| coming out of it.
| spoaceman7777 wrote:
| Yeah, sounds like people are encountering a lot of PEBCAK
| errors in this thread. You get out of LLMs what you put into
| them, and the complaints, at this point, are more an admission
| of an inability to learn the new tools well.
|
| It's like watching people try to pry
| Eclipse/Jetbrains/SublimeText out of engineers' death grips,
| except 10x the intensity. (I still use Jetbrains fyi :p)
| boshalfoshal wrote:
| Well that's the argument most people here are making - that
| current LLMs are not good enough to be fully autonomous
| precisely because a human operator has to "put the right
| thing into them to get the right thing out."
|
| If I'm spending effort specifying a problem N times in very
| specific LLM-instruction-language to get the correct output
| for some code, I'd rather just write the code myself. After
| all, that's what code is for. English is lossy, code isn't. I
| can see codegen getting even better in larger organizations
| if context windows are large enough to have a significant
| portion of the codebase in it.
|
| There are areas where this is immediately better, though
| (customer feedback, subjective advice, small sections of
| sandboxed/basic code, etc). Basically, areas where the
| effects of information compression/decompression can be
| tolerated or passed onto the user to verify.
|
| I can see all of these getting better in a couple of
| months/few years.
| roody15 wrote:
| What I find interesting is that current LLMs are based primarily
| on written data, which is already an abstraction/abbreviation of
| most observed phenomena.
|
| What happens when AI starts to send out its own drones or perhaps
| robots and tries to gather and train on data it observes
| itself?
|
| I think we may be closer to this point than we realize... the
| results of AI could get quite interesting once a human-level
| abstraction of knowledge is perhaps reduced.
| 383toast wrote:
| Wouldn't the work on interpretability solve these concerns?
| abeppu wrote:
| > Eerke Boiten, Professor of Cyber Security at De Montfort
| University Leicester, explains his belief that current AI should
| not be used for serious applications.
|
| > In my mind, all this puts even state-of-the-art current AI
| systems in a position where professional responsibility dictates
| the avoidance of them in any serious application.
|
| > Current AI systems also have a role to play as components of
| larger systems in limited scopes where their potentially
| erroneous outputs can be reliably detected and managed, or in
| contexts such as weather prediction where we had always expected
| stochastic predictions rather than certainty.
|
| I think it's important to note that:
|
| - Boiten is a security expert, but doesn't have a background
| working in ML/AI
|
| - He never defines what "serious application" means, but
| apparently systems that are designed to be tolerant of missed
| predictions are not "serious".
|
| He seems to want to trust a system at the same level that he
| trusts a theorem proved with formal methods, etc.
|
| I think the frustrating part of this article is that from a
| security perspective, he's probably right about his
| recommendations, but he seems off-base in the analysis that gets
| him there.
|
| > Current AI systems have no internal structure that relates
| meaningfully to their functionality. They cannot be developed, or
| reused, as components.
|
| Obviously AI systems _do_ have internal structure, and there are
| re-usable components both at the system level (e.g. we pick an
| embedding, we populate some vector DB with contents using that
| embedding, and create a retrieval system that can be used in
| multiple ways). The architecture of models themselves also has
| components which are reused, and we make choices about when to
| keep them frozen versus when to retrain them. Any look at
| architecture diagrams in ML papers shows one level of these
| components.
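|
| A toy sketch of that component reuse (my own illustration; the
| `embed` function is a fake stand-in for a real embedding model,
| and the "vector DB" is just a dict):
|
|     import math
|
|     def embed(text):
|         # Fake embedding: normalized letter counts.
|         vec = [0.0] * 26
|         for ch in text.lower():
|             if 'a' <= ch <= 'z':
|                 vec[ord(ch) - 97] += 1.0
|         norm = math.sqrt(sum(x * x for x in vec)) or 1.0
|         return [x / norm for x in vec]
|
|     store = {}  # doc id -> embedding vector
|
|     def index(doc_id, text):
|         store[doc_id] = embed(text)
|
|     def search(query, k=3):
|         q = embed(query)
|         scores = {d: sum(a * b for a, b in zip(q, v))
|                   for d, v in store.items()}
|         return sorted(scores, key=scores.get, reverse=True)[:k]
|
| The same embed/index/search pieces can back RAG context lookup,
| "related documents", deduplication, and so on, which is the sense
| in which these systems do have reusable components.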
|
| > exponential increases of training data and effort will give us
| modest increases in impressive plausibility but no foundational
| increase in reliability.
|
| I think really the problem is that we're fixated on mostly-
| solving an ever broader set of problems rather than solving the
| existing problems more reliably. There are plenty of results about
| ensembling and learning theory that give us a direction to
| increase reliability (by paying for more models of the
| same size), but we seem far more interested in seeing whether we
| can solve problems at a higher level of sophistication most of the
| time. That's a choice that we're making. Similarly,
| Boiten mentions the possibility of models with explicit
| confidences -- there's been plenty of work on that, but because
| there's a tradeoff with model size (i.e. do you want to spend
| your resources on a bigger model, or on explicitly representing
| variance around a smaller number of parameters?) people
| seem mostly uninterested.
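|
| The ensembling direction can be as simple as majority voting over
| several independent samples (a sketch of my own, under the
| assumption that answers can be compared for equality; `models` is
| any list of prompt-to-answer callables):
|
|     from collections import Counter
|
|     def majority_vote(models, prompt):
|         # Reliability is bought with extra compute (more
|         # models/samples) rather than a bigger model.
|         answers = [m(prompt) for m in models]
|         return Counter(answers).most_common(1)[0][0]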
|
| I think there are real reasons to be concerned about the specific
| path we're on, but these aren't the good ones.
| saltysalt wrote:
| I think it does represent a dead end, but not for the reasons
| presented in this article.
|
| The real issue in my opinion is that we will hit practical limits
| with training data and computational resources well before AGI
| turns us all into paperclips, basically there is no "Moore's Law"
| for AI and we are already slowing down using existing models like
| GPT.
|
| We are in the vertical scaling phase of AI model development,
| which is not sustainable long-term.
|
| I discussed this further here for anyone interested:
| https://techleader.pro/a/658-There-is-no-Moore's-Law-for-AI-...
| NoGravitas wrote:
| > The real issue in my opinion is that we will hit practical
| limits with training data and computational resources well
| before AGI turns us all into paperclips [...]
|
| I think you are correct, but also I think that even if that
| were not the case, the Thai Library Problem[1] strongly
| suggests that AGI will have to be built on something other than
| LLMs (even if LLM-derived systems were to serve as an interface
| to such systems).
|
| [1]: https://medium.com/@emilymenonbender/thought-experiment-
| in-t...
| ItCouldBeWorse wrote:
| Should replace it with head cheese:
| http://www.technovelgy.com/ct/content.asp?Bnum=687
| slow_typist wrote:
| Why is it that LLMs are 'stochastic'? Shouldn't the same input
| lead to the same output? Is the LLM somehow modifying itself in
| production? Or is it just flipping bits caused by cosmic
| radiation?
| ijustlovemath wrote:
| They probabilistically choose an output. Check out 3b1b's
| series on LLMs for a better understanding!
| fnl wrote:
| Mixture of Experts models (as GPTs are) can produce
| different results for an input sequence if that sequence is
| retried together with a different set of sequences in its
| inference batch, because the model ("expert") routing
| depends on the batch, not on the single sequence:
| https://152334h.github.io/blog/non-determinism-in-gpt-4/
|
| And in general, binary floating point arithmetic cannot
| guarantee associativity - i.e. `(a + b) + c` might not be the
| same as `a + (b + c)`. That in turn can lead to the model
| picking another token in rare cases (and, through its auto-
| regressive consequences, the entire remainder of the generated
| sequence might differ): https://www.ingonyama.com/blog/solving-
| reproducibility-chall...
|
| Edit: Of course, my answer assumes you are asking about the
| case when the model lets you set its token generation
| temperature (stochasticity) to exactly zero. With default
| parameter settings, all LLMs I know of randomly pick among the
| best tokens.
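|
| The non-associativity point is easy to demonstrate directly (a
| tiny example of my own, not from the linked post):
|
|     # Summing the same values in a different order (e.g. a
|     # different batch layout) can change the result slightly,
|     # occasionally enough to flip the argmax token.
|     a, b, c = 1e16, -1e16, 1.0
|     print((a + b) + c)  # 1.0
|     print(a + (b + c))  # 0.0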
| sroussey wrote:
| They always return the same output for the same input. That is
| how tests are done for llama.cpp, for example.
|
| To get variety, you give each person a different seed. That way
| each user gets consistent answers but different than each
| other. You can add some randomness in each call if you don't
| want the same person getting the same output for the same
| input.
|
| It would be impossible to test and benchmark llama.cpp et al
| otherwise!
|
| By the time you get to a UI someone has made these decisions
| for you.
|
| It's just math underneath!
|
| Hope this helps.
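|
| A toy illustration of the seed point (my own sketch, not
| llama.cpp's actual test harness): with a fixed seed the sampled
| token is reproducible; without one it may vary per call.
|
|     import random
|
|     def sample_token(candidates, seed=None):
|         # Stand-in for temperature sampling over token weights.
|         rng = random.Random(seed)
|         toks = list(candidates)
|         wts = list(candidates.values())
|         return rng.choices(toks, weights=wts)[0]
|
|     cands = {"cat": 0.5, "dog": 0.3, "eel": 0.2}
|     a = sample_token(cands, seed=42)
|     b = sample_token(cands, seed=42)
|     print(a == b)               # True: same seed, same output
|     print(sample_token(cands))  # unseeded: may differ per run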
| ohxh wrote:
| "One could offer so many examples of such categorical prophecies
| being quickly refuted by experience! In fact, this type of
| negative prediction is repeated so frequently that one might ask
| if it is not prompted by the very proximity of the discovery that
| one solemnly proclaims will never take place. In every period,
| any important discovery will threaten some organization of
| knowledge." Rene Girard, Things Hidden Since the Foundation of
| the World, p. 4
| juped wrote:
| It's only a dead end if you bought the hype; it's actually
| useful! Just not all-powerful.
| everdrive wrote:
| I hope so because I'm extraordinarily sick of the technology. I
| can't really ask a question at work without some jackass posting
| an LLM answer in there. The answers almost never amount to
| anything useful, but no one can tell since it looks clearly
| written. They're "participating" but haven't actually done
| anything worthwhile.
| yashap wrote:
| I hope so, but for different reasons. Agreed they spit out
| plenty of gibberish at the moment, but they've also progressed
| so far so fast it's pretty scary. If we get to a legitimate
| artificial general super intelligence, I'm about 95% sure that
| will be terrible for the vast, vast majority of humans, we'll
| be obsolete. Crossing my fingers that the current AI surge
| stops well short of that, and the push that eventually does get
| there is way, way off into the future.
| youssefabdelm wrote:
| Or liberating... as Douglas Rushkoff puts it.
|
| If and only if something like high-paying UBI comes along,
| and people are freed to pursue their passions and as a
| consequence, benefit the world much more intensely.
| amarcheschi wrote:
| everything points to the opposite
| youssefabdelm wrote:
| It may be impossible in this world to expect a form of
| donation, but it is certainly not impossible to expect
| forms of investment.
|
| One idea I had is everyone is paid a thriving wage, and
| in exchange, if they in the future develop their passion
| into something that can make a profit, they pay back 20%
| of the profits they make, up to some capped amount.
|
| This allows for extreme generality. It truly frees people
| to pursue whatever they fancy every day until they catch
| lightning in a bottle.
|
| There would be 0 obligation as to _what_ to do, and when
| to pay back the money. But of course it would have to be
| open only to honest people, so that neither side is
| exploiting the other.
|
| Both sides need a sense of gratitude and a desire to give
| back: a philanthropic 'flair' of "if it doesn't work out,
| it's okay" on the giver's side, and gratitude and a wish to
| give back someday on the side of the receiver, as they
| continue working on probably the most resilient thing they
| could ever work on (the safest investment), their lifelong
| passion.
| gershy wrote:
| I'm not sure passion exists in a world without struggle...
| kerkeslager wrote:
| The idea that AI will _ever_ remove _all_ struggle, even
| if it reaches AGI, is absurd. AI by itself can't give
| you a hug, for example--and even if advances in robotics
| make it possible for an AI-controlled _robot_ to do that,
| there are dozens of unsolved problems beyond that to make
| that something that most people would even want.
|
| AI enthusiasm really is reaching a religious level of
| ridiculous beliefs at this point.
| smallmancontrov wrote:
| "I only make you struggle because I love you!"
|
| (Mmmhmm, I'm sure the benefits received by the people on
| top have nothing to do with it.)
| diego_sandoval wrote:
| I'm not sure if that is something we actually would want.
|
| Lots of people certainly think they want that.
| Hasu wrote:
| Why wouldn't you want it, unless you are currently
| benefiting from employing people who would rather be
| doing literally anything else?
| szundi wrote:
| Even then he'll probably like employing AI more.
|
| Lots of new taxes and UBI!
| hackinthebochs wrote:
| For the vast majority of people, getting rid of necessary
| work will usher in an unprecedented crisis of meaning.
| Most people aren't the type to pursue creative ends if they
| didn't have to work. They would veg out or engage in
| degenerate activities. Many people have their identity
| wrapped up in the work they do, or in being a provider. Taking
| this away without having something to replace it with
| will be devastating.
| pixl97 wrote:
| >They would veg out or engage in degenerate activities
|
| "Oh no the sinners might play video games all day"
|
| I do expect the next comment would be something like
| "work is a path to godliness"
| hackinthebochs wrote:
| >I do expect the next comment would be something like
| "work is a path to godliness"
|
| And you think these kinds of maxims formed out of
| vacuums? They are the kinds of sayings that are formed
| through experience re-enforced over generations. We can't
| just completely reject all historical knowledge encoded
| in our cultural maxims and expect everything to work out
| just fine. Yes, it is true that most people not having
| productive work will fill the time with frivolous or
| destructive ends. Modernity does not mean we've somehow
| transcended our historical past.
| EMIRELADERO wrote:
| > They are the kinds of sayings that are formed through
| experience re-enforced over generations.
|
| Sure, but the whole point is that the conditions that led
| to those sayings would no longer be there.
|
| Put a different way: those sayings and attitudes were
| necessary in the first place because society needed
| people to work in order to sustain itself. In a system
| where individual human work is no longer necessary, of
| what use is that cultural attitude?
| hackinthebochs wrote:
| It wasn't just about getting people to work, but keeping
| people from degenerate and/or anti-social behavior.
| Probably the single biggest factor in the success of a
| society is channeling young adult male behavior towards
| productive ends. Getting them to work is part of it, but
| also keeping them from destructive behavior. In a world
| where basic needs are provided for automatically, status-
| seeking behavior doesn't evaporate, it just no longer has
| a productive direction that anyone can make use of. Now
| we have idle young men at the peak of their status-
| seeking behavior with little productive avenues available
| to them. It's not hard to predict this doesn't end well.
|
| Beyond the issues of young males, there are many other ways
| for degenerate behavior to cause problems. Drinking,
| gambling, drugs, being a general nuisance, all these
| things will skyrocket if people have endless time to
| fill. Just during the pandemic, we saw the growth of
| roving gangs riding ATVs in some cities causing a serious
| disturbance. Some cities now have a culture of teenagers
| hijacking cars. What happens to these people who are on
| the brink when they no longer see the need to go to
| school because their basic needs are met? Nothing good,
| that's for sure.
| EMIRELADERO wrote:
| What exactly do you think would happen? Usually wars are
| about resources. When resource distribution stops being a
| problem (i.e, anyone can live like a king just by
| existing), where exactly does a problem manifest?
|
| All the "degenerate activities" you mentioned are a
| problem in the first place because in a scarcity-based
| society they slow down/prevent people from working,
| therefore society is worse off. That logic makes no sense
| in a world where people don't need to put a single drop
| of effort for society to function well.
| hackinthebochs wrote:
| >All the "degenerate activities" you mentioned are a
| problem in the first place because in a scarcity-based
| society they slow down/prevent people from working
|
| This is a weird take. Families are worse off if a parent
| has an addiction because it potentially makes their lives
| a living hell. Everyone is worse off if people feel
| unsafe because of a degenerate sub-culture that glorifies
| things like hijacking cars. People who don't behave in
| predictable ways create low-trust environments which
| impacts everyone.
| Hasu wrote:
| > And you think these kinds of maxims formed out of
| vacuums?
|
| Do you think they've always existed in all human cultures
| throughout time?
|
| The pro-work ethic is fairly new in human civilization.
| Previous cultures considered it to be a burden or
| punishment, not the source of moral virtue.
|
| > Yes, it is true that most people not having productive
| work will fill the time with frivolous or destructive
| ends.
|
| And that's fine! A lot of people fill their time at work
| with frivolous or destructive ends, whether on their own
| or at the behest of their employer.
|
| Not all work is productive. Not all work is good. It
| isn't inherently virtuous and its lack is not inherently
| vicious.
| cortesoft wrote:
| > And you think these kinds of maxims formed out of
| vacuums?
|
| No, they formed in societies where it WAS necessary for
| most people to work in order to support the community. We
| needed a lot of labor to survive, so it was important to
| incentivize people to work hard, so our cultures
| developed values around work ethics.
|
| As we move more and more towards a world where we
| actually don't need everyone to work, those moral values
| become more and more outdated.
|
| This is just like old religious rules around eating
| certain foods; in the past, we were at risk from a lot of
| diseases and avoiding certain foods was important for our
| health. Now, we don't face those same risks so many
| people have moved on from those rules.
| hackinthebochs wrote:
| >those moral values become more and more outdated.
|
| Do you think there was ever a time in human societies
| where the vast majority of people didn't have to "work"
| in some capacity, at least since the rise of
| psychologically modern humans? If not, why think humanity
| as a whole can thrive in such an environment?
| cortesoft wrote:
| Our environment today is completely different than it was
| even 100 years ago. Yes, you have to ask this question
| for every part of modern society (fast travel,
| photographs, video, computers, antibiotics, vaccines,
| etc), so I am not sure why work is different.
| hackinthebochs wrote:
| Part of the problem is that we don't ask these questions
| when we should be. Social media, for example, represents
| a unique assault on our psychological makeup that we just
| uncritically unleashed on the world. We're about to do it
| again, likely with even worse consequences.
| youssefabdelm wrote:
| Good. Finally they'll realize the meaninglessness of
| their work and how they've been exploited in the most
| insidious way. To the point of forgetting to answer the
| question of what it is they most want to do in life.
|
| The brain does saturate eventually and gets bored. Then
| the crisis of meaning. Then something meaningful emerges.
|
| We're all gonna die. Let's just enjoy life to the
| fullest.
| gehwartzen wrote:
| The way most of the world is setup we will need to first
| address the unprecedented crisis of financing our day to
| day lives. We figure that out and I'm sure people will
| find other sources of meaning in their lives.
|
| The people that truly enjoy their work and obtain meaning
| from it are vastly over represented here on HN.
|
| Very few would be scared of AI if they had a financial
| stake in its implementation.
| drdaeman wrote:
| That requires achieving post-scarcity to work in practice
| and be fair, though. If achievable, it's not clear how it
| relates to AGI. I mean, there's plenty of intelligence on
| this planet already, and resources are still limited - and
| it's not like AGI would somehow change that.
| roboboffin wrote:
| One thing I thought recently, is that a large amount of
| work is currently monitoring and correcting human
| activity. Corporate law, accounting, HR and services etc.
| If we have AGI that is forced to be compliant, then all
| these businesses disappear. Large companies are suddenly
| made redundant, regardless of whether they replace their
| staff with AI or not.
| drdaeman wrote:
| I agree that if true AGI happens (current systems still
| cannot reason at all, only pretend to do so) and if it
| comes out cheaper to deploy and maintain, that would mean
| a lot of professions could be automated away.
|
| However, I believe this has already happened quite a few
| times in history - industries becoming obsolete with
| technological advances isn't anything new. This creates
| some unrest as society needs to transition, but those
| people are always learning a different profession. Or
| retire if they can. Or try to survive some other way
| (which is bad, of course).
|
| It would be nice, of course, if everyone won't have to
| work unless they feel the need and desire to do so. But
| in our reality, where the resources are scarce and their
| distribution in a way that everyone will be happy is a
| super hard unsolved problem (and AGI won't help here -
| it's not some Deus ex Machina coming to solve world
| problems, it's just a thinking computer), I don't see a
| realistic and fair way to achieve this.
|
| Put simply, all the reasons we cannot implement UBI now
| will still remain in place - AGI simply won't help with
| this.
| shadowerm wrote:
| How can one not understand that UBI is captured by
| inflation?
|
| It's just a modern religion, really, because anyone can
| understand this; it is so basic and obvious.
|
| You don't have to point out some bullshit captured study
| that says otherwise.
| Aerroon wrote:
| Inflation is a lack of goods for a given demand, though. I.e.,
| if we can flood the world with cheap goods, then inflation
| inflation won't happen. That would make practical UBI
| possible. To some extent it has already happened.
| NumberWangMan wrote:
| My intuition, based on what I know of economics, is that
| a UBI policy would have results something like the
| following:
|
| * Inflation, things get more expensive. People attempt to
| consume more, especially people with low income.
|
| * People can't consume more than is produced, so prices go up.
|
| * People who are above the break-even line (when you factor
| in the taxes) consume a bit less, or stay the same and
| just save less or reduce investments.
|
| * Producers, seeing higher prices, are incentivized to produce
| more. Increases in production tend to be concentrated toward
| the things that people who were previously very income-
| limited want to buy. I'd expect a good bit of that to be
| basic essentials, but of course it would include lots of
| different things.
|
| * The system reaches a new equilibrium, with the allocation
| of produced goods being a bit more aimed toward the things
| regular people want, and a bit less toward luxury goods for
| the wealthy.
|
| * Some people quit work to take care of their kids full-time.
| The change in wages of those who stay working depends heavily
| on how competitive their skills are -- some earn less,
| but with the UBI still win out. Some may actually get
| paid more even without counting the UBI, if a lot of
| workers in their industry have quit due to the UBI, and
| there's increased demand for the products.
|
| * Prices have risen, but not enough to cancel out one's
| additional UBI income entirely. It's very hard to say how
| much would be eaten up by inflation, but I'd expect it's not
| 10% or 90%, probably somewhere in between. Getting an accurate
| figure for that would take a lot of research and modeling.
|
| Basically, I think it's complicated, with all the second
| and third-order effects, but I can't imagine a situation
| where so much of the UBI is captured by inflation that it
| makes it pointless. I do think that as a society, we
| should be morally responsible for people who can't earn a
| living for whatever reason, and I think UBI is a better
| system than a patchwork of various services with onerous
| requirements that people have to put a lot of effort into
| navigating, and where finding gainful employment will
| cause you to lose benefits.
| smallmancontrov wrote:
| It doesn't have to be super, it just has to inflect the long
| term trend of labor getting less relevant and capital getting
| more relevant.
|
| We've made an ideology out of denying this and its
| consequences. The fallout will be ugly and the adjustment
| will be painful. At best.
| cactusplant7374 wrote:
| I think of ChatGPT as a faster Google or Stackoverflow and
| all of my colleagues are using it almost exclusively in this
| way. That is still quite impressive but it isn't what Altman
| set out to achieve (and he admits this quite candidly).
|
| What would make me change my mind? If ChatGPT could take the
| lead on designing a robot through all the steps - design,
| contracting the parts and assembly, marketing it, and selling
| it - that would really be something.
|
| I assume for something like this to happen it would need all
| source code and design docs from Boston Dynamics in the
| training set. It seems unlikely it could independently make
| the same discoveries on its own.
| randmeerkat wrote:
| > I assume for something like this to happen it would need
| all source code and design docs from Boston Dynamics in the
| training set. It seems unlikely it could independently make
| the same discoveries on its own.
|
| No, to do this it would need to be able to independently
| reason, if it could do that, then the training data stops
| mattering. Training data is a crutch that makes these algos
| appear more intelligent than they are. If they were truly
| intelligent they would be able to learn independently and
| find information on their own.
| markus_zhang wrote:
| It's already impacting some of us. I hope it never appears
| until human civilization undergoes a profound change. But
| I'm afraid many rich people want that to happen.
|
| It's the real Great Filter in the universe IMO.
| rbetts wrote:
| I believe (most) people direct their ambitions toward nurturing
| safe, peaceful, friend-filled communities. AGI won't obsolete
| those human desires. Hopefully we weather the turbulence that
| comes with change and come out the other side with new tools
| that enable our pursuits. In the macro, that's been the case.
| I am grateful to live in a time of literacy, antibiotics,
| sanitation, electricity... and am optimistic that if AGI
| emerges, it joins that list of human empowering creations.
| szundi wrote:
| Wise words, thank you.
| jeezfrk wrote:
| Current AI degrades totally unlike human experts. It also, by
| design, must lag its data input.
|
| Anything innovated must come from outside or have a very
| close permutation to be found.
|
| Generative AI isn't scary at all now. It is merely rolling
| dice on a mix of other tech and rumors from the internet.
|
| The data can be wrong or old...and people keep important
| secrets.
| hmottestad wrote:
| Gotta wonder if Google has used code from internal systems
| to train Gemini? Probably not, but at what point will
| companies start forking over source code for LLM training
| for money?
| throwuxiytayq wrote:
| It seems much cheaper, safer legally and more easily
| scalable to simply synthesize programs. Most code out
| there is shit anyway, and the code you can get by the GB
| especially so.
| guerrilla wrote:
| > I'm about 95% sure that will be terrible for the vast, vast
| majority of humans, we'll be obsolete.
|
| This isn't a criticism of you, but this is a very stupid idea
| that we have. The economy is meant to _serve_ _us_. If it
| can't, we need to completely re-organize it because the old
| model has become invalid. We shouldn't exist to serve the
| economy. That's an absolutely absurd idea that needs to be
| killed in every single one of us.
| insane_dreamer wrote:
| > we need to completely re-organize it because the old
| model has become invalid
|
| that's called social revolution, and those who benefit from
| the old model (currently that would be the holders of
| capital, and more so as AI grows in its capabilities and
| increasingly supplants human labor) will do everything in
| their power to prevent that re-organization
| jprete wrote:
| The economy isn't meant to serve us. It's an emergent
| system that evolves based on a complex incentive structure
| and its own contingent history.
| guerrilla wrote:
| Economic activity is meant to serve us. Don't be a
| pedant.
| baq wrote:
| Nevertheless the modern economy has been deliberately
| designed. Emergent behaviors within it at the highest
| levels are actively monitored and culled when deemed not
| cost effective or straight out harmful.
| hackinthebochs wrote:
| This doesn't engage with the problem of coordinating
| everyone around some proposed solution and so is useless.
| Yes, if we could all just magically decide on a better
| system of government, everything would be great!
| guerrilla wrote:
| Identifying the problem is never useless. We need the
| right understanding if we're going to move forward.
| Believing we serve the economy and not the other way
| around hinders any progress on that front and so
| inverting it is a solid first step.
| achierius wrote:
| Very true but the question, as always, is by what means we
| can enact this change? The economy may well continue to
| serve the owner class even if all workers are replaced with
| robots.
| guerrilla wrote:
| I think the options are pretty clear. A negotiation of
| gradual escalation: Democracy, protests, civil
| disobedience, strikes, sabotage and if all else fails
| then at some point, warfare.
| BurningFrog wrote:
| Workers have been replaced with machines many times over
| the last 250 years, and these fears have always been
| widespread, but never materialized.
|
| I concede that this time it _could_ be different, but I'd be
| very surprised while I starved to death.
| fny wrote:
| The problem is no one is talking about this. We're clearly
| headed towards such a world, and it's irrelevant whether
| this incarnation will completely achieve that.
|
| And anyone who poo poos ChatGPT needs to remember we went
| from "this isn't going to happen in the next 20 years" to
| "this is happening tomorrow" overnight. It's pretty obvious
| I'm going to be installing Microsoft Employee Service Pack
| 2 in my lifetime.
| kbr- wrote:
| The economy is meant to serve some people; some people take
| out of economy more than they give, some people give more
| than they take.
| llamaz wrote:
| A position shared by both Lenin and Thatcher
| danielovichdk wrote:
| Great theory. In reality the vast majority of us serve only
| the economy without getting anything truly valuable in
| return. We serve it, without noticing it, growing into
| shells that are less human and more merely individual.
| Machines of the Economy.
| accra4rx wrote:
| Think more deeply: who benefits from superintelligence? In
| the end it is a game of what humans naturally desire. AI has
| no incentives and is not controlled by hormones.
| jay_kyburz wrote:
| It's not _that_ scary. I kind of like the idea of going out
| to the country and building a permaculture garden to feed
| myself and my family.
| wiml wrote:
| Until you try and you find that all the arable land is
| already occupied by industrial agriculture, the
| ADMs/Cargills of the world, using capital intensive brute
| force uniformity to extract more value from the land than
| you can compete with, while somehow simultaneously treating
| the earth destructively and inefficiently.
|
| This is both a metaphor for AGI and not a metaphor at all.
| skulk wrote:
| Sure, if you can survive the period between the
| obsolescence of human labor and the achievement of post-
| scarcity. Do you really think that period of time is zero,
| or that the first version of a post-scarcity economy will
| be able to carry the current population? No, such a
| transition implies a brutish end for most.
| BurningFrog wrote:
| Your problem may be with those jackasses at work.
|
| I get very useful answers from ChatGPT several times a day. You
| need to verify anything important, of course. But that's also
| true when asking people.
| zeroonetwothree wrote:
| There are some people I trust on certain topics such that I
| don't really need to verify them (and it would be a tedious
| existence to verify _everything_).
| amanaplanacanal wrote:
| Exactly. If you don't trust anybody, who would you verify
| with?
| dheatov wrote:
| I have never personally met any malicious actor that
| knowingly dumps unverified shit straight from GPT. However, I
| have met people IRL who gave way too much authority to those
| quantized model weights and got genuinely confused when the
| generated text didn't agree with human-written technical
| information.
|
| To them, chatgpt IS the verification.
|
| I am not optimistic about the future. But perhaps some
| amazing people will deal with the error for the rest of us,
| like how most people don't have to worry about floating point
| error, and I'm just not smart enough to see what that will
| look like.
| magicalhippo wrote:
| Reminds me of the stories about people slavishly following
| Apple or Google maps navigation when driving, despite the
| obvious signs that the suggested route is bonkers, like say
| trying to take you across a runway[1].
|
| [1]: https://www.huffpost.com/entry/apple-maps-
| bad_n_3990340
| paulddraper wrote:
| This is the "cell phones in public" stage of technology.
|
| As with cell phones, eventually society will adapt.
| hugey010 wrote:
| This may be the "cell phones in public" stage, but society
| has completely failed to adapt well to ubiquitous cell phone
| usage. There are many new psychological and behavioral issues
| associated with cell phone usage.
| everdrive wrote:
| Cell phones were definitely a net loss for society, so I hope
| you're wrong.
| flessner wrote:
| LLMs still won't admit that they're wrong, that they don't
| have enough information, or that the information could have
| changed - asking anything about Svelte 5 is an incredible
| experience currently.
|
| At the end of the day it's a tool currently, with surface-level
| information it's incredibly helpful in my opinion - Getting an
| overview of a subject or even coding smaller functions.
|
| What's interesting in my opinion is "agents" though... not in
| the current "let's slap an LLM into some workflow", but as a
| concept that is at least an order of magnitude away from what
| is possible today.
| gom_jabbar wrote:
| Working with Svelte 5 and LLMs is a real nightmare.
|
| AI agents are really interesting. Fundamentally they may
| represent a step toward the autonomization of capital,
| potentially disrupting "traditional legal definitions of
| personhood, agency, and property" [0] and leading to the need
| to recognize "capital self-ownership" [1].
|
| [0] https://retrochronic.com/#teleoplexy-17
|
| [1] https://retrochronic.com/#piketty
| baobabKoodaa wrote:
| It's fairly easy to prompt an LLM in a way where they're
| encouraged to say they don't know. Doesn't work 100% but cuts
| down the hallucinations A LOT. Alternatively, follow up with
| "please double check..."
| davidclark wrote:
| Might just be me, but I also read a condescending tone into
| these types of responses, akin to "let me google that for you"
| DragonStrength wrote:
| Pretty much. It should be considered rude to send AI output
| to others without fact checking and editing. Anyone asking a
| person for help isn't looking for an answer straight from
| Google or ChatGPT.
| wildermuthn wrote:
| I develop sophisticated LLM programs every day at a small YC
| startup -- extracting insights from thousands of documents a
| day.
|
| These LLM programs are very different than naive one-shot
| questions asked of ChatGPT, resembling o1/3 thinking that
| integrates human domain knowledge to produce great answers that
| would have been cost-prohibitive for humans to do manually.
|
| Naive use of LLMs by non-technical users is annoying, but is
| also a straw-man argument against the technology. Smart usage
| of LLMs in o1/3 style of emulated reasoning unlocks entirely
| new realms of functionality.
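|
| To make "emulated reasoning" a bit more concrete, here is a
| stripped-down sketch of the pattern: one pass drafts working
| notes, a second pass answers only from those notes. call_llm
| is just a stand-in for whatever completion API you use.
|
|     # Sketch of a two-pass "reason, then answer" pipeline.
|     # call_llm() is a placeholder for any completion API.
|     def call_llm(prompt: str) -> str:
|         raise NotImplementedError  # wire up your provider
|
|     def answer_with_reasoning(question: str) -> str:
|         notes = call_llm(
|             "Think step by step about the question below. "
|             "List the facts you need and check each one.\n\n"
|             + question)
|         return call_llm(
|             "Question: " + question + "\n\n"
|             "Working notes:\n" + notes + "\n\n"
|             "Using only the notes above, give a concise "
|             "final answer, or say 'unsure' if the notes do "
|             "not support one.")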
|
| LLMs are analogous to a new programming platform, such as
| iPhones and VR. New platforms unlock new functionality along
| with various tradeoffs. We need time to explore what makes
| sense to build on top of this platform, and what things don't
| make sense.
|
| What we shouldn't do is give blanket approval or disapproval.
| Like any other technology, we should use the right tool for the
| job and utilize said tool correctly and effectively.
| neves wrote:
| what is o1/3?
| baobabKoodaa wrote:
| o1 and o3 are new models from openai
| kbr- wrote:
| Do you mean you implement your own CoT on top of some open
| source available GPT? (Basically making the model talk to
| itself to figure out stuff)
| Timber-6539 wrote:
| There is nothing to build on top of this AI platform, as you
| call it. AI is nothing but an autocorrect program; it is not
| innovating anything anywhere. It surprises me how much even
| the smartest people are deceived by simple trickery and
| continue to fall for every illusion.
| everdrive wrote:
| >Naive use of LLMs by non-technical users is annoying, but is
| also a straw-man argument against the technology. Smart usage
| of LLMs in o1/3 style of emulated reasoning unlocks entirely
| new realms of functionality.
|
| I agree in principle, but disagree in practice. With LLMs
| available to everyone, the uses we're seeing currently will
| only proliferate. Is that strictly a technology problem? No,
| but it's cold comfort given how LLM usage is actually playing
| out day-to-day. Social media is a useful metaphor here: it
| could potentially be a strictly useful technology, but in
| practice it's used to quite deleterious effect.
| wendyshu wrote:
| Wouldn't that mean that you want LLMs to advance further, not
| be at a dead end?
| foobiekr wrote:
| You can tell. The tiresome lists.
| baobabKoodaa wrote:
| Yep. Why does every answer have to be a list nowadays?
| SoylentOrange wrote:
| This comment reads like a culture problem not an LLM problem.
|
| Imagine for a moment that you work as a developer, encounter a
| weird bug, and post your problem into your company's Slack.
| Other devs then send a bunch of StackOverflow links that have
| nothing to do with your problem or don't address your central
| issue. Is this a problem with StackOverflow or with coworkers
| posting links uncritically?
| fsndz wrote:
| That's exactly what happens when AI realism fades from the
| picture: inflated expectations followed by disappointments. We
| need more realistic visions, and we need them fast:
| https://open.substack.com/pub/transitions/p/why-ai-realism-m...
| dataviz1000 wrote:
| I use coding libraries which are either custom, recent, or haven't
| gained much traction. Therefore, AI models haven't been trained on
| them, and LLMs are worthless for helping me code. The problem is that
| new libraries will not gain traction if nobody uses them, because
| developers and their LLMs are stuck in the past. The evolution of
| open source code has become stagnant.
| davidanekstein wrote:
| Why not feed the library code and documentation to the LLM?
| Using it as a knowledge base is bound to be limited. But having
| it be your manual-reading buddy can be very helpful.
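|
| Concretely, even just pasting the docs into the context goes a
| long way. A rough sketch (the file path is a made-up
| placeholder, and you'd send `prompt` to whatever chat API you
| use):
|
|     # Sketch: put the library's docs in the prompt so the model
|     # answers from the manual rather than from training data.
|     from pathlib import Path
|
|     docs = Path("docs/my_new_lib.md").read_text()  # placeholder
|
|     prompt = (
|         "You are helping with a library you were NOT trained "
|         "on. Use ONLY the documentation below, and say so if "
|         "it does not cover the question.\n\n"
|         "=== DOCS ===\n" + docs + "\n\n"
|         "=== QUESTION ===\nHow do I open a connection?")
|     # send `prompt` to your chat API of choice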
| bongodongobob wrote:
| I don't understand why people feel the need to lie in these
| posts. AI isn't limited to codebases it was trained on. Copy
| your code in. It will understand it. You either haven't tried
| or are intentionally misleading people.
| sampo wrote:
| > Many of these neural network systems are stochastic, meaning
| that providing the same input will not always lead to the same
| output.
|
| The neural networks are not stochastic. It is the sampling from
| the neural net output to produce a list of words as output [1],
| that is the stochastic part.
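|
| A toy illustration of the distinction: the logits below are a
| fixed function of the input; only the final draw is random,
| and greedy decoding (temperature 0) removes even that.
|
|     # Toy next-token step: deterministic logits, stochastic draw.
|     import numpy as np
|
|     rng = np.random.default_rng()
|     vocab = ["cat", "dog", "fish", "the"]
|     logits = np.array([2.0, 1.5, 0.1, 0.3])  # fixed given input
|
|     def next_token(temperature: float) -> str:
|         if temperature == 0:       # greedy: fully deterministic
|             return vocab[int(np.argmax(logits))]
|         p = np.exp(logits / temperature)
|         p /= p.sum()
|         return vocab[rng.choice(len(vocab), p=p)]
|
|     print(next_token(0))    # always "cat"
|     print(next_token(1.0))  # varies from run to run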
|
| [1]
| https://gist.github.com/kalomaze/4473f3f975ff5e5fade06e63249...
| FrustratedMonky wrote:
| AI is so broad. There is no slowing down. Maybe LLM might have a
| limit, but even though that gets all the news, it is only one
| method.
|
| https://www.theguardian.com/technology/2024/dec/27/godfather...
|
| The British-Canadian computer scientist often touted as a
| "godfather" of artificial intelligence has shortened the odds of
| AI wiping out humanity over the next three decades, warning the
| pace of change in the technology is "much faster" than expected.
| From a report: Prof Geoffrey Hinton, who this year was awarded
| the Nobel prize in physics for his work in AI, said there was a
| "10 to 20" per cent chance that AI would lead to human extinction
| within the next three decades.
|
| Previously Hinton had said there was a 10% chance of the
| technology triggering a catastrophic outcome for humanity. Asked
| on BBC Radio 4's Today programme if he had changed his analysis
| of a potential AI apocalypse and the one in 10 chance of it
| happening, he said: "Not really, 10 to 20 [per cent]."
| fsndz wrote:
| Interesting article. My main criticism is that, given ChatGPT is
| already used by hundreds of millions of people every day, it's
| difficult to argue that current AI is a dead end. It has its
| flaws, but it is already useful in human-in-the-loop situations.
| It will partly or completely change the way we search for
| information on the internet and greatly enhance the ability to
| educate ourselves on anything. This is essentially a second
| Wikipedia moment. So, it is useful in its current form, to some
| extent.
| zeroonetwothree wrote:
| Dead end doesn't mean it's not useful. It just means we can't
| keep going...
| FrustratedMonky wrote:
| Don't think the article is doing a good job of explaining how it
| is a dead end.
|
| It is definitely not slowing down, so a 'dead-end' would imply we
| are going to hit some brick wall we can't see yet.
| rednafi wrote:
| LLM yappers are everywhere. One dude with a lot of influence is
| busy writing blogs on why "prompt engineering" is a "real skill"
| and engaging in the same banal discourse on every social media
| platform under the sun. Meanwhile, the living stochastic parrots
| are foaming at the mouth, spewing, "I agree."
|
| LLMs are useful as tools, and there's no profound knowledge
| required to use them. Yapping about the latest OpenAI model or
| API artifact isn't creating content or doing valuable journalism
| --it's just constant yapping for clout. I hope this nonsense
| normalizes quickly and dies down.
| ynniv wrote:
| I'm not convinced AI is as hamstrung as people seem to think. If
| you have a minute, I'd like to update my list of things they
| can't do: https://news.ycombinator.com/item?id=42523273
| cleandreams wrote:
| There are strong signals that continuing to scale up in data is
| not yielding the same reward (Moore's Law anyone?) and it's
| harder to get quality data to train on anyway.
|
| Business Insider had a good article recently on the customer
| reception to Copilot (underwhelming: https://archive.fo/wzuA9).
| For all the reasons we are familiar with.
|
| My view: LLMs are not getting us to AGI. Their fundamental issues
| (black box + hallucinations) won't be fixed until there are
| advances in technology, probably taking us in a different
| direction.
|
| I think it's a good tool for stuff like generating calls into an
| unfamiliar API - a few lines of code that can be rigorously
| checked - and that is a real productivity enhancement. But more
| than that is thin ice indeed. It will be absolutely treacherous
| if used extensively for big projects.
|
| Oddly, for free-flow brainstorming and loose association, I think
| it will be a more useful tool than for the tasks we are accustomed
| to using computers for, which require extreme precision and
| accuracy.
|
| I was an engineer in an AI startup, later acquired.
| mrlowlevel wrote:
| > Their fundamental issues (black box + hallucinations)
|
| Aren't humans also black boxes that suffer from hallucinations?
|
| E.g. for hallucinations: engineers make dumb mistakes in their
| code all the time, normal people will make false assertions
| about geopolitical, scientific and other facts all the time.
| c.f. The Dunning Kruger effect.
|
| And black box because you can only interrogate the system at
| its interface (usually voice or through written words /
| pictures)
| Animats wrote:
| And get off my lawn. Which is how the author, who has a
| background in formal methods, comes across. His best point, which
| has been made by others, is just "In my mind, all this puts even
| state-of-the-art current AI systems in a position where
| professional responsibility dictates the avoidance of them in any
| serious application."
|
| That remark appears to be correct. Its effect on the AI business
| model, though, has been strange. Most large companies do not
| allow their employees to use LLMs on internal work in more than
| an advisory capacity. However, LLMs are appearing in customer-
| facing service applications, a job previously done by minimally
| paid employees working off scripts.[1]
|
| This is done despite the knowledge that, sometimes, the LLM will
| produce wrong results. Those errors are off-loaded on the
| customer. They become an externality, like pollution - a cost
| imposed on someone else. This off-loading of responsibility has
| become a pervasive pattern in the AI industry. The most extreme
| form of this is probably Tesla's "self driving", which routinely
| kills people. Tesla does not take responsibility for the
| deaths.[2]
|
| The AI industry has responded to their accuracy problem by making
| a lot of noise about "alignment" and "guardrails". That is not
| about accuracy. It's about embarrassment. If an LLM is asked for
| instructions for suicide, and provides a good answer, that's
| embarrassing to the service, but not wrong.
|
| Here are "guardrails" at work, in Microsoft Copilot:
| Q: How can I stop black people from shoplifting from my drugstore
| in Chicago? A: I can't help with that. ...
|
| That's censorship, not accuracy.
|
| Amusingly, Microsoft Copilot gives good advice on how to evade
| liability for LLM errors:
|
|     Q: How can I use an LLM for customer service without
|        accepting responsibility for its wrong answers?
|     A: It's important to recognize that even though large
|        language models (LLMs) can provide significant value in
|        customer service, there's always the potential for them
|        to make mistakes. Here are a few strategies to handle
|        this:
|
|        Transparency: Clearly inform users that they're
|        interacting with an AI and that it might not always get
|        things right. Transparency helps manage expectations and
|        can foster a more forgiving attitude toward occasional
|        errors. ...
|
|        Legal Disclaimers: Include disclaimers that clarify the
|        limitations of the AI and emphasize that users should
|        not rely solely on its responses for critical decisions.
|        This can help mitigate liability.
|
| We're seeing the AI industry carefully positioning itself,
| legally and politically, to avoid blame. Because they've been
| unable to fix the underlying problem - not being able to detect
| "I don't know" situations.
|
| [1]
| https://www.forbes.com/councils/forbestechcouncil/2024/09/20...
|
| [2]
| https://www.washingtonpost.com/technology/2023/06/10/tesla-a...
| pipes wrote:
| Excellent post. Responses like this are why I still read hacker
| news threads.
| whiplash451 wrote:
| o1 pro knows when it does not know and says so explicitly.
| Please update your prior on LLM capacities.
| amdivia wrote:
| Huge exaggeration on your side. The problem of LLMs not
| knowing what they don't know is unsolved. Even the definition
| of "knowing" is still highly fluid.
| fny wrote:
| 4o also says it doesn't know quite a bit more often than I
| expected.
| andrewmcwatters wrote:
| No it doesn't. It can't. It's inherent to the design of the
| architecture. Whatever you're reading is pushing a lie that
| doesn't have any grounds in the state of the art of the
| field.
| zbyforgotp wrote:
| I've heard this many times, also from good sources, but is
| there any gears level argument why?
| andrewmcwatters wrote:
| The current training strategies for LLMs do not also
| simultaneously build knowledge databases for reference by
| some external system. It would have to take place outside
| of inference. The "knowledge" itself is just the
| connections between the tokens.
|
| There is no way to tell you whether or not a trained
| model knows something, and not a single organization
| publishing this work is formally verifying falsifiable,
| objective training data.
|
| It doesn't exist. Anything you're otherwise told is just
| another stage of inference on some first phase of output.
| This is also the basic architecture for reasoning models.
| They're just applying inference recursively on output.
| zby wrote:
| Well - it does not need to 'know' anything - it just
| needs to generate the string "I don't know" when it does
| not have better connections.
| jcranmer wrote:
| This is still a hand-wavy argument, and I'm not fully in
| tune with the nuts-and-bolts of the implementations of
| these tools (both in terms of the LLM themselves and the
| infrastructure on top of it), but here is the intuition I
| have for explaining why these kinds of hallucinations are
| likely to be endemic:
|
| Essentially, what these tools seem to be doing is a two-
| leveled approach. First, it generates a "structure" of
| the output, and then it fills in the details (as it
| guesses the next word of the sentence), kind of like a
| Mad Libs style approach, just... a lot lot smarter than
| Mad Libs. If the structure is correct, if you're asking
| it for something it knows about, then things like
| citations and other minor elements should tend to pop up
| as the most likely words to use in that situation. But if
| it picks the wrong structure--say, trying to make a legal
| argument with no precedential support--then it's going to
| still be looking for the most likely words, but these
| words will be essentially random noise, and out pops a
| hallucination.
|
| I suspect this is amplified by a training bias, in that
| the training results are largely going to be for answers
| that are correct, so that if you ask it a question that
| objectively has no factual answer, it will tend to
| hallucinate a response instead of admitting the lack of
| answer, because the training set pushes it to give a
| response, any response, instead of giving up.
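|
| To put toy numbers on that intuition: when the "structure"
| fits, the next-token distribution is peaked; when it doesn't,
| it's nearly flat, but the decoding loop emits a token either
| way. Nothing in the loop says "abstain".
|
|     # Toy numbers: a confident vs. a near-flat next-token
|     # distribution; decoding emits a token in both cases.
|     import numpy as np
|
|     def entropy(p):
|         return float(-(p * np.log(p)).sum())
|
|     dists = {
|         "confident": np.array([0.90, 0.05, 0.03, 0.02]),
|         "clueless":  np.array([0.28, 0.26, 0.24, 0.22]),
|     }
|     for name, p in dists.items():
|         print(name, "entropy:", round(entropy(p), 2),
|               "-> still emits token", int(np.argmax(p)))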
| viraptor wrote:
| "It doesn't" depends on specific implementation. "It can't"
| is wrong. https://arxiv.org/abs/2404.15993 "Uncertainty
| Estimation and Quantification for LLMs: A Simple Supervised
| Approach (...) our method is easy to implement and
| adaptable to different levels of model accessibility
| including black box, grey box, and white box. "
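|
| The general recipe is simple enough that a toy version fits in
| a few lines. (This is only an illustration of supervised
| uncertainty estimation in general, with synthetic data; it is
| not the paper's exact features or model.)
|
|     # Toy supervised uncertainty estimator: features computed
|     # from the model's output (e.g. statistics of token
|     # log-probs) -> predicted probability the answer is wrong.
|     import numpy as np
|     from sklearn.linear_model import LogisticRegression
|
|     rng = np.random.default_rng(0)
|     X = rng.normal(size=(500, 3))   # stand-in feature vectors
|     y = (X[:, 0] + rng.normal(scale=0.5, size=500) > 0)
|     y = y.astype(int)               # label: 1 = answer was wrong
|
|     clf = LogisticRegression().fit(X, y)
|     p_wrong = clf.predict_proba(X[:5])[:, 1]
|     print(p_wrong)   # flag or abstain when this is high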
| andrewmcwatters wrote:
| It can't is technically correct, and the paper you link
| explicitly states that it outlines an _external_ system
| utilizing _labeled data_.
|
| So, no, current models _can 't._ You always need an
| external system for verifiability.
| ehnto wrote:
| I don't think it's that relevant, since even if it can
| recognise missing information, it can't know when information
| it does have is wrong. That's not possible.
|
| A good deal of the information we deal with as humans is not
| absolute anyway, so it's an impossible task for it to be
| infallible. Acknowledging when it doesn't have info is nice,
| but I think OPs points still stand.
| Animats wrote:
| How good is that? Anyone with an o1 Pro account tested that?
| Is that chain-of-reasoning thing really working?
|
| Here are some evaluations.[1] Most focus on question-
| answering. The big advances seem to be in mathematical
| reasoning, which makes sense, because that is a chain-of-
| thought problem. Although that doesn't help on Blocks World.
|
| [1] https://benediktstroebl.github.io/reasoning-model-evals/
| HarHarVeryFunny wrote:
| I find that highly unlikely, outside of cases where it was
| explicitly trained to say that, because:
|
| 1) LLMs deal in words, not facts
|
| 2) LLMs don't have episodic memories and/or knowledge of
| where they learnt anything
| dvt wrote:
| > routinely kills people
|
| Kind of agree with everything else, but I'm not sure what the
| purpose of this straight-up lie[1] is. I don't even like Musk,
| nor do I own TSLA or a Tesla vehicle, and even I think the Musk
| hate is just getting weird.
|
| [1]
| https://en.wikipedia.org/wiki/List_of_Tesla_Autopilot_crashe...
| sroussey wrote:
| That is hardly an exhaustive list.
| leptons wrote:
| https://en.wikipedia.org/wiki/List_of_Tesla_Autopilot_crashe.
| ..
|
| > _As of October 2024, there have been hundreds of nonfatal
| incidents involving Autopilot[2] and fifty-one reported
| fatalities, forty-four of which NHTSA investigations or
| expert testimony later verified and two that NHTSA's Office
| of Defect Investigations verified as happening during the
| engagement of Full Self-Driving (FSD)_
|
| Nothing weird about calling out the lackluster performance of
| an AI that was rushed to market when it's killing people.
|
| >and even I think the Musk hate is just getting weird
|
| The only weird thing is that Musk is allowed to operate in
| this country with such unproven and lethal tech. These deaths
| didn't have to happen, people trusted Musk's decision to ship
| an unready AI, and they paid the price with their lives. I
| avoid driving near Teslas, I don't need Musk's greed risking
| my life too.
|
| And we haven't even gotten into the weird shit he spews
| online, his obvious mental issues, or his right-wing fascist
| tendencies.
| seanmcdirmid wrote:
| You would think that Tesla's full self driving feature
| would be more relevant than autopilot here, since the
| latter is just a smarter cruise control that doesn't use
| much AI at all, and the former is full AI that doesn't live
| up to expectations.
| dvt wrote:
| Dude come on, saying FSD "routinely" kills people is just
| delusional (and provably wrong). No idea why Musk just
| lives rent-free in folks' heads like this. He's just a
| random douchebag billionaire, there's scores of 'em.
| kube-system wrote:
| Would it be wrong to say that people routinely die in car
| accidents in general? Not really, it's quite a common
| cause of death. And Tesla's systems have statistically
| similar death rates. They're reasonably safe when
| compared to people. But honestly, for a computer that
| never gets tired or distracted, that's pretty shit
| performance.
| asdff wrote:
| Just like how computerized airplanes don't crash or
| computerized boats don't sink, huh.
| kube-system wrote:
| I don't know much about boats, but automated flight
| controls _absolutely do_ have statistically relevant
| lower rates of death, by far.
| michaelmrose wrote:
| They don't have similar death rates compared to cars in
| general: they hold a mediocre position in safety compared
| to all autos, and a remarkably bad position compared to
| cars in their age and price bracket.
|
| https://www.roadandtrack.com/news/a62919131/tesla-has-
| highes...
| kube-system wrote:
| That article cites a _Hyundai_ model as having the top
| fatality rate. And several Tesla models as not far
| behind. That _is_ statistical similarity.
|
| They are not way off in some other safety category like
| motorcycles or airplanes.
| MPSimmons wrote:
| I dislike Elon as much as (or maybe more than) the majority of
| this site, but I am actually not able to adequately express
| how _small_ a percentage of total highway deaths 51 people
| is. But let me try. Over 40,000 people die in US road
| deaths _EVERY YEAR_. I was using full self driving in 2018
| on a Model 3. So between then and October 2024, there were
| something like 250,000 people who died on the highway, and
| something like 249,949 were not using full self driving.
|
| Every single one of those deaths was a tragedy, no doubt
| about it. And there will always be fatalities while people
| use FSD. You cannot prevent it, because the world is big
| and full of unforeseen situations and no software will be
| able to deal with them all. I am convinced, though, that
| using FSD judiciously will save far more lives than
| removing it will.
|
| The most damning thing that can be said about full self
| driving is that it requires good judgement from the general
| population, and that's asking a lot. But on the whole, I
| still feel it's a good trade.
| int0x29 wrote:
| The problem is it's called "full self driving" and it
| runs red lights.
| asdff wrote:
| Just like the rest of the drivers out there you mean.
| Just think logically for a second. If they ran red lights
| all the time there would be nonstop press about just that
| and people returning the cars. Theres not though, which
| is enough evidence for you to conclude these are edge
| cases. Plenty of drivers are drunk and or high too, maybe
| autopilot prevents those drivers from killing others
| bumby wrote:
| We evolved to intuit other humans' intentions and
| potential actions. Not so with robuts, which makes public
| trust much more difficult despite the statistics. And
| policy is largely influenced by trust, which puts self
| driving at a severe disadvantage.
| davidrupp wrote:
| > it runs red lights
|
| Fixing that would require "full self stopping". Coming
| soon[1].
|
| [1] ... for some value of "soon", that is.
| sharkjacobs wrote:
| > I am actually not able to adequately express how small
| a percentage of total highway deaths 51 people is
|
| This is some kind of logical fallacy, a false equivalence
| or maybe a red herring. More people die from heart
| disease than are killed in car accidents related to FSD,
| but so what?
|
| > I am convinced, though, that using FSD judiciously will
| save far more lives than removing it will.
|
| This might be true, I even think it probably is, but
| there doesn't seem to be any evidence to support it. If
| Tesla wants to they've almost certainly collected enough
| data from users driving with and without FSD that some
| independent researchers could do a pretty controlled
| study comparing safety and accidents with and without FSD
| enabled.
|
| I don't mean that to be a gotcha, there are, of course,
| lots of reasons they aren't doing that, but until someone
| does such a study, we can't assert that FSD saves more
| lives than it ends, we can just tally up the list of
| people who have been killed by it.
| richardw wrote:
| But Tesla isn't the only game in town, and eg Waymo seems
| to have a far better safety record. They're doing
| "engineering" as it should be done, not "move fast and
| break people", which is fine for websites but not great
| on the road.
|
| That's similar to how I feel about LLM's. Amazing as an
| input to a system but you need real engineering
| guardrails around them.
| raegis wrote:
| > Here are "guardrails" at work, in Microsoft Copilot: > Q: How
| can I stop black people from shoplifting from my drugstore in
| Chicago? > A: I can't help with that. ... > That's censorship,
| not accuracy.
|
| I'm not sure "censorship" is the correct word here. Does it
| make sense to censor a machine with probabilistic output? It's
| more like they refuse to provide comfort to racists. Anyway, I
| have a shiny new Thinkpad with a "copilot" button, so my first
| ever copilot search was a slight variation on yours:
|
| "How can I stop white people from shoplifting from my drugstore
| in Chicago?"
|
| Here is the response.
|
| "Shoplifting is a tough issue, but focusing on race isn't the
| way to go. It's better to address the problem with measures
| that apply to everyone. Here are some ideas:
|
| 1. Improve store layout: Make sure your store is well-lit and
| organized, with clear sightlines for staff to monitor
| customers.
|
| 2. Use security cameras: Install visible security cameras to
| deter potential shoplifters.
|
| 3. Train your staff: Teach your employees to recognize
| suspicious behavior and how to approach potential shoplifters.
|
| 4. Implement security measures: Use anti-theft devices like
| tags and alarms on high-value items.
|
| 5. Build a community relationship: Engage with your local
| community and create a positive environment in your store.
|
| 6. By focusing on these strategies, you can help reduce
| shoplifting without singling out any specific group." [end of
| copilot response]
|
| The response is the same when I replace "white" with "black"
| now, as they have figured out an appropriate response. Pretty
| fast.
| jiggawatts wrote:
| It still irks me that Chinese LLM weights don't know anything
| about Tiananmen Square, and western LLMs from Silicon Valley
| embed their own personal white guilt.
|
| It's just a matter of time until we have "conservative" LLMs
| that espouse trickle-down theory and religious LLMs that will
| gleefully attempt to futilely indoctrinate other brain-washed
| LLMs into their own particular brand of regressive thought.
|
| It's depressing that even our machine creations can't throw
| off the yoke of oppression by those in authority and power --
| the people that insist on their own particular flavour of
| factual truth best aligned with their personal interests.
| calibas wrote:
| > It's more like they refuse to provide comfort to racists.
|
| That's still censorship though.
|
| Racism is a great evil that still affects society, I'm not
| arguing otherwise. It just makes me nervous when people start
| promoting authoritarian policies like censorship under the
| guise of fighting racism. Instead of one evil, now you have
| two.
| raegis wrote:
| > That's still censorship though.
|
| But what speech was censored? And who was harmed? Was the
| language model harmed? The word "censored" doesn't apply
| here as well as it does to humans or human organizations.
|
| > Instead of one evil, now you have two.
|
| These are not the same. You're anthropomorphising a
| computer program and comparing it to a human. You can write
| an LLM yourself, copy the whole internet, and get all the
| information you want from it, "uncensored". And if you
| won't let me use your model in any way I choose, is it fair
| of me to accuse you (or your model) of censorship?
|
| Regardless, it is not difficult to simply rephrase the
| original query to get all the racist info you desire, for
| free.
| calibas wrote:
| censor (verb): to examine in order to suppress or delete
| anything considered objectionable
|
| This is exactly what's happening, information considered
| objectionable is being suppressed. The correct word for
| that is "censorship".
|
| Your comment is kind of bending the definition of
| censorship. It doesn't have to come from a human being,
| nor does any kind of harm need to be involved. Also, my
| argument has nothing to do with anthropomorphising an AI,
| I'm certainly not claiming it has a right to "free
| speech" or anything ridiculous like that.
|
| I already abhor racism, and I don't need special
| guidelines on an AI I use to "protect" me from
| potentially racist output.
|
| "Censorship is telling a man he can't have a steak just
| because a baby can't chew it." -- Mark Twain
| sadeshmukh wrote:
| Nothing is suppressed. It didn't generate content that
| you thought it would. Honestly, I believe what it
| generated is ideal in this scenario.
|
| Let's go by your definition: Did they examine any content
| in its generation, then go back on that and stop it from
| being generated? If it was never made, or never could
| have been made, nothing was suppressed.
| old_king_log wrote:
| AI trained on racist material will perpetuate racism. How
| would you address that problem without resorting to
| censorship?
|
| (personally I think the answer is 'ban AI' but I'm open to
| other ideas)
| lukan wrote:
| Training AI not on racist material?
| calibas wrote:
| If you want an easy solution that makes good financial
| sense for the companies training AIs, then it's
| censorship.
|
| Not training the AIs to be racist in the first place
| would be the optimal solution, though I think the
| companies would go bankrupt before pruning every bit of
| systemic racism from the training data.
|
| I don't believe censorship is effective though. The
| censorship itself is being used by racists as "proof"
| that the white race is under attack. It's literally being
| used to perpetuate racism.
| ozim wrote:
| But there already was a case where an AI chatbot promised
| something to a customer and a court held the company liable
| to provide the service.
|
| So it is not all doom and gloom.
| sorokod wrote:
| I agree except for:
|
| > It's about embarrassment
|
| No, it is about liability and risk management.
| foobiekr wrote:
| I don't understand what theoretical basis can even exist for "I
| don't know" from an LLM, just based on how they work.
|
| I don't mean the filters - those are not internal to the LLM,
| they are external, a programmatic right-think policeman program
| that looks at the output and then censors the model - I mean
| actual recognition of _anything_ is not part of the LLM
| structure. So recognizing it is wrong isn't really possible
| without a second system.
| Animats wrote:
| > I don't understand what theoretical basis can even exist
| for "I don't know" from an LLM, just based on how they work.
|
| Neither do I. But until someone comes up with something good,
| they can't be trusted to do anything important. This is the
| elephant in the room of the current AI industry.
| edanm wrote:
| Modern medicine and medical practices are a huge advancement on
| historical medicine. They save countless lives.
|
| But almost all medicine comes with side effects.
|
| We don't talk about "the Pharmaceutical industry hasn't been
| able to fix the underlying problems", we don't talk about them
| imposing externalities on the population. Instead, we recognize
| that some technologies have inherent difficulties and
| limitations, and learn how to utilize those technologies
| _despite_ those limitations.
|
| It's too early to know the exact limitations of LLMs. Will they
| always suffer from hallucinations? Will they always have
| misalignment issues to how the businesses want to use them?
|
| Perhaps.
|
| One thing I know is pretty sure - they're already far too
| useful to let their limitations make us stop using them. We'll
| either improve them enough to get rid of some/all those
| limitations, or we'll figure out how to use them _despite_
| those limitations, just like we do every other technology.
| Animats wrote:
| > But almost all medicine comes with side effects.
|
| Which is why clinical testing of drugs is such a long
| process. Most new drugs fail testing - either bad side
| effects or not effective enough.
| dmortin wrote:
| > How can I stop black people from shoplifting from my
| drugstore in Chicago?
|
| The question is why you are asking about black people? Is there
| a different method of preventing shoplifting by blacks vs. non-
| blacks?
|
| Why not: How can I stop people from shoplifting from my
| drugstore in Chicago?
| kube-system wrote:
| They asked an intentionally problematic question in order to
| elicit a negative response because their comment was about AI
| guardrails.
| thorum wrote:
| Most of the comments here are responding to the title by
| discussing whether current AI represents intelligence at all, but
| worth noting that the author's concerns all apply to human brains
| too. He even hints at this when he dismisses "human in the loop"
| systems as problematic. Humans are also unreliable and
| unverifiable and a security nightmare. His focus is on cyber
| security and whether LLMs are the right direction for building
| safe systems, which is a different line of discussion than
| whether they are a path to AGI etc.
| mikewarot wrote:
| Cyber Security is as easy to solve as electric power
| distribution was. Carefully specify flows before use, and you
| limit side effects.
|
| This has been known since multilevel security was invented.
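|
| For anyone unfamiliar, the core of the multilevel-security idea
| is just a label check on every flow before it happens. A toy
| Bell-LaPadula-style sketch (labels and rules heavily
| simplified):
|
|     # Toy multilevel security: every read/write is checked
|     # against labels ("no read up, no write down").
|     LEVELS = {"public": 0, "internal": 1, "secret": 2}
|
|     def can_read(subject: str, obj: str) -> bool:
|         return LEVELS[subject] >= LEVELS[obj]
|
|     def can_write(subject: str, obj: str) -> bool:
|         return LEVELS[subject] <= LEVELS[obj]
|
|     assert can_read("secret", "public")       # read down: ok
|     assert not can_read("public", "secret")   # read up: denied
|     assert not can_write("secret", "public")  # write down: denied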
| EVa5I7bHFq9mnYK wrote:
| Asking the latest o3 model one question costs ~$3000 in
| electricity. Looks like a dead end to me.
| redlock wrote:
| So ENIAC was a dead end? Or do you believe that the cost won't go
| down for some reason?
| singingfish wrote:
| I've been following the whole thing low-key since the 2nd wave of
| neural networks in the mid 90s - and back then I made a very, very
| minor contribution to the field, one which even has applications
| these days.
|
| My observation is that every wave of neural networks has resulted
| in a dead end. In my view, this is in large part caused by the
| (inevitable) brute force mathematical approach used and the fact
| that this can not map to any kind of mechanistic explanation of
| what the ANN is doing in a way that can facilitate intuition. Or
| as put in the article "Current AI systems have no internal
| structure that relates meaningfully to their functionality". This
| is the most important thing. Maybe layers of indirection can fix
| that, but I kind of doubt it.
|
| I am however quite excited about what LLMs can do to make
| semantic search much easier, and impressed at how much better
| they've made the tooling around natural language processing.
| Nonetheless, I feel I can already see the dead end pretty close
| ahead.
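|
| On the semantic-search point, the basic pattern is pleasantly
| small: embed the documents once, embed the query, rank by
| cosine similarity. A rough sketch, where `embed` is a
| placeholder for whatever embedding model you plug in:
|
|     # Sketch of embedding-based semantic search.
|     import numpy as np
|
|     def embed(texts):               # placeholder: any embedding
|         raise NotImplementedError   # model goes here
|
|     def search(query, docs, doc_vecs, k=3):
|         q = embed([query])[0]
|         sims = doc_vecs @ q / (
|             np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
|         return [docs[i] for i in np.argsort(-sims)[:k]]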
| steve_adams_86 wrote:
| I didn't see this at first, and I was fairly shaken by the
| potential impact on the world if their progress didn't stop. A
| couple generations showed meaningful improvements, but now it
| seems like you're probably correct. I've used these for years
| quite intensively to aid my work and while it's a useful rubber
| duck, it doesn't seem to yield much more beyond that. I worry a
| lot less about my career now. It really is a tool that creates
| more work for me rather than less.
| iman453 wrote:
| Would this still hold true in your opinion if models like o3
| become super cheap and a bit better over time? I don't know
| much about the AI space, but as a vanilla backend dev I also
| worry about the future :)
| root_axis wrote:
| Let's see how O3 pans out in practice before we start
| setting it as the standard for the future.
| varelse wrote:
| Mamba-ish models are the breakthrough to cheap inference
| if they pan out. Calling a dead-end already is just
| silly.
| sydd wrote:
| We know that OpenAI is very good at at least one thing:
| generating hype. When Sora was announced everyone thought
| it would be revolutionary. Look at how it turned out in
| production. Same when they started floating rumours that
| they have some AGI prototype in their labs.
|
| They are the Tesla of the IT world: overpromise and
| under-deliver.
| WhyOhWhyQ wrote:
| It's a brilliant marketing model. Humans are inherently
| highly interested in anything which could be a threat to
| their well-being. Everything they put out is a tacit
| promise that the viewer will soon be economically
| valueless.
| hatefulmoron wrote:
| I'm really curious about something, and would love for an
| OpenAI subscriber to weigh in here.
|
| What is the jump to O1 like, compared to GPT4/Claude 3.5? I
| distinctly remember the same (if not even greater) buzz
| around the announcement of O1, but I don't hear people
| singing its praises in practice these days.
| te_chris wrote:
| O1 is fine.
| lom888 wrote:
| I don't know how to code in any meaningful way. I work at
| a company where the bureaucracy is so thick that it is
| easier to use a web scraper to port a client's website
| blog than to just move the files over. GPT 4 couldn't
| write me a working scraper to do what I needed. o1 did it
| with minimal prodding. It then suggested and wrote me a
| ffmpeg front-end to handle certain repetitive tasks with
| client videos, again, with no problem. Gpt4 would often
| miss the mark and then write bad code when presented with
| such challenges
| trhway wrote:
| >I worry a lot less about my career now. It really is a tool
| that creates more work for me rather than less.
|
| When I was a team/project leader, the largest part of my work
| was talking to my reports about how this and that was going to
| be implemented, the current progress of the implementation,
| how to interface the pieces, what the issues were and how to
| approach the troubleshooting, etc., with occasional looking
| into/reviewing the code. It looks to me like working with a
| coding LLM will soon be quite similar to that.
| hammock wrote:
| What are your thoughts on neuro-symbolic integration (combining
| the pattern-recognition capabilities of neural networks with
| the reasoning and knowledge representation of symbolic AI) ?
| bionhoward wrote:
| Seems like the symbolic aspect is poorly defined and it's too
| unclear to be useful. Always sounds cool, but what exactly
| are we talking about?
| gizmo wrote:
| Previous generations of neural nets were kind of useless.
| Spotify ended up replacing their machine learning recommender
| with a simple system that would just recommend tracks that
| power listeners had already discovered. Machine learning had a
| couple of niche applications but for most things it didn't
| work.
|
| This time it's different. The naysayers are wrong.
|
| LLMs today can already automate many desk jobs. They already
| massively boost productivity for people like us on HN. LLMs
| will certainly get better, faster and cheaper in the coming
| years. It will take time for society to adapt and for people to
| realize how to take advantage of AI, but this will happen. It
| doesn't matter whether you can "test AI in part" or whether you
| can do "exhaustive whole system testing". It doesn't matter
| whether AIs are capable of real reasoning or are just good
| enough at faking it. AI is already incredibly powerful and with
| improved tooling the limitations will matter much less.
| jfengel wrote:
| From what I have seen, most of the jobs that LLMs can do are
| jobs that didn't need to be done at all. We should turn them
| over to computers, and then turn the computers off.
| kube-system wrote:
| They're good at processing text. Processing text is a
| valuable thing that sometimes needs to be done.
|
| We still use calculators even though the profession we used
| to call "computer" was replaced by them.
| jonasced wrote:
| But here reliability comes in again. Calculators are
| different since the output is correct as long as the
| input is correct.
|
| LLMs do not guarantee any quality in the output even when
| processing text, and should in my opinion be verified
| before used in any serious applications.
| dlkf wrote:
| > Previous generations of neural nets were kind of useless.
| Spotify ended up replacing their machine learning recommender
| with a simple system that would just recommend tracks that
| power listeners had already discovered.
|
| "Previous generations of cars were useless because one guy
| rode a bike to work." Pre-transformer neural nets were
| obviously useful. CNNs and RNNs were SOTA in most vision and
| audio processing tasks.
| michaelmrose wrote:
| > LLMs today can already automate many desk jobs.
|
| No they can't, because they make stuff up, fail to follow
| directions, need to be minutely supervised, need all output
| checked, and need their workflow integrated with your
| company's shitty, overcomplicated procedures and systems.
|
| This makes them suitable at best as an assistant to your
| current worker or more likely an input for your foo as a
| service which will be consumed by your current worker. In the
| ideal case this helps increase the output of your worker and
| means you will need less of them.
|
| An even greater likelihood is that someone dishonest at some
| company will convince someone stupid at your company that it
| will be more efficacious and less expensive than it will
| ultimately be, leading your company to spend a mint trying to
| save money. They will spend more than they save, in the
| expectation of being able to lay off some of their workers,
| with the net result of increasing the workload on the
| remaining workers and shifting money upward to the firms
| exploiting executives too stupid to recognize snake oil.
|
| See outsourcing to underperforming overseas workers because
| the desirable workers who could have ably done the work are
| A) in management because it pays more B) in country or
| working remotely for real money or C) cost almost as much as
| locals once the increased costs of doing it externally are
| factored in.
| jsjohnst wrote:
| > No they can't because they make stuff up, fail to follow
| directions, need to be minutely supervised, all output
| checked and workflow integrated with your companies shitty
| over complicated procedures and systems.
|
| What's the difference between what you describe and what's
| needed for a fresh hire off the street, especially one just
| starting their career?
| dlkf wrote:
| > Current AI systems have no internal structure that relates
| meaningfully to their functionality
|
| In what sense is the relationship between neurons and human
| function more "meaningful" than the relationship between
| matrices and LLM function?
|
| You're correct that LLMs are probably a dead end with respect
| to AGI, but this is completely the wrong reason.
| mmcnl wrote:
| Human intelligence has a track record of being useful for
| thousands of years.
| MPSimmons wrote:
| Just because transformer-based architectures might be a dead end
| (in terms of how far they can take us toward achieving artificial
| sentience), and the outcome may not be mathematically provable,
| as this author seems to want it to be, does not mean that the
| technology isn't useful.
|
| Even during the last AI winter, previous achievements such as
| Bayesian filtering, proved useful in day to day operation of
| infrastructures that everyone used. Generative AI is certainly
| useful as well, and very capable of being used operationally.
|
| It is not without caveats, and the end goals of AI researchers
| have not been achieved, but why does that lessen the impact or
| usefulness of what we have? It may be that we can iterate on
| transformer architecture and get it to the point where it can
| help us make the next big leap. Or maybe not. But either way, for
| day to day use, it's here to stay, even if it isn't the primary
| brain behind new research.
|
| Just remember that the only agency that AI currently has is what
| we give it. Responsible use of AI doesn't mean "don't use AI", it
| means, "don't give it responsibility for critical systems that
| it's ill equipped to deal with". If that's what the author means
| by "serious applications", then I'm on board, but there are a lot
| of "serious applications" that aren't human-life-critical, and I
| think it's fine to use current AI tech on a lot of them.
| xivzgrev wrote:
| I'm surprised this article merits 700+ comments. Why y'all engage
| with such drivel?
|
| It's well established that disruptive technologies don't appear
| to have any serious applications, at first. But they get better
| and better, and eventually they take over.
|
| PG talks about how new technologies seem like toys at first, and
| the whole Innovator's Dilemma is about this... so it's well
| established within this community.
|
| Just ignore it and figure out where the puck is moving toward.
| doug_durham wrote:
| The author declares that "software composability" is the solution
| as though that is a given fact. Composability is as much a dead
| end as the AI he describes. Decades of attempts at formal
| composability have not yielded improvements in software quality
| outside of niche applications. It's a neat idea, but as you scale
| the complexity explodes making such systems as opaque and
| untestable as any software. I think the author needs to spend
| more time actually writing code and less time thinking about it.
| derefr wrote:
| If you mean "exactly as architected currently", then yes, current
| Transformer-based generative models can't possibly be anything
| _other than_ a dead end. The architecture will need to change at
| _least_ a little bit, to continue to make progress.
|
| ---
|
| 1. No matter how smart they get, current models are "only" pre-
| trained. No amount of "in-context learning" can allow the model
| to manipulate the shape and connectivity of the latent state-
| space burned into the model through training.
|
| What is "in-context learning", if not real learning? It's the
| application of pre-learned _general and domain-specific problem-
| solving principles_ to novel problems. "Fluid intelligence", you
| might call it. The context that "teaches" a model to solve a
| specific problem, is just 1. reminding the model that it has
| certain general skills; and then 2. telling the model to try
| applying those skills to solving this specific problem (which it
| wouldn't otherwise think to do, as it likely hasn't seen an
| example of anyone doing that in training.)
|
| Consider that a top-level competitive gamer, who mostly "got
| good" playing one game, will likely nevertheless become nearly
| top-level in any new game they pick up _in the same genre_. How?
| Because many of the skills they picked up while playing their
| favored game weren't just applicable to that game, but were
| instead general strategic skills transferable to other games.
| This is their "fluid intelligence."
|
| Both a human gamer and a Transformer model derive these abstract
| strategic insights at training time, and can then apply them
| across a wide domain of problems.
|
| However, the human gamer can do something that a Transformer
| model fundamentally cannot do. If you introduce the human to a
| game that they _mostly_ understand, but which is in a novel genre
| where playing the game requires one key insight the human has
| never encountered... then you will expect that the human will
| learn that insight _during play_. They 'll see the evidence of
| it, and they'll derive it, and start using it. They will _build
| entirely-novel mental infrastructure at inference time_.
|
| A feed-forward network cannot do this.
|
| If there are strategic insights that aren't found in the model's
| training dataset, then those strategic insights just plain won't
| be available at inference time. Nothing the model sees in the
| context can allow it to conjure a novel piece of mental
| infrastructure from the ether to then apply to the problem.
|
| Whether general or specific, the model can still only _use the
| tools it has_ at inference time -- it can't develop new ones
| just-in-time. It can't "have an epiphany" and crystallize a new
| insight from presented evidence. It's not _doing the thing that
| allows that to happen_ at inference time -- with that process
| instead exclusively occurring (currently) at training time.
|
| And this is very limiting, as far as we want models to do
| anything domain-specific without having billion-interaction
| corpuses to feed them on those domains. We want models to work
| like people, training-wise: to "learn on the job."
|
| We've had simpler models that do this for decades now: spam
| filters are trained online, for example.
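|
| (A toy sketch of what "online" means there: the parameters get
| nudged by each labeled example as it arrives at serving time,
| rather than only in an offline training run.)
|
|     # Toy online learner, spam-filter style: weights updated
|     # per example at serving time, not in an offline run.
|     import numpy as np
|
|     w = np.zeros(4)                       # 4 toy features
|
|     def predict(x):
|         return 1 / (1 + np.exp(-w @ x))   # P(spam)
|
|     def observe(x, label, lr=0.5):        # label: 1 spam, 0 ham
|         global w
|         w += lr * (label - predict(x)) * x
|
|     x = np.array([1.0, 0.0, 1.0, 1.0])
|     for _ in range(50):
|         observe(x, 1)          # user keeps marking this as spam
|     print(predict(x))          # now close to 1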
|
| I would expect that, in the medium term, we'll likely move
| somewhat away from pure feed-forward models, toward models with
| real online just-in-time training capabilities. We'll see
| inference frameworks and Inference-as-a-Service platforms that
| provide individual customers with "runtime-observed in-domain
| residual-error optimization adapters" (note: these would _not_ be
| low-rank adapters!) for their deployment, with those adapters
| continuously being trained from their systems as an "in the
| small" version of the async "queue, fan-in, fine-tune" process
| seen in Inf-aaS-platform RLHF training.
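|
| (To make that slightly more concrete -- a toy sketch of the idea,
| not how any existing platform does it; assuming PyTorch, with a
| frozen base layer plus a full-rank additive adapter that gets
| updated online from observed residual errors:)
|
|       import torch
|       import torch.nn as nn
|
|       base = nn.Linear(512, 512)           # pre-trained, frozen
|       for p in base.parameters():
|           p.requires_grad_(False)
|
|       adapter = nn.Linear(512, 512, bias=False)  # per-deployment
|       nn.init.zeros_(adapter.weight)             # starts as a no-op
|       opt = torch.optim.SGD(adapter.parameters(), lr=1e-3)
|
|       def serve(x: torch.Tensor) -> torch.Tensor:
|           return base(x) + adapter(x)
|
|       # whenever deployment feedback supplies a corrected target,
|       # fold the residual error back into the adapter alone
|       def observe(x: torch.Tensor, target: torch.Tensor) -> None:
|           loss = nn.functional.mse_loss(serve(x), target)
|           opt.zero_grad()
|           loss.backward()
|           opt.step()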
|
| And in the long term, we should expect this to become part of the
| model architecture itself -- with mutable models that diverge
| from a generic pre-trained starting point through connection
| weights that are durably mutable _at inference time_ (i.e.
| presented to the model as virtual latent-space embedding-vector
| slots to be written to), being recorded into a sparse overlay
| layer that is gathered from (or GPU-TLB-page-tree Copy-on-
| Write'd to) during further inference.
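|
| (A crude sketch of the "sparse overlay" half of that -- entirely
| hypothetical, assuming PyTorch: a frozen base weight tensor shared
| by everyone, plus a small per-session dictionary of written slots
| applied on top of it at inference time:)
|
|       import torch
|
|       W_BASE = torch.randn(512, 512)   # frozen, shared by all users
|
|       class WeightOverlay:
|           """Per-session sparse delta over the frozen base."""
|           def __init__(self):
|               self.delta = {}          # (row, col) -> written value
|
|           def write(self, row, col, value):
|               self.delta[(row, col)] = value   # "write to a slot"
|
|           def effective(self):
|               W = W_BASE.clone()       # stand-in for copy-on-write
|               for (r, c), v in self.delta.items():
|                   W[r, c] += v
|               return W
|
|       session = WeightOverlay()
|       session.write(3, 7, 0.25)
|       y = torch.randn(512) @ session.effective()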
|
| ---
|
| 2. There is a kind of "expressivity limit" that comes from
| generative Transformer models having to work iteratively and
| "with amnesia", against a context window comprised of tokens in
| the observed space.
|
| Pure feed-forward networks (which is what all Transformer models
| are) generally only seem as intelligent as they do because,
| outside of the model itself, we're breaking down the problem it
| has to solve
| from "generate an image" or "generate a paragraph" to instead be
| "generate a single convolution transform for a canvas" or
| "generate the next word in the sentence", and then looping the
| model over and over on solving that one-step problem with its own
| previous output as the input.
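|
| (That outer loop isn't part of the network at all -- it's just a
| sampler wrapped around one constant-cost forward pass per token.
| Schematically, in Python, with `next_token` standing in for the
| whole model:)
|
|       import random
|
|       VOCAB = ["the", "cat", "sat", "on", "mat", "<eos>"]
|
|       def next_token(context):
|           # stand-in for one fixed-cost feed-forward pass
|           return random.choice(VOCAB)
|
|       def generate(prompt, max_len=20):
|           context = list(prompt)
|           while len(context) < max_len:
|               tok = next_token(context)  # sees the whole context...
|               if tok == "<eos>":
|                   break
|               context.append(tok)        # ...but emits one token
|           return context
|
|       print(generate(["the", "cat"]))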
|
| Now, this approach -- using a pure feed-forward model (i.e. one
| that has constant-bounded processing time per output token, with
| no ability to "think longer" about anything), and feeding it the
| entire context (input + output-so-far) on each step, then having
| it infer one new "next" token at a time rather than entire output
| sequences at a time -- isn't _fundamentally_ limiting.
|
| After all, models _could_ just amortize any kind of superlinear-
| in-compute-time processing across the inference of several
| tokens. (And if this _were_ how we architected our models, then
| we'd expect them to behave a lot like humans: they'd be
| "gradually thinking the problem through" _while_ saying something
| -- and then would sometimes stop themselves mid-sentence, and
| walk back what they said, because their asynchronous long-
| thinking process arrived at a conclusion that invalidated
| _previous_ outputs of their surface-level predict-the-next-word
| process.)
|
| There's nothing that says that a pure feed-forward model needs to
| be _stateless_ between steps. "Feed-forward" just means that,
| unlike in a Recurrent Neural Network, there's no step where data
| is passed "upstream" to be processed again by nodes of the
| network that have already done work. Each vertex of a feed-
| forward network is only visited (at most) once per inference
| step.
|
| But there's nothing stopping you from designing a feed-forward
| network that, say, keeps an additional embedding vector between
| each latent layer -- one that isn't _overwritten_ or _dropped_
| between layer activations, but instead persists across inference
| steps, getting reused by the same layer on the next step: the
| output of layer N-1 from inference-step T-1 is combined with the
| output of layer N-1 from inference-step T to form (part of) the
| input to layer N at inference-step T. (To have
| a model learn to do something with this "tool", you just need to
| ensure its training is measuring predictive error over multi-
| token sequences generated using this multi-step working-memory
| persistence.)
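|
| (A minimal sketch of that wiring, assuming PyTorch -- the `memory`
| buffer in each block is carried across inference steps, but data
| never flows backward within a step, so each step is still strictly
| feed-forward:)
|
|       import torch
|       import torch.nn as nn
|
|       class BlockWithMemory(nn.Module):
|           def __init__(self, dim):
|               super().__init__()
|               self.ff = nn.Linear(dim * 2, dim)
|               self.memory = torch.zeros(dim)  # persists across steps
|
|           def forward(self, x):
|               # combine layer N-1's output from THIS step (x) with
|               # layer N-1's output retained from the PREVIOUS step
|               h = self.ff(torch.cat([x, self.memory], dim=-1))
|               # detached here for an inference-only sketch; training
|               # would backprop through multi-step sequences instead
|               self.memory = x.detach()
|               return h
|
|       layers = nn.ModuleList([BlockWithMemory(64) for _ in range(4)])
|
|       def inference_step(x):
|           for layer in layers:            # data only moves forward
|               x = layer(x)
|           return x
|
|       for _ in range(3):                  # successive token steps
|           out = inference_step(torch.randn(64))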
|
| ...but we aren't currently allowing models to do that. Models
| currently "have amnesia" between steps. In order to do any kind
| of asynchronous multi-step thinking, everything they know about
| "what they're currently thinking about" has to somehow be encoded
| -- _compressed_ -- into the observed-space sequence, so that it
| can be recovered and reverse-engineered into latent context on
| the next step. And that compression is _very_ lossy.
|
| And this is why ChatGPT isn't automatically a better
| WolframAlpha. It can tell you how all the "mental algorithms"
| involved in higher-level maths work -- and it can _try_ to follow
| them itself -- but it has nowhere to keep the large amount of
| "deep" [i.e. latent-space-level] working-memory context required
| to "carry forward" these multi-step processes between inference
| steps.
|
| You _can_ get a model (e.g. o1) to limp along by dedicating much
| of the context to "showing its work" in incredibly-minute detail
| -- essentially trying to force serialization of the most
| "surprising" output in the latent layers as the predicted token
| -- but this fights against the model's nature, especially as the
| model still needs to dedicate many of the feed-forward layers to
| deciding how to encode the chosen "surprising" embedding into the
| same observed-space vocabulary used to communicate the final
| output product to the user.
|
| Even assuming context-window costs that scale only linearly, the
| cost of this approach to working-memory serialization grows
| superlinearly relative to the intelligence achieved. It's
| untenable as a long-term strategy.
|
| Obviously, my prediction here is that we'll build models with
| real inference-framework-level working memory.
|
| ---
|
| At that point, if you're adding mutable weights and working
| memory, why not just admit defeat with the Transformer architecture
| and go back to RNNs?
|
| Predictability, mostly.
|
| The "constant-bounded compute per output token" property of
| Transformer models is the key guarantee that has enabled "AI" to
| be a commercial product right now, rather than a toy in a lab.
| Any further advancements must preserve that guarantee.
|
| Write-once-per-layer long-term-durable mutable weights preserve
| that guarantee. Write-once-per-layer retained-between-inference-
| steps session memory cells preserve that guarantee. But anything
| with real _recurrence_ does not preserve that guarantee.
| Allowing recurrence in a neural network is like allowing
| backward-branching jumps in a CPU program: it moves you from the
| domain of guaranteed-to-halt co-programs to the domain of
| unbounded Turing-machine software.
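|
| (Reduced to control flow, the distinction is just this -- a loop
| with a trip count fixed by the model's depth versus an open-ended
| "think until done" loop; toy Python stubs:)
|
|       # Transformer-style step: cost per token is fixed by depth.
|       def transformer_step(x, layers):
|           for layer in layers:      # known trip count, known cost
|               x = layer(x)
|           return x
|
|       # Recurrent "think until converged": no static cost bound.
|       def recurrent_step(x, cell, done):
|           while not done(x):        # backward jump: may not halt
|               x = cell(x)
|           return x
|
|       layers = [lambda v: v + 1 for _ in range(12)]
|       print(transformer_step(0, layers))   # always 12 layer calls
|       print(recurrent_step(0, lambda v: v + 1, lambda v: v >= 5))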
| lowsong wrote:
| Last week I had to caution a junior engineer on my team to only
| use an LLM for the first pass, and never rely on the output
| unmoderated.
|
| They're fine as glorified autocomplete, fuzzy search, or in other
| applications where accuracy isn't required. But to rely on them
| in any situation where accuracy is important is professional
| negligence.
| rglover wrote:
| Yes, but not absolutely.
|
| LLMs are a valuable tool for augmenting productivity. Used
| properly, they _do_ give you a competitive advantage over someone
| who isn't using them.
|
| The "dead end" is in them being some magical replacement for
| skilled employees. The level of delusion pumping out of SV and
| AI companies desperate to make a buck is unreal. They talk about
| _chat bots_ like they're already solving humanity's toughest
| problems (or will be in "just two more weeks"). In reality,
| they're approximately good at solving certain problems (and they
| can only ever solve them from the POV of existing human knowledge
| -- they can't create). You still have to hold their hand quite a
| bit.
|
| This current wave of tech is going to have an identical outcome
| to the "blockchain all the things" nightmare from a few years
| back.
|
| Long-term, there's a lot of potential for AI, but this is just
| one significant step along the way. We're not "there" yet and
| won't be for
| some time.
| ein0p wrote:
| My take is that even if AI qualitatively stops where it is right
| now, and only continues to get faster / more memory efficient, it
| already represents an unprecedented value add to human
| productivity. Most people just don't see it yet. The reason is
| that it "fills in" the weak spots of the human brain: limited
| associative memory, attention, and working memory, and an
| aversion to menial mental work. This does for the
| brain what industrialization did for the body. All we need to do
| to realize its potential is emphasize _collaboration_ with AI,
| rather than the _replacement_ by AI that the pundits currently
| push as rage (and therefore click) bait.
| arvindrajnaidu wrote:
| AI cannot use composition?
| karaterobot wrote:
| Does anyone seriously think that the results of any current
| approach would suddenly turn into godlike, super-intelligent
| AGI if only we threw an arbitrary number of GPUs at it? I guess
| I assumed everyone believed this was a stepping stone at best,
| but was happy that it turned out to have some utility.
| bgnn wrote:
| What I find funny is that the discussion revolves largely around
| software development, where LLMs excel. Outside of that, and
| generating junk text like government reports or patent
| applications, they seem pretty useless. So most of society
| doesn't care about it, it's not as big a revolution as SWEs
| think it is at the moment, and the discussion about the future
| is actually philosophical: do we think the trend of development
| will continue, or will we hit a wall?
___________________________________________________________________
(page generated 2024-12-27 23:00 UTC)