hngopher.com

       [HN Gopher] Things we learned about LLMs in 2024
       ___________________________________________________________________
        
       Things we learned about LLMs in 2024
        
       Author : simonw
       Score  : 352 points
       Date   : 2024-12-31 18:11 UTC (4 hours ago)
        
 (HTM) web link (simonwillison.net)
 (TXT) w3m dump (simonwillison.net)
        
       | agentultra wrote:
       | Don't forget that 2024 was also a record year for new methane
       | power plant projects. Some 200 new projects in the US alone and
       | I'd wager most of them are funded directly by big tech for AI
       | data centres.
       | 
       | https://www.bnnbloomberg.ca/investing/2024/09/16/ai-boom-is-...
       | 
       | This is definitely extending the runway of O&G at a crisis point
       | in the climate disaster when we're supposed to be reducing and
       | shutting down these power plants.
       | 
       |  _Update_ : clarified the 200 number is in the US. There are far
       | more world wide.
        
         | api wrote:
         | The only thing that will stop this is for battery storage to
         | get cheap and available enough that it can cover for
         | renewables. If we are still building gas turbines it means that
         | hasn't happened yet.
         | 
         | AI is a red herring. If it wasn't that it would be EV power
         | demand. If it wasn't that it would be reshoring of
         | manufacturing. If it wasn't that it would be population growth
         | from immigration. If it wasn't that it would be replacing old
         | coal power plants reaching EOL.
         | 
         | Replacing coal with gas is an improvement by the way. It's
         | around half the CO2 per kWh, sometimes less if you factor in
         | that gas turbines are often more efficient than aging old coal
         | plants.
        
           | agentultra wrote:
           | Methane has a shorter half-life than CO2 but is a far worse
           | green house gas; retaining far more heat.
           | 
           | And delivering methane leaks like a sieve into the atmosphere
           | from all parts of the process.
           | 
           | Sure it's probably "better than coal," but not by much. It's
           | a bit like comparing what's worse: getting burned by fire or
           | being drowned in acid.
        
           | Nition wrote:
           | Pumped hydro is an excellent form of storage if you have the
           | terrain for it. A whole order of magnitude cheaper than
           | battery storage at the moment.
        
         | ToucanLoucan wrote:
         | It would be really cool if big tech could find a new
         | hyperscaler model that didn't _also_ require offsetting the
         | goals of green energy projects worldwide. Between LLM and
         | crypto you 'd swear they're trying to find the most energy-
         | wasteful tech possible.
        
           | zachrip wrote:
           | It seems odd to put crypto and LLMs in the same boat in this
           | regard - I might be wrong but are there any crypto projects
           | that actually provide value? I'm sure there are ones that do
           | folding or something but among the big ones?
        
             | rileymat2 wrote:
             | Value is a hard term, this link will seem snarky, but:
             | https://www.axios.com/2024/12/25/russia-bitcoin-evade-
             | sancti...
             | 
             | So in a way, it is providing value to someone, whether we
             | like it or not.
             | 
             | Or Drug Cartels. https://www.context.news/digital-
             | rights/how-crypto-helps-lat...
             | 
             | But this is the promise of uncontrollable decentralization
             | providing value, for good or bad?
        
           | ben_w wrote:
           | Cryptocurrency, at least PoW, the point is indeed to be the
           | most wasteful -- a literal Dyson swarm powered Bitcoin would
           | provide _exactly the same utility_ as the BTC network already
           | had in 2010.
           | 
           | LLMs (and the image, sound, and movie generating models) are
           | more _coincidentally_ power-hogs -- people are at least
           | trying to make them better at fixed compute, and lower
           | compute at fixed quality.
        
             | ToucanLoucan wrote:
             | I mean, I appreciate that distinction and don't disagree.
             | _And,_ if this is going to continue being a trend, I think
             | we need more stringent restrictions on what sorts of
             | resources are permitted to be consumed in the power plants
             | that are constructed to meet the needs of hyperscaler data
             | centers.
             | 
             | Because whether we're using tons of compute to provide
             | value or not doesn't change that _we are using tons of
             | compute_ and tons of compute requires tons of energy, both
             | for the chips themselves, and the extensive infrastructure
             | that has to built around them to let them work. And not
             | just electricity: refrigerants, many of which are
             | environmentally questionable themselves, are a big part;
             | hell, just water. Clean, usable water.
             | 
             | If we truly need these data centers, then fine. Then they
             | should be powered by renewable energy, or if they
             | absolutely cannot be, then the costs their nonrenewable
             | energy sources inflict on the biosphere should be priced
             | into their construction and use, and in turn, priced into
             | the tech that is apparently so critical for them to have.
             | 
             | This is like, a basic calculus that every grown person
             | makes dozens of times a day: do I need this? And they don't
             | get to distribute the cost of that need, however prescient
             | it may be, on their wider community because they can't
             | afford it otherwise. I don't see why Microsoft should be
             | able to either. If this is truly the tech of the future as
             | it is constantly propped up to be, cool. Then charge a
             | price for it that reflects what it costs to use.
        
         | comte7092 wrote:
         | Energy generation methods aren't fungible.
         | 
         | Methane is favored in many cases because they can be quickly
         | ramped up and down to handle momentary peaks in demand or
         | spotty supply generated from renewables.
         | 
         | Without knowing more details about those projects it is
         | difficult to make the claim that these plants have anything to
         | do with increased demand due to LLMs, though if anything,
         | they'd just add to base load demands and lead to slower
         | decommissioning of old coal plants like we've seen with bitcoin
         | mines.
        
           | throwup238 wrote:
           | Methane is also worth burning to lessen the GHG impact since
           | we produce so much of it as a byproduct of both resource
           | extraction and waste disposal anyway.
        
         | uludag wrote:
         | But according to the author, apparently bringing this up isn't
         | helpful criticism.
         | 
         | I'm curious what peoples thoughts are of what the future of
         | LLMs would be like if we severely overshoot our carbon goals.
         | How bad would thinks have to get for people to stop caring
         | about this technology?
        
           | simonw wrote:
           | It's helpful criticism as _part_ of the conversation. What
           | frustrates me is when people go  "LLMs are burning the
           | planet!" and leave it at that.
        
             | agentultra wrote:
             | It is a rather contrasting opinion that the trade-offs to
             | have AI aren't worth the value they bring.
             | 
             | The growth in this technology isn't outpacing car pollution
             | and O&G extraction... yet, but the growth rate has been
             | enough in recent years to put it on the radar of industries
             | to watch out for.
             | 
             | I hope the compute efficiencies are rapid and more than
             | commensurate with the rate of growth so that we can make
             | progress on our climate targets.
             | 
             | However it seems unlikely to me.
             | 
             | It's been a year of progress for the tech... but also a lot
             | of setbacks for the rest of the world. I'm fairly certain
             | we don't need AGI to tell us how to cope with the climate
             | crisis; we already have the answer for that.
             | 
             | Although if the industry does continue to grow and the
             | efficiency gains aren't enough... will society/investors be
             | willing to scale back growth in order to meet climate
             | targets (assuming that AI becomes a large enough segment of
             | global emissions to warrant reductions)?
             | 
             | Interesting times for the field.
        
       | jmclnx wrote:
       | Interesting, the article is not quite what I expected.
        
       | fosterfriends wrote:
       | My fav part of the writeup at the end:
       | 
       | """
       | 
       | LLMs need better criticism # A lot of people absolutely hate this
       | stuff. In some of the spaces I hang out (Mastodon, Bluesky,
       | Lobste.rs, even Hacker News on occasion) even suggesting that
       | "LLMs are useful" can be enough to kick off a huge fight.
       | 
       | I like people who are skeptical of this stuff. The hype has been
       | deafening for more than two years now, and there are enormous
       | quantities of snake oil and misinformation out there. A lot of
       | very bad decisions are being made based on that hype. Being
       | critical is a virtue.
       | 
       | If we want people with decision-making authority to make good
       | decisions about how to apply these tools we first need to
       | acknowledge that there ARE good applications, and then help
       | explain how to put those into practice while avoiding the many
       | unintiutive traps.
       | 
       | """
       | 
       | LLMs are here to stay, and there is a need for more thoughtful
       | critique rather than just "LLMs are all slop, I'll never use it"
       | comments.
        
         | vunderba wrote:
         | I agree, but I think my biggest issue with LLMs (and a lot of
         | GenAI) is that they act as a massive accelerator for the WORST
         | (and unfortunately most common) type of human - the lazy one.
         | 
         | The signal-to-noise ratio just goes completely out of control.
         | 
         | https://journal.everypixel.com/ai-image-statistics
        
           | greenavocado wrote:
           | Exif watermark by the generators would solve 90% of the
           | problem in one fell swoop because lazy people won't remove it
        
             | minimaxir wrote:
             | Every image host and social media app automatically strips
             | EXIF data (for privacy reasons at minimum).
        
           | Der_Einzige wrote:
           | Sorry but the "lazy is bad" crowd is ludditism in another
           | form, and it's telling that a whole lot of very smart people
           | were passionate defenders of being lazy!
           | 
           | https://en.wikipedia.org/wiki/The_Human_Use_of_Human_Beings
           | 
           | https://en.wikipedia.org/wiki/Inventing_the_Future:_Postcapi.
           | ..
           | 
           | https://en.wikipedia.org/wiki/The_Right_to_Be_Lazy
           | 
           | https://en.wikipedia.org/wiki/In_Praise_of_Idleness_and_Othe.
           | .. (That's Bertrand Russell)
           | 
           | https://en.wikipedia.org/wiki/The_Abolition_of_Work
           | 
           | https://en.wikipedia.org/wiki/The_Society_of_the_Spectacle
           | 
           | https://en.wikipedia.org/wiki/Bonjour_paresse
           | 
           | AI systems are literally the most amazing technology on earth
           | for this exact reason. I am so glad that it is destroying the
           | minds of time thieves world-wide!
        
           | mhh__ wrote:
           | The people who are lazy but have taste will do well, then.
        
           | Adiqq wrote:
           | Isn't it expected that most, if not all, of the content will
           | be produced by AI/AGI in the near future? It won't matter
           | much, if you're lazy or not. It leads to the question, what
           | we'll do instead? People may want to be productive, but we're
           | observing in real-time how world is going shit for workers
           | and that's basically fact for many reasons.
           | 
           | One reason is that it's cheaper to use AI, even if the result
           | is poor. It doesn't have to be high quality, because most of
           | the time we don't care about quality, unless something
           | interests us. I wonder what kind of shift in power dynamics
           | will occur, but so far it looks just like many of us will
           | just lose a job. There's no UBI (or social credit proposed by
           | Douglas), salaries are low and not everyone lives in good
           | location, but corporations try to enforce RTO. Some will
           | simply get fired and won't be able to find a new job (that
           | won't be sustainable for personal budget, unless someone
           | already has low costs of living and is debt-free or has
           | somewhat wealthy family that will cover for you).
           | 
           | Well, maybe at least government will protect us? Low chance,
           | world is shifting right and it will get worse, once we start
           | to experience more and more results of global warming. I
           | don't see scenario, where world is becoming better place in
           | foreseeable future. We're trapped in society of achievement,
           | but soon we may be not able to deliver achievements, because
           | if business can get similar results for fraction of the price
           | (that is needed to hire human workers), then guess what will
           | happen?
           | 
           | These are sad times, full of depression and suffering. I hope
           | that some huge transformation in societies will happen soon
           | or that AI development slows down, so that some future
           | generation will have to deal with consequences (people will
           | prioritize saving their own and it won't be pretty, so it's
           | better to just pass it down like debt).
        
         | im_down_w_otp wrote:
         | This happens with every inane hype-cycle.
         | 
         | I suspect people don't particularly hate or despise LLMs per
         | se. They're probably reacting mostly to "tech industry" boom-
         | bust bullsh*tter/guru culture. Especially since the cycles seem
         | to burn increasingly hotter and brighter the less actual,
         | practical value they provide. Which is supremely annoying when
         | the second-order effect is having all the oxygen (e.g. capital)
         | sucked out of the room for pretty much anything else.
        
         | mhh__ wrote:
         | I can think of some runaway scenario's where LLMs are
         | definitely bad but, indeed, this particular line of criticism
         | is really just luddites longing for a world that probably
         | doesn't exist anymore.
         | 
         | These are the people who regulate and legislate for us, they
         | are the risk-adverse fools who would rather things be nice and
         | harmless lest they be bad but work.
         | 
         | Personally, I think my only serious ideology in this area is
         | that I am fundamentally biased towards the power of human
         | agency. I'd rather not need to, but in a (perhaps) Nietzschean
         | sense I view so-called AI as a force multiplier to totally
         | avoid the above people.
         | 
         | AI will enable the creative to be more concrete, and drag those
         | on the other end of the scale towards the normie mean. This is
         | of great relevance to the developing world too - AI may end up
         | a tool for enforcing western culture upon the rest of the world
         | but perhaps a force decorrelating it from the McKinsey's of
         | tall buildings in big cities.
        
       | throwanem wrote:
       | > I've heard from sources I trust that both Google Gemini and
       | Amazon Nova charge less than their energy costs for running
       | inference...
       | 
       | Then, several headings later:
       | 
       | > I have it on good authority that neither Google Gemini nor
       | Amazon Nova (two of the least expensive model providers) are
       | running prompts at a loss.
       | 
       | So...which is it?
        
         | simonw wrote:
         | Oh whoops! That's an embarrassing mistake, and I didn't realize
         | I had that point twice.
         | 
         | They're not running at a loss. I'll fix that.
        
           | cess11 wrote:
           | If they are subsidised they can make a profit while still not
           | making enough money to cover energy costs.
        
             | kgwgk wrote:
             | Subsidised by whom?
        
               | cess11 wrote:
               | E.g. tax payers.
        
               | kgwgk wrote:
               | Are tax payers subsiding that particular activity of
               | Google or Amazon? If they do, "they make enough money" to
               | cover costs. If they don't, how does it become profitable
               | if it doesn't even cover the cost of one of the inputs?
        
             | simonw wrote:
             | The tip I got about both Gemini and Nova is that the low
             | prices they are charging still cover their energy costs.
        
               | cess11 wrote:
               | OK!
        
       | pkoird wrote:
       | I'd love to read a semi-technical book on everything that we've
       | learned about what works and what does not on LLMs.
        
         | nkingsy wrote:
         | It would be out of date in months.
         | 
         | Things that didn't work 6 months ago do now. Things that don't
         | work now, who knows...
        
           | minimaxir wrote:
           | There are still some tropes from the GPT-3 days that are
           | fundamental to the construction of LLMs that affect how they
           | can be used and will not change unless they no longer are
           | trained to optimize for next-token-prediction (e.g.
           | hallucinations and the need for prompt engineering)
        
           | DoctorOetker wrote:
           | Do you mean performance that was missing in the past is now
           | routinely achieved?
           | 
           | Or do you actually mean that the same routines and data that
           | didn't work before suddenly work?
        
       | Animats wrote:
       | > Some of those GPT-4 models run on my laptop
       | 
       | That's an indication that most business-sized models won't need
       | some giant data center. This is going to be a cheap technology
       | most of the time. OpenAI is thus way overvalued.
        
         | slimsag wrote:
         | Unless the best models themselves are costly/hard to produce,
         | and there is not a company providing them to people free of
         | charge AND for commercial use.
        
         | shihab wrote:
         | The last OpenAI valuation I read about was 157 billion. I am
         | struggling to understand what justifies this. To me, it feels
         | like OpenAI is at best few months ahead of competitors in
         | _some_ areas. But even if I am underestimating the advantage,
         | it 's few years instead of few months, why does it matter? It's
         | not like AI companies are going to enjoy the first-mover
         | advantage internet giants had over the competition.
        
           | benreesman wrote:
           | Us skeptics believe that valuation prices in some form of
           | regulatory capture or other non-market factor.
           | 
           | The non-skeptical interpretation is that it's a threshold
           | function, a flat-out race with an unambiguous finish line. If
           | someone actually hit self-improving AGI first there's an
           | argument that no one would ever catch up.
        
             | com2kid wrote:
             | There are some really good books about wars between
             | cultures that have AGI and it always comes down to math -
             | whoever can get their hands on more compute faster wins.
        
               | api wrote:
               | This is also a strong argument for immigration,
               | particularly high-skill immigration. In the absence of
               | synthetic AGI whoever imports the most human AGI wins.
        
               | Jensson wrote:
               | Which suggests that total AGI compute doesn't matter that
               | much, as India isn't the world leader the amount of human
               | compute they posses would suggest then.
               | 
               | What matters is how you use the AGI, not how much you
               | have, with wrong or bad or limiting regulations it will
               | not lead anywhere.
        
           | datadrivenangel wrote:
           | It's justified if AGI is possible. If AGI is possible, then
           | the entire human economy stops making sense as far as money
           | goes, and 'owning' part of OpenAI gives you power.
           | 
           | That is of course, assuming AGI is possible and exponential,
           | and that marketshare goes to a single entity instead of a set
           | of entities. Lots of big assumptions. Seems like we're
           | heading towards a slow-lackluster singularity though.
        
             | philipkglass wrote:
             | _If AGI is possible, then the entire human economy stops
             | making sense as far as money goes, and 'owning' part of
             | OpenAI gives you power._
             | 
             | That's if AGI is possible _and not easily replicated_. If
             | AGI can be copied and /or re-developed like other software
             | then the value of owning OpenAI stock is more like owning
             | stock in copper producers or other commodity sector
             | companies. (It might even be a poorer investment. Even AGI
             | can't create copper atoms, so owners of real physical
             | resources could be in a better position in a post-human-
             | labor world.)
        
               | whatshisface wrote:
               | This belief comes from confusing the singularity (every
               | atom on Earth is converted into a giant image of Sam
               | Altman) with AGI (a store employee navigates a
               | confrontation with an unruly customer, then goes home and
               | wins at Super Mario).
        
               | fullstackchris wrote:
               | Exactly. I continually fail to see how "the entire human
               | economy ends" overnight with another human like agent out
               | there - especially if its confined to a server in the
               | first place - it can't even "go home" :)
        
               | baobabKoodaa wrote:
               | If I recall correctly, these terms were used more or less
               | interchangeably for a few decades, until 2020 or so, when
               | OpenAI started making actual progress towards AGI, and it
               | was clear that the type of AGI that could be imagined at
               | that point, would not be of the type that would produce
               | singularity.
        
               | AnimalMuppet wrote:
               | The GP said, "and exponential". If AGI is exponential,
               | then the first one will have a head start advantage that
               | compounds over time. That is going to be hard to
               | overcome.
        
               | philipkglass wrote:
               | I believe that AGI cannot be exponential for long because
               | any intelligent agent can only approach nature's limits
               | asymptotically. The first company with AGI will be about
               | as much ahead as, say, the first company with electrical
               | generators [1]. A lot of science fiction about a
               | technological singularity assumes that AGI will discover
               | and apply new physics to develop currently-believed-
               | impossible inventions, but I don't consider that
               | plausible myself. I believe that the discovery of new
               | physics will be intellectually satisfying but generally
               | inapplicable to industry, much like how solving the
               | cosmological lithium problem will be career-defining for
               | whoever does it but won't have any application to lithium
               | batteries.
               | 
               | https://en.wikipedia.org/wiki/Cosmological_lithium_proble
               | m
               | 
               | [1] https://en.wikipedia.org/wiki/Siemens#1847_to_1901
        
               | datadrivenangel wrote:
               | I don't recall editing my message, but HN can be wonky
               | sometimes. :)
               | 
               | Nothing is truly exponential for long, but the logistic
               | curve could be big enough to do almost anything if you
               | get imaginative. Without new physics, there are still
               | some places where we can do some amazing things with the
               | equivalent of several trillion dollars of applied R&D,
               | which AGI gets you.
        
               | philipkglass wrote:
               | I had to edit _my_ message just now because I was
               | actually unsure if you edited. Sorry for any
               | miscommunication.
        
               | terribleperson wrote:
               | This depends on what a hypothetical 'AGI' actually costs.
               | If a real AGI is achieved, but it costs more per unit of
               | work than a human does... it won't do anyone much good.
        
               | fullstackchris wrote:
               | Sure but think of the Higgs... how long that took for
               | just _one_ particle. You think an AGI, or even an ASI is
               | going to make an experimental effort like that go any bit
               | faster? Dream on!
               | 
               | It astounds me that people dont realize how much of this
               | cutting edge science stuff literally does NOT happen
               | overnight, and not even close to that; typically it takes
               | on the order of decades!
        
               | datadrivenangel wrote:
               | Science takes decades, but there are many places where we
               | could have more amazing things if we spent 10 times as
               | much on applied R&D and manufacturing. It wouldn't happen
               | overnight, but it will be transformative if people can
               | get access to much more automated R&D. We've seen a
               | proliferation in makers over the last few decades as
               | access to information is easier, and with better tools
               | individuals will be able to do even more.
               | 
               | My point being that even if Science ends today, we still
               | have a lot more engineering we can benefit from.
        
               | richardw wrote:
               | The first AGI will have such an advantage. It'll be the
               | first thing that is smart and tireless, can do anything
               | from continuously hacking enemy networks to trading
               | across all investment classes, to basically taking over
               | the news cycle on social media. It would print money and
               | power.
        
             | Terr_ wrote:
             | One strata in that assumption-heap to call out explicitly:
             | Assuming LLMs are an enabling route to AGI and not a dead-
             | end or supplemental feature.
        
             | api wrote:
             | If AGI is possible then that too becomes a commodity and we
             | experience a massive round of deflation in the cost of
             | everything not intrinsically rare. Land, food, rare
             | materials, energy, and anything requiring human labor is
             | expensive and everything else is almost free.
             | 
             | I don't see how OpenAI wouldn't crash and burn here. Given
             | the history of models it would be at most a year before
             | you'd have open AGI, then the horse is out of the barn and
             | the horse begins to self-improve. Pretty soon the horse is
             | a unicorn, then it's a Satyr, and so on.
             | 
             | (I am a near-term AGI skeptic BTW, but I could be wrong.)
             | 
             | OpenAI's valuation is a mixture of hype speculation and the
             | "golden boy" cult around Sam Altman. In the latter sense
             | it's similar to the golden boy cults around Elon Musk and
             | (politically) Donald Trump. To some extent these cults work
             | because they are self-fulfilling feedback loops: these
             | people raise tons of capital (economic or political)
             | because everyone knows they're going to raise tons of
             | capital so they raise tons of capital.
        
             | parpfish wrote:
             | Well, AGI would make the brainy information worker part of
             | the economy obsolete. Well still need the jobs that
             | interact with the physical world for quite a while. So...
             | all us HN types should get ready to work the mines or pick
             | vegetables
        
               | throwup238 wrote:
               | If we hit true AGI, physical labor won't be far behind
               | the knowledge workers. The first thing industrial
               | manufacturers will do is turn it towards designing
               | robotics, automating the design of factories, and
               | researching better electromechanical components like
               | synthetic muscle to replace human dexterity.
               | 
               | IMO we're going to hit the point where AI can work on
               | designing automation to replace physical labor before we
               | hit true AGI, much like we're seeing with coding.
        
             | hdjjhhvvhga wrote:
             | > If AGI is possible, then the entire human economy stops
             | making sense as far as money goes
             | 
             | I heard people on HN saying this (even without the money
             | condition) and I fail to grasp the reasoning behind it.
             | Suppose in a few years Altman announces a model, say o11,
             | that is supposedly AGI, and in several benchmarks it hits
             | over 90%. I don't believe it's possible with LLMs because
             | of their inherent limitations but let's assume it can solve
             | general tasks in a way similar to an average human.
             | 
             | Now, how come that "the entire human economy stops making
             | sense"? In order to eat, we need farmers, we need
             | construction workers, shops etc. As for white collar
             | workers, you will need a whole range of people to maintain
             | and further develop this AGI. So IMHO the opposite is true:
             | the human economy will work exactly as before but the job
             | market will continue to evolve withe people using AGI in a
             | similar way that they use LLMs now but probably with
             | greater confidence. (Or not.)
        
               | datadrivenangel wrote:
               | Why do we work? Ultimately, we work to live.* If the
               | value of our labor is determined by scarcity, then what
               | happens when productivity goes nearly infinite and the
               | scarcity goes away? We still have needs and wants, but
               | the current market will be completely inverted.
        
               | exe34 wrote:
               | If you think about all the people trying to automate away
               | farming, construction, transport/delivery - these people
               | doing the automation themselves get automated out first,
               | and the automation figures out how to do the rest. So a
               | fully robotic economy is not far off, if you can achieve
               | AGI.
        
               | SmooL wrote:
               | The thinking goes: - any job that can be done on a
               | computer is immediately outsourced to AI, since the AI is
               | smarter and cheaper than humans - humanoid robots are
               | built that are cheap to produce, using tech advances that
               | the AI discovered - any job that can be done by a human
               | is immediately outsourced to a robot, since the robot is
               | better/faster/stronger/cheaper than humans
        
             | UltraSane wrote:
             | If AGI is invented and the inventor tries to keep it secret
             | then everyone in the world will be trying to steal it. And
             | funding to independently create it would become effectively
             | unlimited once it has been proven possible, much like with
             | nuclear weapons.
        
             | robertlagrant wrote:
             | > If AGI is possible, then the entire human economy stops
             | making sense as far as money goes,
             | 
             | What does this mean in terms of making me coffee or
             | building houses?
        
               | com2kid wrote:
               | If we can simulate a full human intelligence at a
               | reasonable speed, we can simulate 100 of them and ask the
               | AGI to figure out how to make itself 10x faster.
               | 
               | Rinse and repeat.
               | 
               | That is exponential take off.
               | 
               | At the point where you have an army of AIs running at
               | 1000x human speed it can just ask it to design the
               | mechanisms for and write the code to make robots that
               | automate any possible physical task.
        
               | GOD_Over_Djinn wrote:
               | This sounds like magic, not science.
        
               | EMIRELADERO wrote:
               | What do you mean by this? Is there any fundamental
               | property of intelligence, physicality, or the universe,
               | that you think wouldn't let this work?
        
               | fullstackchris wrote:
               | Not OP but yes. Electron size vs band gap, computing
               | costs (in terms of electricity) any other raw materials
               | needed for that energy, etc... sigh... its physics,
               | always physics... what fundamental property of physics do
               | you think would let a vertical take off in intelligence
               | occur?
        
               | datadrivenangel wrote:
               | If you look at the rate of mathematical operations
               | conducted, we're already going hard vertical. Physics and
               | material limitations will slow that eventually as we
               | reach a marginal return on converting the planet to
               | computer chips, but we're in the singularity as proxy
               | measured by mathematical operations.
        
               | edflsafoiewq wrote:
               | There are about 8 billion human intelligences walking
               | around right now and they've got no idea how to begin
               | making even a stupid AGI, let alone a superhuman one.
               | Where does the idea that 100 more are going to help come
               | from?
        
               | torginus wrote:
               | Nothing and the hilarious thing is that the AI
               | figureheads admit that technology (as in defined by new
               | theorems produced and new code written), will do
               | pathetically little to move the needle on human happiness
               | forward.
               | 
               | The guy running Anthropic thinks the future is in
               | biotech, developing the cure to all diseases, eternal
               | youth etc.
               | 
               | Which is technology all right, but it's unclear to me how
               | these chatbots (or other AI systems) are the quickest way
               | to get there.
        
             | Animats wrote:
             | We may not need smarter AI. Just less stupid AI.
             | 
             | The big problem with LLMs is that most of the time they act
             | smart, and some of the time they do really, really dumb
             | things and don't notice. It's not the ceiling that's the
             | problem. It's the floor. Which is why, as the article
             | points out, "agents" aren't very useful yet. You can't
             | trust them to not screw up big-time.
        
             | torginus wrote:
             | I was thinking about how the economy has been actively
             | makes less sense and gets divorced more and more from
             | reality year after year, AI or not.
             | 
             | It's the simple fact that the ability of assets to generate
             | wealth has far outstripped the abiliy of individuals to
             | earn money by working.
             | 
             | Somehow real estate has become so expensive _everywhere_
             | that owning a shitty apartment is impossible for the vast
             | majority.
             | 
             | When the world's population was exploding during the 20th
             | century, housing prices were not a problem, yet somehow
             | nowadays, it's impossible to build affordable housing to
             | bring the prices down, though the population is stagnant or
             | growing slowly.
             | 
             | A company can be worth $1B if someone invests $10m in it
             | for 1% stake - where did the remaining $990m come from?
             | Likewise, the stock market is full of trillion-dollar
             | companies whose valuations beggar all explanation,
             | considering the sizes of the markets they are serving.
             | 
             | The rich elites are using the wealth to control access to
             | basic human needs (namely housing and healthcare) to
             | squeeze the working population for every drop of money.
             | Every wealth metric shows the 1% and the 1% of the 1%
             | control successively larger portions of the economic pie.
             | At this point money is ceasing to be a proxy for value and
             | is becoming a tool for population control.
             | 
             | And the weird thing is it didn't use to be nearly this bad
             | even a decade ago, and we can only guess how bad it will
             | get in a decade, AGI or not.
             | 
             | Anyway, I don't want to turn this into a fully-written
             | manifesto, but I have trouble expressing these ideas in a
             | concise manner.
        
               | nyarlathotep_ wrote:
               | > And the weird thing is it didn't use to be nearly this
               | bad even a decade ago, and we can only guess how bad it
               | will get in a decade, AGI or not.
               | 
               | The last 5 years have reflected a substantial decline in
               | QOL in the states; you don't even have to to look back
               | that far.
               | 
               | The coronacircus money-printing really accelerated the
               | decline.
        
               | ac29 wrote:
               | > Somehow real estate has become so expensive everywhere
               | that owning a shitty apartment is impossible for the vast
               | majority.
               | 
               | Approximately 2/3s of homes in the US are owner occupied.
        
               | orangecat wrote:
               | _Somehow real estate has become so expensive everywhere
               | that owning a shitty apartment is impossible for the vast
               | majority._
               | 
               | That's to be expected when governments forbid people from
               | building housing. The only thing I find surprising is
               | when people blame this on "capitalism".
        
           | throwpoaster wrote:
           | 157 billion implies about a 1% chance at dominating a 1.5
           | trillion market. Seems reasonable.
        
             | asqueella wrote:
             | 10%, no?
        
             | airstrike wrote:
             | that's 10% and who's to say that market is worth 1.5
             | trillion to begin with
        
               | cloverich wrote:
               | Market cap of apple, google, facebook.
        
           | criddell wrote:
           | > what justifies this
           | 
           | People are buying shares at $x because they believe they will
           | be able to sell them for more later. I don't think there's a
           | whole to more to it than that.
        
         | epicureanideal wrote:
         | And of course, as processors improve this becomes more and more
         | the case.
        
         | refulgentis wrote:
         | Been in the Mac ecosystem since 2008, love it, but there is,
         | and always has been, a tendency to talk about inevitabilities
         | from scaling bespoke, extremely expensive configurations, and
         | with LLMs, there's heavy eliding of what the user experience
         | is, beyond noting response generation speed in tokens/s.
         | 
         | They run on a laptop, yes - you might squeeze up to 10
         | token/sec out of a kinda sorta GPT-4 if you paid $5K plus for
         | an Apple laptop in the last 18 months.
         | 
         | And that's after you spent _2 minutes_ watching 1000 token*
         | prompt prefill at 10 tokens /sec.
         | 
         | Usually it'd be obvious this'd trickle down, things always do,
         | right?
         | 
         | But...Apple infamously has been stuck on 8GB of RAM in even
         | $1500 base models for years. I have 0 idea why, but my
         | intuition is RAM was ~doubling capacity at same cost every 3
         | years till early 2010s, then it mostly stalled out post 2015.
         | 
         | And regardless of any of the above, this absolutely _melts_
         | your battery. Like, your 16 hr battery life becomes 40 minutes,
         | no exaggeration.
         | 
         | I don't know why prefill (loading in your prompt) is so slow
         | for local LLMs, but it is. I assume if you have a bunch of
         | servers there's some caching you can do that works across all
         | prompts.
         | 
         | I expect the local LLM community to be roughly the same size it
         | is today 5 years from now.
         | 
         | * ~3 pages / ~750 words; what I expect is a conservative
         | average for prompt size when coding
        
           | lowercased wrote:
           | I have a 2023 mbp, and I get about 100-150 tok/sec locally
           | with lmstudio.
        
             | datadrivenangel wrote:
             | Which models?
        
               | refulgentis wrote:
               | For context, I got M2 Max MBP, 64 GB shared RAM, bought
               | it March 2023 for $5-6K.                 Llama 3.2 1.0B -
               | 650 t/s       Phi 3.5   3.8B - 60 t/s.       Llama 3.1
               | 8.0B - 37 t/s.       Mixtral  14.0B - 24 t/s.
               | 
               | Full GPU acceleration, using llama.cpp, just like LM
               | Studio.
        
               | lowercased wrote:
               | hugging-quants/llama-3.2-1b-instruct-q8_0-gguf - 100-150
               | tok/sec
               | 
               | second-state/llama-2-7b-chat-gguf net me around ~35
               | tok/sec
               | 
               | lmstudio-community/granite-3.1.-8b-instruct-GGUF - ~50
               | tok/sec
               | 
               | MBP M3 Max, 64g. - $3k
        
               | refulgentis wrote:
               | I'm not sure if you're pointing out any / all of these:
               | 
               | #1. It is possible to get an arbitrarily fast
               | tokens/second number, given you can pick model size.
               | 
               | #2. Llama 1B is roughly GPT-4.
               | 
               | #3. Given Llama 1B runs at 100 tokens/sec, and given
               | performance at a given model size has continued to
               | improve over the past 2 years, we can assume there will
               | eventually be a GPT-4 quality model at 1B.
               | 
               | On my end:
               | 
               | #1. Agreed.
               | 
               | #2. Vehemently disagree.
               | 
               | #3. TL;DR: I don't expect that, at least, the trend line
               | isn't steep enough for me to expect that in the next
               | decade.
        
         | mjburgess wrote:
         | I don't think openai's valuation comes from a data center bet
         | -- rather, I'd suppose, investors think it has a first-mover
         | advantage on model quality that it can (maybe?) attract some
         | buy-out interest or otherwise use in yet-to-be-specified
         | product lines.
         | 
         | However, it has been clear for a long time that meta are just
         | demolishing any competitor's moats, driving the whole megacorp
         | AI competition to razor thin margins.
         | 
         | It's a very welcome strategy from a consumer pov, but -- it has
         | to be said -- genius from a business pov. By deciding that no
         | one will win, it can prevent anyone leapfrogging them at a
         | relatively cheap price.
        
         | hyperpape wrote:
         | This seems like a non-sequitur unless you're assuming something
         | about the amount that people use models.
         | 
         | Most web servers can run some number of QPS on a developer
         | laptop, but AWS is a big business, because there are a heck of
         | a lot of QPS across all the servers.
        
         | thinkingemote wrote:
         | Most of the laptops that the models can run on today are in the
         | high end of dedicated bare metal servers. Most shared VM
         | servers are way below these laptops. Most people buying a new
         | laptop today won't be able to run them, most devs getting a
         | website up with a server won't be able to run them.
         | 
         | This means that the definitions of "laptop" and "server" are
         | dependent on use. We should instead talk about RAM, GPU and CPU
         | speed which is more useful and informative but less engaging
         | than "my laptop".
        
         | m3kw9 wrote:
         | The best models are always out of reach on desktops. You can
         | have ok models but AGI will come in a datacenter first
        
       | neom wrote:
       | "learned out about" - is that an Australian phraseology by
       | chance? Sounds Australian or British of some manner.
        
         | user982 wrote:
         | You can find out, you can learn about, but you can't learn out
         | about.
        
         | simonw wrote:
         | That was a very dumb typo in my title!
        
           | neom wrote:
           | I figured as much, although I wondered if you were going for
           | the kinda "he learn out about not pissing people off real
           | sharpish" kinda tone I've heard in Scotland before, but
           | wasn't sure. Big fan btw, happy new years Simon! :)
        
         | mjburgess wrote:
         | Good ear -- the use of 'out' as an abbreviation of anything is
         | a britishism.
         | 
         | Nowt, owt, -- nothing, anything
        
       | JaDogg wrote:
       | I think LLM web applications need a big red warning (non
       | interactive, I don't want more cookie dialogs) like in
       | cigarettes.
       | 
       | > LLM generated content need to be verified.
        
         | becquerel wrote:
         | Every LLM web app I have used has a disclaimer along these
         | lines prominently featured in the UI. Maybe the disclaimer
         | isn't bright red with gifs of flashing alarms, but the warnings
         | are there for the people who would pay attention to them in the
         | first place.
        
           | minimaxir wrote:
           | Unfortunately, even after 2 years of ChatGPT and countless
           | news stories about it, people still don't realize that LLMs
           | can be wrong.
           | 
           | There maybe should be a bright red flashing disclaimer at
           | this point.
        
       | Der_Einzige wrote:
       | RE: Slop:
       | 
       | Having Slop generations from an LLM is a choice. There are so
       | many tricks to make models genuinely creative just at the sampler
       | level alone.
       | 
       | https://github.com/sam-paech/antislop-sampler
       | 
       | https://openreview.net/forum?id=FBkpCyujtS
        
         | simonw wrote:
         | It doesn't matter how good the generated text is: it is still
         | slop if the recipient didn't request it and no human has
         | reviewed it.
        
           | Der_Einzige wrote:
           | By that definition machine to machine communication that
           | happens "organically" (like how humans do it, where they
           | sometimes strike up conversations unprompted with each other)
           | is "slop".
           | 
           | You're not seeing how the future of the world will develop.
        
             | simonw wrote:
             | If you ask me to read an unguided conversation between two
             | LLMs then yes, I'd consider that slop.
             | 
             | Some people might _like_ slop.
        
               | minimaxir wrote:
               | The rise of the famous obvious Facebook AI slop indicates
               | that some demographics love it.
        
         | orbital-decay wrote:
         | This won't solve anything. There's a myriad of sampling
         | strategies, and they all have the same issue: samplers are
         | dumb. They have no access to the semantics of what they're
         | sampling. As a result, things like min-p or XTC will either
         | overshoot or undershoot as they can't differentiate between the
         | situations. For the same reason, samplers like DRY can't solve
         | repetition issues.
         | 
         | Slop is over-representation of model's stereotypes and lack of
         | prediction variety in cases that need it. Modern models are
         | insufficiently random when it's required. It's not just
         | specific words or idioms, it's _concepts_ on very different
         | abstraction levels, from words to sentence patterns to entire
         | literary devices. You can 't fix issues that appear on the
         | latent level by working with tokens. The antislop link you give
         | seems particularly misguided, trying to solve an NLP task
         | programmatically.
         | 
         | Research like [1] suggests algorithms like PPO as one of the
         | possible culprits in the lack of variety, as they can filter
         | out entire token trajectories. Another possible reason is
         | training on outputs from the previous models and insufficient
         | filtering of web scraping results.
         | 
         | And of course, prediction variety != creativity, although it's
         | certainly a factor. Creativity is an ill-defined term like many
         | in these discussions.
         | 
         | [1] https://arxiv.org/abs/2406.05587
        
       | draw_down wrote:
       | I agree the criticism is poor; it's often very lazy. There are
       | currently a lot of dog-brain "wrap a LLM around it" products,
       | which are worthy of scorn. Much of the lazy criticism is pointing
       | at such products and therefore writing off the whole endeavor.
       | 
       | But that doesn't necessarily reflect the potential of the
       | underlying technology, which is developing rapidly. Websites were
       | goofy and pointless until Amazon came around (or Yahoo or
       | whatever you prefer).
       | 
       | I guess potential isn't very exciting or interesting on its own.
        
       | dtquad wrote:
       | What is the current status on pushing "reasoning" down to
       | latent/neural space? Seems like a vaste of tokens to let a model
       | converse with itself especially when this internal monologue
       | often has very little to do with the final output so it's not
       | useful as a log of how the final output was derived.
        
         | dmd wrote:
         | See https://news.ycombinator.com/item?id=42555320
        
       | adsharma wrote:
       | In spite of all this progress, I can't find LLMs that solve
       | simple tasks like:
       | 
       | Here is my resume. Make it look nice (some design hints).
       | 
       | They can spit html and css, but not Google doc.
       | 
       | On the other hand, Google results are dominated by SEO spam. You
       | can probably find one usable result on page 10.
       | 
       | The problem is not technology. It's a business model that can
       | support the humans feeding data into the LLM.
        
         | Alex-Programs wrote:
         | Why would they be able to output a Google doc? It's a
         | proprietary format. The closest thing would be rich text format
         | to copy paste.
        
           | vikramkr wrote:
           | That proprietary format is owned by a company associated with
           | folks who won two nobel prizes for AI related work this year
           | and the employer at the time of the researchers who wrote the
           | attention is all you need paper and also the owner of a
           | search engine with access to like, all the data. Doesn't seem
           | unreasonable lol
        
         | logicchains wrote:
         | They can spit out LaTeX, and a PDF from that is going to look
         | much nicer than a Google doc (and display the same everywhere).
         | As an added bonus, the recruiter can't randomly rewrite parts
         | of it (at least not so easily).
        
           | nox101 wrote:
           | The recuiter isn't going to print out your resume. They're
           | going to read in their computer or iPad or phone.
        
             | trenchgun wrote:
             | For sure they will read a pdf and not a google doc.
        
         | Gooblebrai wrote:
         | > They can spit html and css, but not Google doc.
         | 
         | Wow. At this stage, I think people are just searching for
         | excuses to complain about anything that the LLM does NOT do.
        
       | henning wrote:
       | Spookily good at writing code? LLMs frequently hallucinate broken
       | nonsense shit when I use them.
       | 
       | Recognize what they do well (generate simple code in popular
       | languages) while acknowledging where they are weak (non-trivial
       | algorithms, any novel code situation the LLM hasn't seen before,
       | less popular languages).
        
         | simonw wrote:
         | Did you try learning HOW to get good code out of them?
         | 
         | As with all things LLM there's a whole lot of undocumented and
         | under appreciated depth to getting decent results.
         | 
         | Code hallucinations are also the least damaging type of
         | hallucinations, because you get fact checking for free: if you
         | run the code and get an error you know there's a problem.
         | 
         | A lot of the time I find pasting that error message back into
         | the LLM gets me a revision that fixes the problem.
        
           | lolinder wrote:
           | > Code hallucinations are also the least damaging type of
           | hallucinations, because you get fact checking for free: if
           | you run the code and get an error you know there's a problem.
           | 
           | This is great when the error is a thrown exception, but less
           | great when the error is a subtle logic bug that only strikes
           | in some subset of cases. For trivial code that only you will
           | ever run this is probably not a big deal--you'll just fix it
           | later when you see it--but for code that must run unattended
           | in business-critical cases it's a totally different story.
           | 
           | I've personally seen a dramatic increase in sloppy logic that
           | _looks_ right coming from previously-reliable programmers as
           | they 've adopted LLMs. This isn't an imaginary threat, it's
           | something I now have to actively think about in code reviews.
        
             | simonw wrote:
             | Yeah, the other skill you need to develop to make the most
             | of AI-assisted programming is _really good_ manual QA.
        
               | lolinder wrote:
               | Have you found that to be a good trade-off for large-
               | scale projects?
               | 
               | Where I'm at right now with LLMs is that I find them to
               | be very helpful for greenfield personal projects.
               | Eliminating the blank canvas problem is huge for my
               | productivity on side projects, and they excel at getting
               | projects scaffolded and off the ground.
               | 
               | But as one of the lead engineers working on a million+
               | line, 10+ year-old codebase, I've yet to see any
               | substantial benefit come from myself or anyone else using
               | LLMs to generate code. For every story where someone
               | found time saved, we have a near miss where flawed code
               | almost made it in or (more commonly) someone eventually
               | deciding it was a waste of time to try because the model
               | just wasn't getting it.
               | 
               | Getting better at manual QA would help, but given the
               | number of times where we just give up in the end I'm not
               | sure that would be worth the trade-off over just
               | discouraging the use of LLMs altogether.
               | 
               | Have you found these things to actually work on large,
               | old codebases given the right context? Or has your
               | success likewise been mostly on small things?
        
               | simonw wrote:
               | I use them successfully on larger project all the time.
               | 
               | "Here's some example JavaScript code that sends an email
               | through the SendGrid REST API. Write me a python function
               | for sending an email that accepts an email address,
               | subject, path to a Jinja template and a dictionary of
               | template context. It should return true or false for if
               | the email was sent without errors, and log any error
               | messages to stderr"
               | 
               | That prompt is equally effective for a project that's 500
               | lines or 5,000,000 lines of code.
               | 
               | I also use them for code spelunking - you can pipe quite
               | a lot of code into Gemini and ask questions like "which
               | modules handle incoming API request validation?" - that's
               | why I built https://github.com/simonw/files-to-prompt
        
               | gre wrote:
               | I had some success converting a react app with classes to
               | use hooks instead. Also asking it to handle edge cases,
               | like spaces in a filename in a bash script--this fixes
               | some easy problems that might have come up. The corollary
               | here is that pointing out specific problems or mentioning
               | the right jargon will produce better code than just
               | asking for the basic task.
               | 
               | It's very bad at Factor but pretty good at naming things,
               | sometimes requiring some extra prompting. [generate 25
               | possible names for this variable...]
        
               | switch007 wrote:
               | QA are going to be told to use AI too
               | 
               | (Seems every job is fair game according to CTOs. Well,
               | except theirs)
        
             | polishdude20 wrote:
             | When they spit out these subtle bugs, are you promoting the
             | LLM to watch our for that particular bug? I wonder if it
             | just needs a vir more guidance in more explicit terms
        
               | lolinder wrote:
               | At a certain point it becomes more work to prompt the LLM
               | with each and every edge case than it is to just write
               | the dang code.
               | 
               | I work out what the edge cases are by writing and
               | rewriting the code. It's in the process of shaping it
               | that I see where things might go wrong. If an LLM can't
               | do that on its own it isn't of much value for anything
               | complicated.
        
           | joelanman wrote:
           | > if you run the code and get an error you know there's a
           | problem.
           | 
           | well, sometimes - other times it'll be wrong with no error,
           | or insecure, or inaccessible, and so on
        
           | xyzsparetimexyz wrote:
           | Is there more to getting 'good' at them then just copying
           | error messages back in? Like, how do I get them to reason
           | about e.g. whether a data structure compression method makes
           | sense?
        
           | AnimalMuppet wrote:
           | > Did you try learning HOW to get good code out of them?
           | 
           | That is at least somewhat a valid point. Good workers know
           | how to get the best out of their tools. And yet, _good_ tools
           | accommodate how their users work, instead of expecting the
           | user to accommodate how the tool works.
           | 
           | One could also say that programmers were sold a misleading
           | bill of goods about how LLMs would work. From what they were
           | told, they shouldn't _have_ to learn how to get the best out
           | of LLMs - LLMs were AI, on the way to AGI, and would just
           | give you everything you needed from a simple prompt.
        
             | simonw wrote:
             | Yeah, that's one of the biggest misconceptions I've been
             | trying to push back against.
             | 
             | LLMs are power-user tools. They're nowhere near as easy to
             | use as they look (or as their marketing would have you
             | believe).
             | 
             | Learning to get great results out of them takes a
             | significant amount of work.
        
           | henning wrote:
           | Like all AI simps, your blanket response to pointing out
           | flaws is to tell me to do more prompt engineering and then
           | dismiss the issue entirely. In the time it takes me to coax
           | the model to do the thing I was told it knows how to do, I
           | could just do the task myself. Your examples of LLM code
           | generation are simple, easy to specify, self-contained
           | applications that are not representative of software you can
           | actually build a business on. Please do something your
           | beloved LLMs can't and come up with an original idea.
        
             | minimaxir wrote:
             | > not representative of software you can actually build a
             | business on
             | 
             | The only people pushing that you can BUILD AN APP WITHOUT
             | WRITING A LINE OF CODE are the Twitter AI hypesters. Simon
             | doesn't assert anything of the sort.
             | 
             | LLMs are more-than-sufficient for code snippets and small
             | self-contained apps, but they are indeed far from replacing
             | software engineers.
        
             | phantompeace wrote:
             | Like all stubborn anti-AI know-it-alls, you sound like
             | you've tried a couple of times to do something and have
             | decided to label all LLMs with the same brush.
             | 
             | What models have you tried, and what are you trying to do
             | with them? Give us an example prompt too so we can see how
             | you're coaxing it so we can rule out skill issue.
             | 
             | And a big strength LLMs have is summarizing things - I'd
             | like to see you summarize the latest 10 arxiv papers
             | relating to prompt engineering and produce a report geared
             | towards non-techies. And do this every 30 mins please. Also
             | produce social media threads with that info. Is this a task
             | you could do yourself, better than LLMs?
        
               | henning wrote:
               | Due to unexpected capacity constraints, Claude is unable
               | to reply to this message.
        
               | voidhorse wrote:
               | > And a big strength LLMs have is summarizing things -
               | I'd like to see you summarize the latest 10 arxiv papers
               | relating to prompt engineering and produce a report
               | geared towards non-techies. And do this every 30 mins
               | please. Also produce social media threads with that info.
               | Is this a task you could do yourself, better than LLMs?
               | 
               | Right, but this is the part that is silly and sort of
               | disingenuous and I think built upon a weird understanding
               | of value and productivity.
               | 
               | Doing _more_ constantly isn 't inherently valuable. If
               | one human writes a magnificently crafted summary of those
               | papers _once_ and it is promulgated across channels
               | effectively, this is both better and more economical than
               | having an LLM compute one (slightly incorrect) summary
               | for each individual on demand. In fact, all the LLM does
               | in this case is increase the amount of possible lower
               | quality noise in the space. The one edge an LLM might
               | have at this stage is to generate a summary that accounts
               | for more recent information, thereby getting around the
               | inevitable gradual  "out of dateness" of human authored
               | summaries at time T, but even then, this is not great if
               | the trade off is to pollute the space with a. bunch of
               | ever so slightly different variants of the same text.
               | It's such a weird, warped idea of what productivity is,
               | it's basically the lazy middle-manager's idea of what it
               | means to be productive. We need to remember that not all
               | processes are reducible to their outputs--sometimes the
               | process is the point, not the immediate output (e.g.
               | education).
        
         | th0ma5 wrote:
         | Simon gets one thing working for one task and assumes everyone
         | can do the same for everything. That's the trick is that he has
         | no idea how the failures happen or how to maintain actual
         | working systems.
        
       | k2xl wrote:
       | Something not mentioned is AI generated music. Suno's development
       | this year is impressive. Unclear what this will mean for music
       | artists over next few years.
        
         | simonw wrote:
         | Yeah, this year I decided to just focus on LLMs - I didn't
         | touch on any of the image or music generation advances either.
         | I haven't been following those closely enough to have
         | particularly useful things to say about them.
        
         | fullstackchris wrote:
         | Very clear; I like buying music produced by people who play
         | instruments.
        
       | antirez wrote:
       | About "people still thinking LLMs are quite useless", I still
       | believe that the problem is that most people are exposed to
       | ChatGPT 4o that at this point for my use case (programming /
       | design partner) is basically a useless toy. And I guess that in
       | tech many folks try LLMs for the same use cases. Try Claude
       | Sonnet 3.5 (not Haiku!) and tell me if, while still flawed, is
       | not helpful.
       | 
       | But there is more: a key thing with LLMs is that their ability to
       | help, as a tool, changes vastly based on your communication
       | ability. The prompt is the king to make those models 10x better
       | than they are with the lazy one-liner question. Drop your files
       | in the context window; ask very precise questions explaining the
       | background. They work great to explore what is at the borders of
       | your knowledge. They are also great at doing boring tasks for
       | which you can provide perfect guidance (but that still would take
       | you hours). The best LLMs (in my case _just_ Claude Sonnet 3.5, I
       | must admit) out there are able to accelerate you.
        
         | wslh wrote:
         | Right, in simpler terms: The measure of LLMs success is how
         | effectively they help you achieve your goal faster.
        
           | antirez wrote:
           | Exactly, and right now the LLMs acceleration effect is _a
           | tool_ , not "give me the final solution". Even people that
           | can't code, using LLMs to build applications from scratch,
           | still have this tool mindset. This is why they can use them
           | effectively: they don't stop at the first failed solution;
           | they provide hints to the LLM, test the code, try to figure
           | what's the problem (also with the LLM help), and so forth.
           | It's a matter of mindset.
        
         | hdjjhhvvhga wrote:
         | While Claude Sonnet is superior than 4o for most my use cases,
         | there are still occasionally some specific tasks where it
         | performs slightly better.
        
           | antirez wrote:
           | Probably. But statistically to work with 4o is a lose of time
           | for me. LLMs is like an investment: you write the prompts,
           | you "work" with them. If the LLM is too weak, this is a lose
           | of time. You need to have a return on the investment that is
           | positive. With ChatGPT 4o / o1 most of the times for me the
           | investment of time has almost zero return. Before Claude
           | Sonnet 3.5 I already had a ChatGPT PRO account but never used
           | it for coding since it was most of the times useless if not
           | for throw away scripts that I didn't want to do myself or as
           | a stack overflow replacement for trivial stuff. Now it's
           | different.
        
             | airstrike wrote:
             | This mirrors my experience 100%. I'm not even sure why I
             | still pay for OpenAI at this point. Claude 3.5 is just
             | incredibly superior. And I totally agree on the point about
             | dropping in context and asking very specific questions.
             | I've had Claude pinpoint a bug in a 2k LOC module that I
             | was struggling to find the cause for. After wasting a lot
             | of time on it on my own, I thought "what the heck, maybe
             | Claude can figure it out" and it did. It's objectively
             | useful, even if flawed sometimes.
        
             | d0mine wrote:
             | why "lose of time" instead of "loss of time" Is it a typo
             | or fingerprinting?
        
           | tootie wrote:
           | Like what? Claude has become my go-to, but I find that it's
           | wrong enough often enough that I really can't trust it for
           | anything. If it says something I have to go dig through it's
           | citations very carefully.
        
         | brookst wrote:
         | I'm surprised you only have one use case. I use LLMs to
         | research travel, adjust recipes, check biographies and book
         | reviews, and many many more things.
        
         | minimaxir wrote:
         | > Claude Sonnet 3.5 (not Haiku!)
         | 
         | A very big surprise is just _how_ much better Sonnet 3.5 is
         | than Haiku. Even the confusingly-more-expensive-Haiku-variant
         | Haiku 3.5 that 's more recent than Sonnet 3.5 is still much
         | worse.
        
         | mhh__ wrote:
         | Hopefully things have narrowed but you can see from the trends
         | data just how few people (API may be a different story) use
         | claude relative to chatgpt.
        
           | minimaxir wrote:
           | Brand awareness is a hell of a drug.
        
             | mhh__ wrote:
             | Indeed, although I find myself reaching for o1 more than
             | Claude for matters other than programming, solely because
             | it has better LaTeX (...)
        
         | dxbydt wrote:
         | > best LLMs are able to accelerate you
         | 
         | https://www2.math.upenn.edu/~ghrist/preprints/LAEF.pdf - this
         | math textbook was written in just 55 days!
         | 
         | Paraphrasing the acknowledgements -
         | 
         | ...Begun November 4, 2024, published December 28, 2024.
         | 
         | ...assisted by Claude 3.5 sonnet, trained on my previous
         | books...
         | 
         | ...puzzles co-created by the author and Claude
         | 
         | ...GPT-4o and -o1 were useful in latex configurations...doing
         | proof-reading.
         | 
         | ...Gemini Experimental 1206 was an especially good proof-reader
         | 
         | ...Exercises were generated with the help of Claude and may
         | have errors.
         | 
         | ...project was impossible without the creative labors of Claude
         | 
         | The obvious comparison is to the classic Strang
         | https://math.mit.edu/~gs/everyone/ which took several *
         | _years*_ to conceptualize, write, peer review, revise and
         | publish.
         | 
         | Ok maybe Strang isn't your cup of tea, :%s/Strang/Halmos/g ,
         | :%s/Strang/Lipschutz/g, :%s/Strang/Hefferon/g,
         | :%s/Strang/Larson/g ...
         | 
         | Working through the exercises in this new LLMbook, I'm
         | thinking...maybe this isn't going to stand the test of time.
         | Maybe acceleration is not so hot after all.
        
           | datadrivenangel wrote:
           | Going faster isn't good if the quality drops enough that
           | overall productivity decreases... Infinite slop is only a
           | good thing for pigs.
        
             | cruffle_duffle wrote:
             | Just use ChatGPT to summarize its own output. It's like
             | running your JPEG back through the JPEG compressor again!
        
           | kianN wrote:
           | ^ This perfectly encapsulates the story I see every time
           | someone digs into the details of any llm generated or
           | assisted content that has any level of complexity.
           | 
           | Great on the surface but lacks any depth, cohesive, or
           | substance
        
           | pton_xd wrote:
           | "The story of linear algebra begins with systems of
           | equations, each line describing a constraint or boundary
           | traced upon abstract space. These simplest mathematical
           | models of limitation -- each equation binding variables in
           | measured proportion -- conjoin to shape the realm of possible
           | solutions. When several such constraints act in concert,
           | their collaboration yields three possible fates: no solution
           | survives their collective force; exactly one point satisfies
           | all bounds; or infinite possibilities trace curves and planes
           | through the space of satisfaction. This trichotomy -- of
           | emptiness, uniqueness, and infinity -- echoes through all of
           | linear algebra, appearing in increasingly sophisticated forms
           | as our understanding deepens."
           | 
           | Maybe I'm not the target audience, but... that really doesn't
           | make me interested in continuing to read.
        
             | jpc0 wrote:
             | Even putting it here is annoying to me... Those are a lot
             | of words saying nothing that I just spend time reading.
             | 
             | I'm agreeing with you.
        
           | mooreds wrote:
           | I started a book about CIAM (customer identity and access
           | management) using Claude to help outline a chapter. I'd edit
           | and refine the outline to make sure it covered everything.
           | 
           | Then I'd have Claude create text. I'd then edit/refine each
           | chapter's text.
           | 
           | Wow, was it unpleasant. It was kinda cool to see all the
           | words put together, but editing the output was a slog.
           | 
           | It's bad enough editing your own writing, but for some reason
           | this was even worse.
        
         | ninth_ant wrote:
         | I think a lot of the confusion is in how we approach LLMs.
         | Perhaps stemming from the over-broad term "AI".
         | 
         | There are certain classes of problems that LLMs are good at.
         | Accurately regurgitating all accumulated world knowledge ever
         | is not one, so don't ask a language model to diagnose your
         | medical condition or choose a political candidate.
         | 
         | But _do_ ask them to perform suitable tasks for a language
         | model! Every day by automation I feed in the hourly weather
         | forecast my home ollama server and it builds me a nice readable
         | concise weather report. It's super cool!
         | 
         | There are lots of cases like this where you can give an LLM
         | reliable data and ask it to do a language related task and it
         | will do an excellent job of it.
         | 
         | If nothing else it's an extremely useful computer-human
         | interface.
        
           | rrix2 wrote:
           | > Every day by automation I feed in the hourly weather
           | forecast my home ollama server and it builds me a nice
           | readable concise weather report.
           | 
           | not to dissuade you from a thing you find useful but are you
           | aware that the national weather service produces an Area
           | Forecast Discussion product in each local NWS office daily or
           | more often that accomplishes this with human meteorologists
           | and clickable jargon glossary?
           | 
           | https://forecast.weather.gov/product.php?site=SEW&issuedby=S.
           | ..
        
             | ninth_ant wrote:
             | Doesn't dissuade me at all, that's a really neat service.
             | I'm not American though, and even if my own country had a
             | similar service I still enjoying tuning the results to
             | focus on what I'm interested in. And it was just an example
             | of the kinds of computer-human interfaces that are newly
             | possible from this technology.
             | 
             | Anytime you have data and want it explained in a casual way
             | -- and it's not mission critical to be extremely precise --
             | LLMs are going to be a good option to consider.
             | 
             | More useful AGI-like behaviours may be enabled by combining
             | LLMs with other technologies down the line, but we
             | shouldn't try to pretend that LLMs can do everything nor
             | are they useless.
        
           | pixl97 wrote:
           | >don't ask a language model to diagnose your medical
           | condition
           | 
           | Honestly they are very decent at it if you give them accurate
           | information in which to make the diagnosis. The typical
           | problem people have is being unable to feed accurate
           | information to the model. They'll cut out parts they don't
           | want to think about or not put full test results in for
           | consideration.
        
         | 1oooqooq wrote:
         | yeah, they save as much time as finding a template with a good
         | old search and using it.
        
         | uludag wrote:
         | I don't think people finding LLMs useless is a good
         | representation of the general sentiment though. I feel that
         | more than anything, people are annoyed at LLM slop. Someone
         | uses an LLM too much to write code, they create "slop," which
         | ends up making things worse.
        
           | gre wrote:
           | Yes but then they can prompt it to golf the code and most of
           | the slop goes away. This sometimes breaks the code.
        
           | antirez wrote:
           | Unfortunately complex tools will be misused by part of the
           | population. There is no easy escape from that in the
           | modernity of possibilities. Look at the Internet itself.
        
         | cruffle_duffle wrote:
         | To get the most out of them you have to provide context. Treat
         | these models like some kind of eager beaver junior engineer who
         | wants to jump in and write code without asking questions. Force
         | it to ask questions (eg: "do not write code yet, please restate
         | my requirements to make sure we are in alignment. Are there any
         | extra bits of context or information that would help? I will
         | tell you when to write code")
         | 
         | If your model / chat app has the ability to always inject some
         | kind of pre-prompt make sure to add something like "please do
         | not jump to writing code. If this was a coding interview and
         | you jumped to writing code without asking questions and
         | clarifying requirements you'd fail".
         | 
         | At the top of all your source files include a comment with the
         | file name and path. If you have a project on one of these
         | services add an artifact that is the directory tree ("tree
         | ---gitignore" is my goto). This helps "unaided" chats get a
         | sense of what documents they are looking at.
         | 
         | And also, it's a professional bullshitter so don't trust it
         | with large scale code changes that rely on some language /
         | library feature you don't have personal experience with. It can
         | send you down a path where the entire assumption that something
         | was possible turns out to be false.
         | 
         | Does it seek like a lot of work? Yes. Am I actually more
         | productive with the tool than without? Probably. But it sure as
         | shit isn't "free" in terms of time spent providing context. I
         | think the more I use these models, the more I get a sense of
         | what it is good at and what is going to be a waste of time.
         | 
         | Long story short, prompting is everything. These things aren't
         | mind readers (and worse they forget everything in each new
         | session)
        
         | jsheard wrote:
         | I swear these goalposts keep getting moved, I remember being
         | told that GPT3.5 is a useless toy but the paid GPT4 is
         | lifechanging, and now that GPT4 is free I'm told that it's a
         | useless toy but paid o1 or paid Sonnet are lifechanging.
         | Looking forward to o1 and Sonnet becoming useless toys, unlike
         | the lifechanging o3.
        
           | aetherson wrote:
           | You will also be dismayed to hear that a 2011 iPhone is no
           | longer state-of-the-art, and indeed can't run most modern
           | apps.
        
             | jpc0 wrote:
             | GPT4 is a 13 year old technology? Compared to o1 and Sonnet
             | 3.5?
             | 
             | If someone told me an iPhone 4 is terrible but an iPhone 5
             | would definitely serve my needs, then when I get an iPhone
             | 5 they say the same of the 6 you really want me to believe
             | them a second time? Then a third time? Then a 4th? In the
             | mean time my time and money is wasted?
        
             | scubbo wrote:
             | Holy false-equivalency, Batman! The definitions of "useless
             | toy / lifechanging tool" are _not_ changing over time (or,
             | at least, not over the timescale being explored here),
             | whereas the expectations and requirements of processing
             | power of a phone are.
        
         | qsort wrote:
         | I believe it's more frustration directed at the mismatch
         | between marketing and reality, combined with the general _well
         | deserved_ growing hatred for SV culture, and, more broadly,
         | software engineers. The sentiment would be completely different
         | if the entire industry marketed themselves like the helpful
         | tools they are rather than the second coming of Christ they
         | aren 't. This distinction is hard to make on "fast food" forums
         | like this one.
         | 
         | If you aren't a coder, it's hard to find much utility in
         | "Google, but it burns a tree whenever you make an API call, and
         | everything it tells you might be wrong". I for one have never
         | used it for anything else. It just hasn't ever come up.
         | 
         | It's great at cheating on homework, kids love GPTs. It's great
         | at cheating in general, in interviews for instance. Or at
         | ruining Christmas, after this year's LLM debacle it's unclear
         | if we'll have another edition of Advent of Code. None of this
         | is the technology's fault, of course, you could say the same
         | about the Internet, phones or what have you, but it's hardly a
         | point in favor either.
         | 
         | And if you are a coder, models like Claude actually do help
         | you, but you have to monitor their output and thoroughly test
         | whatever comes out of them, a far cry from the promises of
         | complete automation and insane productivity gains.
         | 
         | If you are only a consumer of this technology, like the vast
         | majority of us here, there isn't that much of an upside in
         | being an early adopter. I'll sit and wait, slowly integrating
         | new technology in my workflow if and when it makes sense to do
         | so.
         | 
         | Happy new year, I guess.
        
         | duped wrote:
         | > Try Claude Sonnet 3.5 (not Haiku!) and tell me if, while
         | still flawed, is not helpful.
         | 
         | It's not _as_ helpful as Google was ten years ago. It 's more
         | helpful than Google today, because Google search has slowly
         | been corrupted by garbage SEO and other LLM spam, including
         | their own suggestions.
        
           | ChicagoDave wrote:
           | Claude Sonnet 3.5 can write whole React applications with
           | proper contextual clues and some minor iterations. Google has
           | never coded for you.
           | 
           | I've written two large applications and about a dozen smaller
           | ones using Claude as an assistant.
           | 
           | I'm a terrible front-end developer and almost none of that
           | work was possible without Claude. The API and AWS deployment
           | were sped up tremendously.
           | 
           | I've created unit tests and I've read through the resulting
           | code and it's very clean. One of my core pre-prompt
           | requirements has always been to follow domain-driven design
           | principles, something a novice would never understand.
           | 
           | I also start with design principles and a checklist that
           | Claude is excellent at providing.
           | 
           | My only complaint is you only have a 3-4 hour window before
           | you're cutoff for a few hours.
           | 
           | And needing an enterprise agreement to have a walled garden
           | for proprietary purposes.
           | 
           | I was not a fan in Q1. Q2 improved. Q3 was a massive leap
           | forward.
        
             | duped wrote:
             | I've never really used Claude for writing code, becuase I'm
             | not really bottlenecked by that problem. I have used it
             | quite a bit for asking questions about what code to write
             | and it's almost always wrong (usually in subtle ways that
             | would trick someone with little experience).
             | 
             | Maybe it was overtrained on react sources, but for me it's
             | pretty useless.
             | 
             | The big annoyance for me is it just makes up APIs that
             | don't exist. While that's useful for suggesting to me what
             | APIs I should add to my own code, it's really pointless if
             | I ask a question like "using libfoo how do I bar" and it
             | tells me "call the doBar() function" which does not exist.
        
           | bdangubic wrote:
           | comparing google to claude 3.5 is like comparing tesla s
           | plaid with a horse
        
           | emptiestplace wrote:
           | What a hilariously absurd statement. You might want to
           | actually try it.
        
         | swalsh wrote:
         | I'm a big believer in Claude. I've accomplished some huge
         | productivity gains by leveraging it. That said, I can see
         | places where the models are strong and weak. If you're doing
         | react, or python. These models are incredible. C#, C++ they're
         | not terrible. Rust though, it's not great. If your experience
         | is exclusively trying to use it to write Rust, it doesn't
         | matter if you're using o1, Claude or anything else. It's just
         | not great at it yet.
        
         | mvkel wrote:
         | I'm surprised at the description that it's "useless" as a
         | programming / design partner. Even if it doesn't make "elegant"
         | code (whatever that means), it's the difference between an app
         | existing at all, or not.
         | 
         | I built and shipped a Swift app to the App Store, currently
         | generating $10,200 in MRR, exclusively using LLMs.
         | 
         | I wouldn't describe myself as a programmer, and didn't plan to
         | ever build an app, mostly because in the attempts I made, I'd
         | get stuck and couldn't google my way out.
         | 
         | LLMs are the great un-stickers. For that reason per se, they
         | are incredibly useful.
        
           | raydev wrote:
           | Which service/LLM performed the best for you?
        
           | egometry wrote:
           | To the un-sticking point: it's also great at letting people
           | ask questions without being perceived as dumb
           | 
           | Tragically - admitting ignorance, even with the desire to
           | learn, often has negative social reprocussions
        
             | simonw wrote:
             | Asking "stupid" questions without fear of judgement is
             | legit one of my favorite personal applications of LLMs.
        
           | HarHarVeryFunny wrote:
           | Did you need a Mac for that, or is it possible to use Linux
           | to develop a Swift app targeting iOS?
           | 
           | Would you mind sharing which app you released?
        
         | FooBarWidget wrote:
         | Why do people have such narrow views on what makes LLMs useful?
         | I use them for basically everything.
         | 
         | My son throwing an irrational tantrum at the amusement park and
         | I can't figure out why he's like that (he won't tell me or he
         | doesn't know himself either) or what I should do? I feed Claude
         | all the facts of what happened that day and ask for advice.
         | Even if I don't agree with the advice, at the very least the
         | analysis helps me understand/hypothesize what's going on with
         | him. Sure beats having to wait until Monday to call up
         | professionals. And in my experience, those professionals don't
         | do a better job of giving me advice than Claude does.
         | 
         | It's weekend, my wife is sick, the general practitioner is
         | closed, the emergency weekend line has 35 people in the queue,
         | and I want some quick half-assed medical guidance that while I
         | know might not be 100% reliable, is still better than nothing
         | for the next 2 hours? Feed all the symptoms and facts to
         | Claude/ChatGPT and it does an okay job a lot of the time.
         | 
         | I've been visiting Traditional Chinese Medicine (TCM)
         | practitioner for a week now and my symptoms are indeed
         | reducing. But TCM paradigm and concepts are so different from
         | western medicine paradigms and concepts that I can't understand
         | the doctor's explanation at all. Again, Claude does a
         | reasonable job of explaining to me what's going on or why it
         | works from a western medicine point of view.
         | 
         | Want to write a novel? Brainstorm ideas with GPT-4o.
         | 
         | I had a debate with a friend's child over the correct spelling
         | of a Dutch word ("instabiel" vs "onstabiel"). Google results
         | were not very clear. ChatGPT explained it clearly.
         | 
         | Just where is this "useless" idea coming from? Do people not
         | have a life outside of coding?
        
           | krapp wrote:
           | Yes people have lives outside of coding, but most people are
           | able to manage without having AI software intercede in as
           | much of their lives as possible.
           | 
           | It seems like you trust AI more than people and prefer it to
           | direct human interaction. That seems to be satisfying a need
           | for you that most people don't have.
        
             | claar wrote:
             | Why do you postulate that "most people don't have" this
             | need? I also use AI non-stop throughout my day for similar
             | uses.
             | 
             | This feels identical to when I was an early "smart phone"
             | user w/my palm pilot. People would condescend saying they
             | didn't understand why I was "on it all the time". A decade
             | or two later, I'm the one trying to get others to put down
             | their phones during meetings.
             | 
             | My take? Those who aren't using AI continually currently
             | are simply later adopters of AI. Give it a few years - or
             | at most a decade - and the idea of NOT asking 100+ AI
             | queries per day (or per hour) will seem positively quaint.
        
               | krapp wrote:
               | >Those who aren't using AI continually currently are
               | simply later adopters of AI. Give it a few years - or at
               | most a decade - and the idea of NOT asking 100+ AI
               | queries per day (or per hour) will seem positively
               | quaint.
               | 
               | I don't think you're wrong, I just think a future in
               | which it's all but physically and socially impossible to
               | have a single thought or communication not mediated by
               | software is fucking terrifying.
        
             | FooBarWidget wrote:
             | When I'm done working, chased my children to properly
             | finish their dinner, helped my son with homework, and
             | putting them to bed, it's already 9+ PM -- the only time of
             | the day when I have free time. Just which human besides my
             | wife can I talk to at that point? What if she doesn't have
             | a clue either? All the professionals are only open when I'm
             | working. A lot of the issues happen during the weekend,
             | when professionals are closed. I don't want to disturb
             | friends during the evening, and it's not like they have the
             | expertise I need anyway.
             | 
             | LLMs are infinitely patient, don't think I am dumb for
             | asking certain things, consider all the information I feed
             | them, are available whenever I need them, have a wide range
             | of expertise, and are dirt cheap compared to professionals.
             | 
             | That they might hallucinate is not a blocker most of the
             | time. If the information I require is critical, I can
             | always double check with my own research or with
             | professionals (in which case the LLM has already primed me
             | with a basic mental model so that I can ask quick, short,
             | targeted questions, which saves the both of us time, and me
             | money). For everything else (such as my curiocity on why
             | TCM works, or the correct spelling of a word), LLMs are
             | good enough.
        
           | jiggawatts wrote:
           | At the risk of sounding impolite or critical of your personal
           | choices: this, right here, is the problem!
           | 
           | You don't understand how medicine works, at any level.
           | 
           | Yet you turn to a machine for advice, _and take it at face
           | value_.
           | 
           | I say these things confidently, because I _do_ understand
           | medicine well enough to _not_ to seek my own answers.
           | Recently I went to a doctor for a serious condition and
           | _every_ notion I had was wrong. Provably wrong!
           | 
           | I see the same behaviour in junior developers that simply
           | copy-paste in whatever they see in StackOverflow or whatever
           | they got out of ChatGPT with a terrible prompt, no context,
           | and _no understanding_ on their part of the suitability of
           | the answer.
           | 
           | This is why I and many others still consider AIs mostly
           | useless. The human in the loop is still _the_ critical
           | element. Replace the human with someone that thinks that
           | powdered rhino horn will give them erections, and the utility
           | of the AI drops to near zero. Worse, it can _multiply_ bad
           | tendencies and bad ideas.
           | 
           | I'm sure someone somewhere is asking DeepSeek how best to get
           | endangered animals parts on the black market.
        
             | FooBarWidget wrote:
             | No. Where do you read that I take it at face value? I
             | literally said that I expect Claude to give me "half-assed"
             | medical guidance. I merely said that that is still better
             | than having _no_ clue for the next 2 hours while I wait on
             | the phone with 35 people in front of me, which is
             | _completely different_ from  "taking medicine advice at
             | face value".
             | 
             | So I am curious about how TCM works. So what if an LLM
             | hallucinates there? I am not writing papers on TCM or
             | advising governments on TCM policy. I still follow the
             | doctor's instructions at the end of the day.
             | 
             | For anything really critical I already double check with
             | professionals.
             | 
             | You are letting perfect be the enemy of good. A half-assed
             | tax advice with some hallucinations from an LLM is still
             | useful, because it will prime me with some basic knowledge.
             | When I later double check the whole thing with a
             | professional, I will already know what questions to ask and
             | what direction I need to explore, which saves time and
             | money compared to goinf in with a blank slate.
             | 
             | There is absolutely nothing wrong with using LLMs when you
             | know their limits and how to mitigate them.
             | 
             | So what if every notion you learned about medicine from
             | LLMs is wrong? You learn why they're wrong, then next time
             | you prompt/double check better, until you learn how to use
             | it for that field in the least hallucinationatory way. Your
             | experience also doesn't match mine: the advice I get
             | usually contains useful elements that I discuss with
             | doctors. Plus, doctors can make mistakes too.
             | 
             | Stop letting perfect be the enemy of good.
        
         | karmakaze wrote:
         | We're at the "computers play chess badly" stage. Then we'll hit
         | the Deep Thought (1988) and Deep Blue (1995-1997) stages, but
         | still saying that solving Go won't happen for 50+ years and
         | that humans will continue to be better than computers.
         | 
         | The date/time that divides my world into before/after is
         | AlphaGo v Lee Sedol game 3 (2016). From that time forward, I
         | don't dismiss out of hand speculations of how soon we can have
         | intelligent machines. Ray Kurzweil date of 2045 is as good as
         | any (and better than most) for an estimate. Like Moore's (and
         | related) Laws, it's not about _how_ but the historical pace of
         | advancements crossing a fairly static point of human
         | capability.
         | 
         | Application coding, requires much less intelligence than
         | playing Go at these high levels. The main differences are
         | concise representation and clear final outcome scoring. LLMs
         | deal quite well with the fuzziness of human communications.
         | There may be a few more pegs to place but _when_ seems
         | predictably unknown.
        
         | kromem wrote:
         | Both new Sonnet and Haiku have a masking overhead.
         | 
         | Using a few messages to get them out of "I aim to be direct" AI
         | assistant mode gets much better overall results for the rest of
         | the chat.
         | 
         | Haiku is actually incredibly good at high level systems
         | thinking. Somehow when they moved to a smaller model the
         | "human-like" parts fell away but the logical parts remained at
         | a similar level.
         | 
         | Like if you were taking meeting notes from a business strategy
         | meeting and wanted insights, use Haiku over Sonnet, and thank
         | me later.
        
       | nntwozz wrote:
       | I think John Gruber summed it up nicely:
       | 
       | https://daringfireball.net/2024/12/openai_unimaginable
       | 
       | OpenAI's board now stating "We once again need to raise more
       | capital than we'd imagined" less than three months after raising
       | another $6.6 billion at a valuation of $157 billion sounds
       | alarmingly like a Ponzi scheme -- an argument akin to "Trust us,
       | we can maintain our lead, and all it will take is a never-ending
       | stream of infinite investment."
        
         | hdjjhhvvhga wrote:
         | What is funny is that their "lead" is just because of inertia -
         | they were the first to make an LLM publicly available. But they
         | are no longer leaders so their attempts at getting more and
         | more money only prove Altman's skills at convincing people to
         | give him money.
        
           | jppope wrote:
           | yeah but in business there are really only 2 skills right?
           | Convincing people to give you money and giving them something
           | back to them thats worth more than the money they gave you.
        
             | klipt wrote:
             | For repeated business you want to give them something that
             | costs you less than what they pay, but is worth more to
             | them than what they pay. Ie creating economic value.
        
           | lumost wrote:
           | They are still in the lead, and I'd be willing to bet that
           | they have 10x the DAU on chat.com/chatgpt.com than all other
           | providers combined. Barring massive innovation on small sub
           | 10B models - we are all likely to need remote inference from
           | large server farms for the foreseeable future. Even in the
           | case that local inference is possible - it's unlikely it will
           | be desirable from a power perspective in the next 3 years. I
           | am not going to buy a 4xB200 instance for myself.
           | 
           | Whether they offer the best model or not may not matter if
           | you need a PhD in <subject> to differentiate the response
           | quality between LLMs.
        
             | scary-size wrote:
             | Not sure about 10x DAUs. Google flicked the switch on
             | Gemini and it surfaced in pretty much every GSuite app over
             | night.
        
               | Peacefulz wrote:
               | Requiring that Gemini take over the job that Google
               | Assistant did when installing the Gemini APK really
               | rubbed me the wrong way. I get it. I just don't like that
               | it was required for use.
        
               | brokencode wrote:
               | Same with Microsoft and all their Copilots, which are
               | built on OpenAI. Not to mention all the other companies
               | using OpenAI since it's still the best.
        
           | theferalrobot wrote:
           | Which models perform better than 4o or o1 for your use cases?
           | 
           | In my limited tests (primarily code) nothing from llama or
           | Gemini have come close, Claude I'm not so sure about.
        
             | torginus wrote:
             | How good is the best model of your choice at doing
             | architecture work for complex and nontrivial apps?
             | 
             | I have been bashing my head against the wall over the
             | course of the past few days trying to create my (quite
             | complex) dream app.
             | 
             | Most of LLM coding I've done involved in writing code to
             | interface with already existing libs or services and the
             | LLMs are great at that.
             | 
             | I'm hung up on architecture questions that are unique to my
             | app and definitely not something you can google.
        
               | fullstackchris wrote:
               | Don't wanna be that typical hackernews guy but I couldnt
               | resist... if your app is "quite complex" there is
               | probably a way or ways you can break it down into much
               | simpler parts. Easier for you AND the LLM. It always
               | comes back to architecture and composition ;)
        
               | torginus wrote:
               | I don't want to be mean, but that bit of eastern wisdom
               | you dispensed sounds incredibly like what a management
               | consultant would say.
        
           | belter wrote:
           | Their best hope now is to hire John Carmack :-)
        
         | jsheard wrote:
         | According to the internal projections that The Information
         | acquired recently they're expecting to lose $14 billion in
         | 2026, so that record breaking funding round won't even buy them
         | 6 months of runway at that point even by their own probably
         | optimistic estimates.
        
         | cactusfrog wrote:
         | Every waste of money is not a Ponzi scheme.
        
           | ffsm8 wrote:
           | I agree, the core aspect of a ponzi scheme is that it
           | redistributes the newly invested funds to previous investors,
           | making it highly profitable to anyone joining early and
           | incentivising early joiners to get new investors.
           | 
           | This just doesn't hold true for open ai
        
             | jacobgkau wrote:
             | Doesn't it hold true for investment in AI (or potentially
             | any other industry that experiences a boom) in general?
             | 
             | Anyone who bought in at the ground floor is now rich.
             | Anyone who buys in now is incentivized to try and keep
             | getting more people to buy in so their investment will give
             | a return regardless of if actual value is being created.
        
               | dartos wrote:
               | If effect, kind of.
               | 
               | The money being invested does not go directly to
               | investors.
               | 
               | It goes to the cost of R&D, which in turn increases the
               | value of openai shares, then the early investors can sell
               | those shares to realize those gains.
               | 
               | The difference between that and a ponzi is that the
               | investment creates value which is reflected in the share
               | price.
               | 
               | No value is created in a Ponzi scheme.
               | 
               | The actual dollar worth of the value generated is what
               | people speculate on.
        
               | zekica wrote:
               | Only a part of the value is created in OpenAI's stock
               | valuation. Most of it is still a ponzi-like scheme.
        
               | dartos wrote:
               | I have no love for openai, but they did make the fastest
               | growing product of all time. There's value in being the
               | ones to do that.
               | 
               | I do agree it's a very very thin line.
        
           | wslh wrote:
           | Not every, but wasting money is one of the tricks of
           | corruption.
        
           | DavidSJ wrote:
           | > Every waste of money is not a Ponzi scheme.
           | 
           | Using this as an opportunity to grind an axe (not your fault,
           | cactusfrog!): I find it clearer when people write "not every
           | X is a Y" than "every X is not a Y", which could be (and
           | would be, literally) interpreted to mean the same thing as
           | "no X is a Y".
        
       | sowbug wrote:
       | Simon has mentioned in multiple articles how cool it is to use
       | 64GB DRAM for GPU tasks on his MacBook. I agree it's cool, but I
       | don't understand why it is remarkable. Is Apple doing something
       | special with DRAM that other hardware manufacturers haven't
       | figured out? Assuming data centers are hoovering up nearly all
       | the world's RAM manufacturing capacity, how is Apple still
       | managing to ship machines with DRAM that performs close enough
       | for Simon's needs to VRAM? Is this just a temporary blip, and PC
       | manufacturers in 2025 will be catching up and shipping mini PCs
       | that have 64GB RAM ceilings with similar memory performance? What
       | gives?
        
         | post-it wrote:
         | Apple designs its own chips, so the RAM and CPU are on the same
         | die and can talk at very high speeds. This is not the case for
         | PCs, where RAM is connected externally.
        
         | com2kid wrote:
         | Apple uses HBM, basically RAM on the same die as the CPU. It
         | has a lot more memory bandwidth than typically PC dram, but
         | still less than many GPUs. (Although the highest end macs have
         | bandwidth that is in the same ballpark as GPUs)
        
           | jsheard wrote:
           | Apple does not use HBM, they use LPDDR. The way they use it
           | is similar in principle to HBM (on-package, very wide bus)
           | but it's not the same thing.
        
             | karmakaze wrote:
             | Right so Apple uses high-bandwidth memory, but not HBM.
        
           | justincormack wrote:
           | Its not HBM, which GPUs tend to use, but it is on package and
           | wider interface than other PCs
        
         | minimaxir wrote:
         | LLMs run on the GPU, and the unified memory of Apple silicon
         | means that the 64 GB can be used by the GPU.
         | 
         | Consumer GPUs top out at 24 GB VRAM.
        
           | karolist wrote:
           | llama.cpp can run LLMs on CPU. iGPU can also use system
           | memory, the novel thing is not that, it's that the LLM
           | inference is mostly memory bandwidth bound and memory
           | bandwidth of a custom built PC with really fast DDR5 RAM is
           | around 100GB/s, nVidia consumer GPUs at the top end are
           | around 1TB/s, with mid range GPUs at around half that. M1 Max
           | has 400GB/s, M1 Ultra is 800GB/s, but you can have Apple
           | Silicon Macs with up to 192GB of 800GB/s memory usable by
           | GPU, this means much faster inference than just CPU+system
           | memory due to bandwidth and more affordable than building a
           | multi-GPU system to match the memory amount.
        
             | dekhn wrote:
             | It'd be really nice to have good memory bandwidth usage
             | metrics collected from a wide range of devices while doing
             | inference.
             | 
             | For example, how close does it get to the peak, and what's
             | the median bandwidth during inference? And is that
             | bandwidth, rather than some other clever optimization
             | elsewhere, actually providing the Mac's performance?
             | 
             | Personally, I don't develop HPC stuff on a laptop - I am
             | much more interested in what a modern PC with Intel or AMD
             | and nvidia can do, when maxxed out. But it's certainly
             | interesting to see that some of Apple's arch decisions have
             | worked out well for local LLMs.
        
       | viccis wrote:
       | I didn't realize "agent" designs were that ambiguously defined.
       | Every AI engineer I've talked to uses it to mean a design that
       | combines several separate LLM prompts (or even models) to solve
       | problems in multiple stages.
        
         | simonw wrote:
         | I'll add that one to the list. Surprisingly it doesn't closely
         | match most of the 211 definitions I've collected already!
         | 
         | The closest in that collection is "A division of
         | responsibilities between LLMs that results in some sort of
         | flow?" -
         | https://lite.datasette.io/?json=https://gist.github.com/simo...
        
         | xnx wrote:
         | This sounds like ensemble chain of thought.
        
         | datadrivenangel wrote:
         | If the investors ask, those same AI engineers will probably
         | allow the answer to be much more ambiguous.
        
       | submeta wrote:
       | Thank you Simon for the excellent work you do! I learned a lot
       | from you and enjoy reading everything you write. Keep up. And
       | happy new year.
        
       | xnx wrote:
       | Double checking, I don't think I saw anything about video
       | generation. Not sure if those fall under the "LLM" umbrella. It
       | came very late in the year, but the Google Veo 2 limited testing
       | are astounding. There are at least a half-dozen other services
       | where you can pay to generate video.
        
         | baobabKoodaa wrote:
         | Video generation was covered in OP
        
       | xnx wrote:
       | I've been surprised that ChatGPT has hung on as long as it has.
       | Maybe 2025 is the year Microsoft pushes harder for their brand of
       | LLM.
        
       | switch007 wrote:
       | I've watched juniors take their output as gospel applying
       | absolutely zero thinking and getting confused when I suggest
       | looking at the reference manual instead
       | 
       | I've had PMs believe it can replace all writing of tickets and
       | thinking about the feature, creating completely incomprehensible
       | descriptions and acceptance criteria
       | 
       | I've had Slack messages and emails from people with zero
       | sincerity and classic LLM style and the bs that entails
       | 
       | I've had them totally confidently reply with absolute nonsense
       | about many technical topics
       | 
       | I'm grouchy and already over LLMs
        
       | dartos wrote:
       | > There's a flipside to this too: a lot of better informed people
       | have sworn off LLMs entirely because they can't see how anyone
       | could benefit from a tool with so many flaws. The key skill in
       | getting the most out of LLMs is learning to work with tech that
       | is both inherently unreliable and incredibly powerful at the same
       | time. This is a decidedly non-obvious skill to acquire!
       | 
       | I wish the author qualified this more. How does one develop that
       | skill?
       | 
       | What makes LLMs so powerful on a day to day basis without a large
       | RAG system around it?
       | 
       | Personally, I try LLMs every now and then, but haven't seen any
       | indication of their usefulness for my day to day outside of being
       | a smarter auto complete.
        
         | lumost wrote:
         | When I started my career in 2010, google was a semi-serious
         | skill. All of the little things that we know how to do now such
         | as ignoring certain sites, lingering on others, and iteratively
         | refining our search queries were not universally known at the
         | time. Experienced engineers often relied on encyclopedic
         | knowledge of their environment or by "reading the manual".
         | 
         | In my experience, LLM tools are the same, you ask for something
         | basic initially and then iteratively refine the query either
         | via dialog or a new prompt until you get what you are looking
         | for or hit the end of the LLM's capability. Knowing when you've
         | reached the latter is critically important.
        
           | o11c wrote:
           | The problems with that skill is that:
           | 
           | * Most existing LLM interfaces are very bad at _editing_
           | history, instead focusing entirely on _appending_ to history.
           | You can sort of ignore this for one-shot, and this can be
           | properly fixed with additional custom tools, but ...
           | 
           | * By the time you refine your input enough to patch over all
           | the errors in the LLM's output for your sensible input,
           | you're bigger than the LLM can actually handle (much smaller
           | than the alleged context window), so it starts randomly
           | ignoring significant chunks of what you wrote (unlike
           | context-window problems, the ignored parts can be _anywhere_
           | in the input).
        
             | simonw wrote:
             | Yeah, a key thing to understand about LLMs is that managing
             | the context is _everything_. You need to know when to wipe
             | the slate by starting a new chat session and then pasting
             | across a subset of the previous conversation.
             | 
             | A lot of my most complex LLM interactions take place across
             | multiple sessions - and in some cases I'll even move the
             | project from Claude 3.5 Sonnet to OpenAI o1 (or vice versa)
             | to help get out of a rut.
             | 
             | It's infuriatingly difficult to explain why I decide to do
             | that though!
        
               | grimgrin wrote:
               | I bought in early to typingmind, a great web based
               | frontend. Good for editing context, and switching from
               | say gemini to claude. This is a very normal flow for me,
               | and whatever tool you use should enable this
               | 
               | also nice to interact with an LLM in vim, as the context
               | is the buffer
               | 
               | obviously simon's llm tool rules. I've wrapped it for vim
        
               | dartos wrote:
               | What kinds of things do you with these LLMs?
               | 
               | I feel like I'm good at understanding context. I've been
               | working in AI startups over the last 2 years. Currently
               | at an AI search startup.
               | 
               | Managing context for info retrieval is the name of the
               | game.
               | 
               | But for my personal use as a developer, they've caused me
               | much headache.
               | 
               | Answers that are subtly wrong in such a way that it took
               | me a week to realize my initial assumption based on the
               | LLM response was totally bunk.
               | 
               | This happened twice. With the yjs library, it gave me
               | half incorrect information that led me to misimplementing
               | the sync protocol. Granted it's a fairly new library.
               | 
               | And again with the web history api. It said that the
               | history stack only exists until a page reload. The
               | examples it gave me ran as it described, but that isn't
               | how the history api works.
               | 
               | I lost a week of time because of that assumption.
               | 
               | I've been hesitant to dive back in since then. I ask
               | questions every now and again, but I jump off much faster
               | now if I even think it may be wrong.
        
         | perrygeo wrote:
         | There's a similar dynamic in building reliable distributed
         | systems on top of an unreliable network. The parts are prone to
         | failure but the system can keep on working.
         | 
         | The tricky problem with LLMs is identifying failures - if
         | you're asking the question, it's implied that you don't have
         | enough context to assess whether it's a hallucination or a good
         | recommendation! One approach is to build ensembles of agents
         | that can check each other's work, but that's a resource-
         | intensive solution.
        
         | volgminar wrote:
         | The quote you pulled reeks of desperation.
         | 
         | If you have a product then sell me on it, because you'll only
         | sound desperate trying to convince me to use your product by
         | telling me why I have an aversion to it.
         | 
         | Does simonw actually like probabilistic computing? Does simonw
         | eagerly follow Dave Ackley's T2Tile project? [
         | https://t2tile.com ]
         | 
         | Or is simonw just using "unreliable" computing in an attempt at
         | a holier than thou framing to talk, yet again, about a subset
         | of a subset of machine learning research?
        
           | simonw wrote:
           | I hadn't heard of T2Tile. The intro video
           | https://www.youtube.com/watch?v=jreRFxN6wuM is from 5 years
           | ago so it predates even GPT-3.
           | 
           | Do you know if any of the ideas from that project have
           | crossed over into LLM world yet?
        
           | duck wrote:
           | Do you know who Simon is?
        
         | simonw wrote:
         | One of the things I find most frustrating about LLMs is how
         | resistant they are to teaching other people how to use them!
         | 
         | I'd love to figure this out. I've written more about them than
         | most people at this point, and my goal has always been to help
         | people learn what they can and cannot do - but distilling that
         | down to a concise set of lessons continues to defeat me.
         | 
         | The only way to really get to grips with them is to use them, a
         | lot. You need to try things that fail, and other things that
         | work, and build up an intuition about their strengths and
         | weaknesses.
         | 
         | The problem with intuition is it's really hard to download that
         | into someone else's head.
         | 
         | I share a _ton_ of chat conversations to show how I use them -
         | https://simonwillison.net/tags/tools/ and
         | https://simonwillison.net/tags/ai-assisted-programming/ have a
         | bunch of links to my exported Claude transcripts.
        
           | bjt wrote:
           | Thank you for doing this work, though.
           | 
           | My first stab at trying ChatGPT last year was asking it to
           | write some Rust code to do audio processing. It was not a
           | happy experience. I stepped back and didn't play with LLMs at
           | all for a while after that. Reading your posts has helped me
           | keep tabs on the state of the art and decide to jump back in
           | (though with different/easier problems this time).
        
         | swalsh wrote:
         | It's amazing this is still an opinion in 2025. I now ask devs
         | how they use AI as part of their workflows when I interview.
         | It's a standard skill I expect my guys to have.
        
           | BeetleB wrote:
           | Just curious, but what AI related skills do you expect them
           | to have?
        
           | dartos wrote:
           | I feel bad for your team.
           | 
           | Let people work how they want. I wouldn't not hire someone on
           | the basis of them not using a language server.
           | 
           | The creator of the Odin language famously doesn't use one.
           | He's says that he, specifically, is faster without one.
        
         | BeetleB wrote:
         | I think most tech folks struggle with it because they treat
         | LLMs as computer programs, and their experience is that SW
         | should be extremely reliable - imagine using a calculator that
         | was wrong 5% of the time - no one would accept that!
         | 
         | Instead, think of an LLM as the equivalent of giving a human a
         | menial task. You _know_ that they 're not 100% reliable, and so
         | you give them only tasks that you can quickly verify and
         | correct.
         | 
         | Abstract that out a bit further, and realize that most managers
         | don't expect their reports to be 100% reliable.
         | 
         | Don't use LLMs where accuracy is paramount. Use it to automate
         | away tedious stuff. Examples for me:
         | 
         | Cleaning up speech recognition. I use a traditional voice
         | recognition tool to transcribe, and then have GPT clean it up.
         | I've tried voice recognition tools for dictation on and off for
         | over a decade, and always gave up because even a 95% accuracy
         | is a pain to clean up. But now, I route the output to GPT
         | automatically. It still has issues, but I now often go
         | paragraphs before I have to correct anything. For personal
         | notes, I mostly don't even bother checking its accuracy - I do
         | it only when dictating things others will look at.
         | 
         | And then add embellishments to that. I was dictating out a
         | recipe I needed to send to someone. I told GPT up front to
         | write any number that appears next to an ingredient as a
         | numeral (i.e. 3 instead of "three"). Did a great job - didn't
         | need to correct anything.
         | 
         | And then there are always the "I could do this myself but I
         | didn't have time so I gave it to GPT" category. I was giving a
         | presentation that involved graphs (nodes, edges, etc). I was on
         | a tight deadline and didn't want to figure out how to draw
         | graphs. So I made a tabular representation of my graph, gave it
         | to GPT, and asked it to write graphviz code to make that graph.
         | It did it perfectly (correct nodes and edges, too!)
         | 
         | Sure, if I had time, I'd go learn graphviz myself. But I
         | wouldn't have. The chances I'll need graphviz again in the next
         | few years is virtually 0.
         | 
         | I've actually used LLMs to do quick reformatting of data a few
         | times. You just have to be careful that you can verify the
         | output _quickly_. If it 's a long table, then don't use LLMs
         | for this.
         | 
         | Another example: I have a custom note taking tool. It's just
         | for me. For convenience, I also made an HTML export. Wouldn't
         | it be great if it automatically made alt text for each image I
         | have in my notes? I would just need to send it to the LLM and
         | get the text. It's fractions of a cent per image! The current
         | services are a lot more accurate at image recognition than I
         | need them to be for this purpose!
         | 
         | Oh, and then of course, having it write Bash scripts and CSS
         | for me :-) (not a frontend developer - I've learned CSS in the
         | past, but it's quicker to verify whatever it throws at me than
         | Google it).
         | 
         | Any time you have a task and lament "Oh, this is likely easy,
         | but I just don't have the time" consider how you could make an
         | LLM do it.
        
       | alexashka wrote:
       | I wonder what the author of this post thinks of human generated
       | slop.
       | 
       | For example if someone just takes random information about a
       | topic, organizes it in chronological order and adds empty
       | opinions and preferences to it and does that for years on end -
       | what do you call that?
        
       | dash2 wrote:
       | Look, when are these models going to not just talk to me, but do
       | stuff for me? If they're so clever, why can't I tell one to buy
       | chocolates and send them to my wife? Meanwhile, they can
       | allegedly solve frontier maths problems. What's the holdup to
       | models that go online and perform simple tasks?
        
         | icelancer wrote:
         | The last mile problem remains undefeated.
        
         | gs17 wrote:
         | > why can't I tell one to buy chocolates and send them to my
         | wife?
         | 
         | I'm pretty sure that's been possible for a while. There was an
         | example where Claude's computer use feature ordered pizza for
         | the dev team through DoorDash:
         | https://x.com/alexalbert__/status/1848777260503077146?lang=e...
         | 
         | I don't think the released version of the feature can do it,
         | but it should be possible with today's tech.
        
       | th0ma5 wrote:
       | More dishonest magical thinking. I wish this guy would learn how
       | systems work and stop flooding the field with mystical nonsense
       | unless he really is trying to make people think LLMs are
       | worthless, then I guess he should be honest about it instead of
       | subversive.
        
         | simonw wrote:
         | Which bit was dishonest magical thinking?
         | 
         | In case you're interested, here's a summarized list (thanks,
         | Claude) of the negative/critical things I said about LLMs and
         | the companies that build them in this post:
         | https://gist.github.com/simonw/73f47184879de4c39469fe38dbf35...
        
         | jdlshore wrote:
         | I read the article and thought it was well done and level-
         | headed. What exactly did you think was mystical or magical
         | thinking?
        
       | orsenthil wrote:
       | One of the best written summary of LLMs for the year 2024.
       | 
       | We all have silently started to realize Slops, hopefully we can
       | recognize them more easily and prevent them.
       | 
       | Test Driven Development (Integration Tests or functional tests
       | specifically) for Prompt Driven Development seems like the way to
       | go.
       | 
       | Thank you, Simon.
        
       | legendofbrando wrote:
       | @simonw you've been awesome all year; loved this recap and look
       | forward to more next year
        
       | Havoc wrote:
       | Great summary of highlights. Don't agree with all, but I think
       | it's a very sound attempt at a year in review summary
       | 
       | >LLM prices crashed
       | 
       | This one has me a little spooked. The white knight on this front
       | (DS) has both announced increases and has had staff poached.
       | There is still Gemini free tier which is ofc basically impossible
       | to beat (solid & functionally unlimited/free) but it's google so
       | reluctant to trust.
       | 
       | Seriously worried about seeing a regression on pricing in first
       | half of 2025. Especially with the OAI $200 price anchoring.
       | 
       | >"Agents" still haven't really happened yet
       | 
       | Think that's largely because it's a poorly defined concept and
       | true "agent" implies some sort of pseudo-agi autonomy. This is a
       | definition/expectation issue rather than technical in my mind
       | 
       | >LLMs somehow got even harder to use
       | 
       | I don't think that's 100%. An explosion of options is not equal
       | to harder to use. And the guidance for noobs is still pretty much
       | same as always (llama.cp or one of the common frontends like
       | text-generation-webui). It's become harder to tell what is good,
       | but not to get going.
       | 
       | ----
       | 
       | One key theme I think is missing is just how hard it has become
       | to tell what is "good" for the average user. There is so much
       | benchmark shenanigans going on that it's just impossible to tell.
       | I'm literally at the "I'm just going to build my own testing
       | framework" stage. Not because I can do better technically (I
       | can't)...but because I can gear it towards things I care about
       | and I can be confident my DIY sample hasn't been gamed.
        
         | simonw wrote:
         | The biggest reason I'm not worried about prices going back up
         | again is Llama. The Llama 3 models are _really good_ , and
         | because they are open weight there are a growing number of API
         | providers competing to provide access to them.
         | 
         | These companies are incentivized to figure out fast and
         | efficient hosting for the models. They don't need to train any
         | models themselves, their value is added entirely in continuing
         | to drive the price of inference down.
         | 
         | Groq and Cerberus are particularly interesting here because WOW
         | they serve Llama fast.
        
       | macawfish wrote:
       | Large concept models are really exciting
        
       | voidhorse wrote:
       | Great write up! Unfortunately, I think this article accurately
       | reflects how we've made little progress on the most important
       | aspects of LLM hype and use: the social ones.
       | 
       | A small number of people with lots of power are essentially
       | deciding to go all in on this technology presumably because
       | significant gains will mean the long term reduction of human
       | labor needs, and thus human labor power. As the article mentions,
       | this also comes at huge expenditure and environmental impact,
       | which is already a very important domain in crisis that we've
       | neglected. The whole thing especially becomes laughable when you
       | consider that many people are still using these tools to perform
       | tasks that could be preformed with a margin of more effort using
       | existing deterministic tools. Instead we are now opting for a
       | computationally more expensive solution that has a higher margin
       | of error.
       | 
       | I get that making technical progress in this area is interesting,
       | but I really think the lower level workers and researchers
       | exploring the space need to be more emphatic about thinking about
       | socioeconomic impact. Some will argue that this is analogous to
       | any other technological change and markets will adjust to account
       | for new tool use, but I am not so sure about this one. If the
       | technology is really as groundbreaking as everyone wants us to
       | believe then logically we might be facing a situation that isn't
       | as easy to adapt to, and I guarantee those with power will not
       | "give a little back" to the disenfranchised masses out of the
       | goodness of their hearts.
       | 
       | This doesn't even raise all the problems these tools create when
       | it comes to establishing coherent viewpoints and truth in
       | ostensibly democratic societies, which is another massive can of
       | worms.
        
       | bwhiting2356 wrote:
       | Some amount of LLM gullibility may be needed. Let's say I have a
       | RAG use case for internal documents about how my business works.
       | I need the LLM to accept what I'm telling it about my business as
       | the truth without questioning it. If I got responses like "this
       | return policy is not correct", LLMs would fail at my use case.
        
       | calebm wrote:
       | I love your breadth-first approach of having an outline at the
       | top.
        
         | simonw wrote:
         | I wrote custom software for that!
         | https://tools.simonwillison.net/render-markdown - If you paste
         | in some Markdown with ## section headings in it the output will
         | start with a <ul> list of links to those headings.
        
       | nektro wrote:
       | i learned this industry has lower morals and standards for
       | excellence than i ever previously expected
        
       ___________________________________________________________________
       (page generated 2024-12-31 23:00 UTC)