[HN Gopher] Things we learned about LLMs in 2024
___________________________________________________________________
Things we learned about LLMs in 2024
Author : simonw
Score : 352 points
Date : 2024-12-31 18:11 UTC (4 hours ago)
(HTM) web link (simonwillison.net)
(TXT) w3m dump (simonwillison.net)
| agentultra wrote:
| Don't forget that 2024 was also a record year for new methane
| power plant projects. Some 200 new projects in the US alone and
| I'd wager most of them are funded directly by big tech for AI
| data centres.
|
| https://www.bnnbloomberg.ca/investing/2024/09/16/ai-boom-is-...
|
| This is definitely extending the runway of O&G at a crisis point
| in the climate disaster when we're supposed to be reducing and
| shutting down these power plants.
|
| _Update_: clarified that the 200 number is US-only. There are
| far more worldwide.
| api wrote:
| The only thing that will stop this is for battery storage to
| get cheap and available enough that it can cover for
| renewables. If we are still building gas turbines it means that
| hasn't happened yet.
|
| AI is a red herring. If it wasn't that it would be EV power
| demand. If it wasn't that it would be reshoring of
| manufacturing. If it wasn't that it would be population growth
| from immigration. If it wasn't that it would be replacing old
| coal power plants reaching EOL.
|
| Replacing coal with gas is an improvement, by the way: it's
| around half the CO2 per kWh, sometimes less if you factor in
| that gas turbines are often more efficient than aging coal
| plants.
| agentultra wrote:
| Methane has a shorter atmospheric half-life than CO2 but is a
| far worse greenhouse gas, retaining far more heat.
|
| And methane delivery leaks like a sieve into the atmosphere
| at every stage of the process.
|
| Sure it's probably "better than coal," but not by much. It's
| a bit like comparing what's worse: getting burned by fire or
| being drowned in acid.
| Nition wrote:
| Pumped hydro is an excellent form of storage if you have the
| terrain for it. A whole order of magnitude cheaper than
| battery storage at the moment.
| ToucanLoucan wrote:
| It would be really cool if big tech could find a new
| hyperscaler model that didn't _also_ require offsetting the
| goals of green energy projects worldwide. Between LLMs and
| crypto you'd swear they're trying to find the most energy-
| wasteful tech possible.
| zachrip wrote:
| It seems odd to put crypto and LLMs in the same boat in this
| regard - I might be wrong but are there any crypto projects
| that actually provide value? I'm sure there are ones that do
| folding or something but among the big ones?
| rileymat2 wrote:
| Value is a hard term. This link will seem snarky, but:
| https://www.axios.com/2024/12/25/russia-bitcoin-evade-
| sancti...
|
| So in a way, it is providing value to someone, whether we
| like it or not.
|
| Or Drug Cartels. https://www.context.news/digital-
| rights/how-crypto-helps-lat...
|
| But this is the promise of uncontrollable decentralization
| providing value, for good or bad?
| ben_w wrote:
| With cryptocurrency, at least PoW, the point is indeed to be
| the most wasteful -- a literal Dyson-swarm-powered Bitcoin
| would provide _exactly the same utility_ as the BTC network
| already had in 2010.
|
| LLMs (and the image, sound, and movie generating models) are
| more _coincidentally_ power-hogs -- people are at least
| trying to make them better at fixed compute, and lower
| compute at fixed quality.
| ToucanLoucan wrote:
| I mean, I appreciate that distinction and don't disagree.
| _And,_ if this is going to continue being a trend, I think
| we need more stringent restrictions on what sorts of
| resources are permitted to be consumed in the power plants
| that are constructed to meet the needs of hyperscaler data
| centers.
|
| Because whether we're using tons of compute to provide
| value or not doesn't change that _we are using tons of
| compute_ and tons of compute requires tons of energy, both
| for the chips themselves, and the extensive infrastructure
| that has to be built around them to let them work. And not
| just electricity: refrigerants, many of which are
| environmentally questionable themselves, are a big part;
| hell, just water. Clean, usable water.
|
| If we truly need these data centers, then fine. Then they
| should be powered by renewable energy, or if they
| absolutely cannot be, then the costs their nonrenewable
| energy sources inflict on the biosphere should be priced
| into their construction and use, and in turn, priced into
| the tech that is apparently so critical for them to have.
|
| This is like, a basic calculus that every grown person
| makes dozens of times a day: do I need this? And they don't
| get to distribute the cost of that need, however pressing
| it may be, onto their wider community because they can't
| afford it otherwise. I don't see why Microsoft should be
| able to either. If this is truly the tech of the future as
| it is constantly propped up to be, cool. Then charge a
| price for it that reflects what it costs to use.
| comte7092 wrote:
| Energy generation methods aren't fungible.
|
| Methane is favored in many cases because such plants can be
| quickly ramped up and down to handle momentary peaks in
| demand or spotty supply from renewables.
|
| Without knowing more details about those projects it is
| difficult to make the claim that these plants have anything to
| do with increased demand due to LLMs, though if anything,
| they'd just add to base load demands and lead to slower
| decommissioning of old coal plants like we've seen with bitcoin
| mines.
| throwup238 wrote:
| Methane is also worth burning to lessen the GHG impact since
| we produce so much of it as a byproduct of both resource
| extraction and waste disposal anyway.
| uludag wrote:
| But according to the author, apparently bringing this up isn't
| helpful criticism.
|
| I'm curious what people's thoughts are on what the future of
| LLMs would look like if we severely overshoot our carbon
| goals. How bad would things have to get for people to stop
| caring about this technology?
| simonw wrote:
| It's helpful criticism as _part_ of the conversation. What
| frustrates me is when people go "LLMs are burning the
| planet!" and leave it at that.
| agentultra wrote:
| Mine is a contrasting opinion: the trade-offs made to have
| AI aren't worth the value it brings.
|
| The growth in this technology isn't outpacing car pollution
| and O&G extraction... yet, but the growth rate has been
| enough in recent years to put it on the radar of industries
| to watch out for.
|
| I hope the compute efficiencies are rapid and more than
| commensurate with the rate of growth so that we can make
| progress on our climate targets.
|
| However it seems unlikely to me.
|
| It's been a year of progress for the tech... but also a lot
| of setbacks for the rest of the world. I'm fairly certain
| we don't need AGI to tell us how to cope with the climate
| crisis; we already have the answer for that.
|
| Although if the industry does continue to grow and the
| efficiency gains aren't enough... will society/investors be
| willing to scale back growth in order to meet climate
| targets (assuming that AI becomes a large enough segment of
| global emissions to warrant reductions)?
|
| Interesting times for the field.
| jmclnx wrote:
| Interesting, the article is not quite what I expected.
| fosterfriends wrote:
| My fav part of the writeup at the end:
|
| """
|
| LLMs need better criticism
|
| A lot of people absolutely hate this stuff. In some of the
| spaces I hang out (Mastodon, Bluesky, Lobste.rs, even Hacker
| News on occasion) even suggesting that "LLMs are useful" can
| be enough to kick off a huge fight.
|
| I like people who are skeptical of this stuff. The hype has been
| deafening for more than two years now, and there are enormous
| quantities of snake oil and misinformation out there. A lot of
| very bad decisions are being made based on that hype. Being
| critical is a virtue.
|
| If we want people with decision-making authority to make good
| decisions about how to apply these tools we first need to
| acknowledge that there ARE good applications, and then help
| explain how to put those into practice while avoiding the many
| unintuitive traps.
|
| """
|
| LLMs are here to stay, and there is a need for more thoughtful
| critique rather than just "LLMs are all slop, I'll never use it"
| comments.
| vunderba wrote:
| I agree, but I think my biggest issue with LLMs (and a lot of
| GenAI) is that they act as a massive accelerator for the WORST
| (and unfortunately most common) type of human - the lazy one.
|
| The signal-to-noise ratio just goes completely out of control.
|
| https://journal.everypixel.com/ai-image-statistics
| greenavocado wrote:
| EXIF watermarking by the generators would solve 90% of the
| problem in one fell swoop, because lazy people won't remove
| it.
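|
| A minimal sketch of the idea (assuming Pillow and piexif; the
| generator name and tag value are purely illustrative):
|
|     import piexif
|     from PIL import Image
|
|     # Hypothetical generator stamps a provenance marker into
|     # the standard EXIF Software tag (tag 305) on save.
|     exif_bytes = piexif.dump(
|         {"0th": {piexif.ImageIFD.Software:
|                  b"ExampleImageGen (AI-generated)"}}
|     )
|     Image.open("generated.png").convert("RGB").save(
|         "generated.jpg", exif=exif_bytes
|     )
|
|     # Anyone downstream can check for the marker.
|     tag = piexif.load("generated.jpg")["0th"].get(
|         piexif.ImageIFD.Software)
|     print(tag)  # b'ExampleImageGen (AI-generated)'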
| minimaxir wrote:
| Every image host and social media app automatically strips
| EXIF data (for privacy reasons at minimum).
| Der_Einzige wrote:
| Sorry, but the "lazy is bad" crowd is Luddism in another
| form, and it's telling that a whole lot of very smart people
| were passionate defenders of being lazy!
|
| https://en.wikipedia.org/wiki/The_Human_Use_of_Human_Beings
|
| https://en.wikipedia.org/wiki/Inventing_the_Future:_Postcapi.
| ..
|
| https://en.wikipedia.org/wiki/The_Right_to_Be_Lazy
|
| https://en.wikipedia.org/wiki/In_Praise_of_Idleness_and_Othe.
| .. (That's Bertrand Russell)
|
| https://en.wikipedia.org/wiki/The_Abolition_of_Work
|
| https://en.wikipedia.org/wiki/The_Society_of_the_Spectacle
|
| https://en.wikipedia.org/wiki/Bonjour_paresse
|
| AI systems are literally the most amazing technology on earth
| for this exact reason. I am so glad that it is destroying the
| minds of time thieves world-wide!
| mhh__ wrote:
| The people who are lazy but have taste will do well, then.
| Adiqq wrote:
| Isn't it expected that most, if not all, content will be
| produced by AI/AGI in the near future? It won't matter much
| whether you're lazy or not. That leads to the question: what
| will we do instead? People may want to be productive, but
| we're observing in real time how the world is going to shit
| for workers, and that's basically a fact for many reasons.
|
| One reason is that it's cheaper to use AI, even if the result
| is poor. It doesn't have to be high quality, because most of
| the time we don't care about quality unless something
| interests us. I wonder what kind of shift in power dynamics
| will occur, but so far it looks like many of us will simply
| lose our jobs. There's no UBI (or social credit as proposed
| by Douglas), salaries are low, and not everyone lives in a
| good location, yet corporations try to enforce RTO. Some will
| simply get fired and won't be able to find a new job (which
| won't be sustainable for a personal budget, unless someone
| already has a low cost of living and is debt-free, or has a
| somewhat wealthy family to cover for them).
|
| Well, maybe at least government will protect us? Low chance:
| the world is shifting right, and it will get worse once we
| start to experience more and more effects of global warming.
| I don't see a scenario where the world becomes a better place
| in the foreseeable future. We're trapped in a society of
| achievement, but soon we may not be able to deliver
| achievements, because if business can get similar results for
| a fraction of the price needed to hire human workers, then
| guess what will happen?
|
| These are sad times, full of depression and suffering. I hope
| that some huge transformation in societies happens soon, or
| that AI development slows down so that some future generation
| has to deal with the consequences (people will prioritize
| saving their own, and it won't be pretty, so it's better to
| just pass it down like debt).
| im_down_w_otp wrote:
| This happens with every inane hype-cycle.
|
| I suspect people don't particularly hate or despise LLMs per
| se. They're probably reacting mostly to "tech industry" boom-
| bust bullsh*tter/guru culture. Especially since the cycles seem
| to burn increasingly hotter and brighter the less actual,
| practical value they provide. Which is supremely annoying when
| the second-order effect is having all the oxygen (e.g. capital)
| sucked out of the room for pretty much anything else.
| mhh__ wrote:
| I can think of some runaway scenarios where LLMs are
| definitely bad but, indeed, this particular line of criticism
| is really just Luddites longing for a world that probably
| doesn't exist anymore.
|
| These are the people who regulate and legislate for us, the
| risk-averse fools who would rather things be nice and
| harmless than bad but working.
|
| Personally, I think my only serious ideology in this area is
| that I am fundamentally biased towards the power of human
| agency. I'd rather not need to, but in a (perhaps) Nietzschean
| sense I view so-called AI as a force multiplier to totally
| avoid the above people.
|
| AI will enable the creative to be more concrete, and drag those
| on the other end of the scale towards the normie mean. This is
| of great relevance to the developing world too - AI may end
| up a tool for enforcing Western culture upon the rest of the
| world, but perhaps also a force decorrelating it from the
| McKinseys of tall buildings in big cities.
| throwanem wrote:
| > I've heard from sources I trust that both Google Gemini and
| Amazon Nova charge less than their energy costs for running
| inference...
|
| Then, several headings later:
|
| > I have it on good authority that neither Google Gemini nor
| Amazon Nova (two of the least expensive model providers) are
| running prompts at a loss.
|
| So...which is it?
| simonw wrote:
| Oh whoops! That's an embarrassing mistake, and I didn't realize
| I had that point twice.
|
| They're not running at a loss. I'll fix that.
| cess11 wrote:
| If they are subsidised they can make a profit while still not
| making enough money to cover energy costs.
| kgwgk wrote:
| Subsidised by whom?
| cess11 wrote:
| E.g. tax payers.
| kgwgk wrote:
| Are tax payers subsidizing that particular activity of
| Google or Amazon? If they do, "they make enough money" to
| cover costs. If they don't, how does it become profitable
| if it doesn't even cover the cost of one of the inputs?
| simonw wrote:
| The tip I got about both Gemini and Nova is that the low
| prices they are charging still cover their energy costs.
| cess11 wrote:
| OK!
| pkoird wrote:
| I'd love to read a semi-technical book on everything we've
| learned about what works and what doesn't with LLMs.
| nkingsy wrote:
| It would be out of date in months.
|
| Things that didn't work 6 months ago do now. Things that don't
| work now, who knows...
| minimaxir wrote:
| There are still some tropes from the GPT-3 days that are
| fundamental to the construction of LLMs, affect how they can
| be used, and will not change unless models are no longer
| trained to optimize for next-token prediction (e.g.
| hallucinations and the need for prompt engineering).
| DoctorOetker wrote:
| Do you mean performance that was missing in the past is now
| routinely achieved?
|
| Or do you actually mean that the same routines and data that
| didn't work before suddenly work?
| Animats wrote:
| > Some of those GPT-4 models run on my laptop
|
| That's an indication that most business-sized models won't need
| some giant data center. This is going to be a cheap technology
| most of the time. OpenAI is thus way overvalued.
| slimsag wrote:
| Unless the best models themselves are costly/hard to produce,
| and there is not a company providing them to people free of
| charge AND for commercial use.
| shihab wrote:
| The last OpenAI valuation I read about was $157 billion. I am
| struggling to understand what justifies this. To me, it feels
| like OpenAI is at best a few months ahead of competitors in
| _some_ areas. But even if I am underestimating the advantage
| and it's a few years instead of a few months, why does it
| matter? It's not like AI companies are going to enjoy the
| first-mover advantage the internet giants had over their
| competition.
| benreesman wrote:
| Us skeptics believe that valuation prices in some form of
| regulatory capture or other non-market factor.
|
| The non-skeptical interpretation is that it's a threshold
| function, a flat-out race with an unambiguous finish line. If
| someone actually hit self-improving AGI first there's an
| argument that no one would ever catch up.
| com2kid wrote:
| There are some really good books about wars between
| cultures that have AGI and it always comes down to math -
| whoever can get their hands on more compute faster wins.
| api wrote:
| This is also a strong argument for immigration,
| particularly high-skill immigration. In the absence of
| synthetic AGI whoever imports the most human AGI wins.
| Jensson wrote:
| Which suggests that total AGI compute doesn't matter that
| much, as India isn't the world leader that the amount of
| human compute it possesses would suggest.
|
| What matters is how you use the AGI, not how much you
| have, with wrong or bad or limiting regulations it will
| not lead anywhere.
| datadrivenangel wrote:
| It's justified if AGI is possible. If AGI is possible, then
| the entire human economy stops making sense as far as money
| goes, and 'owning' part of OpenAI gives you power.
|
| That is of course, assuming AGI is possible and exponential,
| and that marketshare goes to a single entity instead of a set
| of entities. Lots of big assumptions. Seems like we're
| heading towards a slow-lackluster singularity though.
| philipkglass wrote:
| _If AGI is possible, then the entire human economy stops
| making sense as far as money goes, and 'owning' part of
| OpenAI gives you power._
|
| That's if AGI is possible _and not easily replicated_. If
| AGI can be copied and /or re-developed like other software
| then the value of owning OpenAI stock is more like owning
| stock in copper producers or other commodity sector
| companies. (It might even be a poorer investment. Even AGI
| can't create copper atoms, so owners of real physical
| resources could be in a better position in a post-human-
| labor world.)
| whatshisface wrote:
| This belief comes from confusing the singularity (every
| atom on Earth is converted into a giant image of Sam
| Altman) with AGI (a store employee navigates a
| confrontation with an unruly customer, then goes home and
| wins at Super Mario).
| fullstackchris wrote:
| Exactly. I continually fail to see how "the entire human
| economy ends" overnight with another human like agent out
| there - especially if its confined to a server in the
| first place - it can't even "go home" :)
| baobabKoodaa wrote:
| If I recall correctly, these terms were used more or less
| interchangeably for a few decades, until 2020 or so, when
| OpenAI started making actual progress towards AGI and it
| became clear that the type of AGI imaginable at that point
| would not be the type that produces a singularity.
| AnimalMuppet wrote:
| The GP said, "and exponential". If AGI is exponential,
| then the first one will have a head start advantage that
| compounds over time. That is going to be hard to
| overcome.
| philipkglass wrote:
| I believe that AGI cannot be exponential for long because
| any intelligent agent can only approach nature's limits
| asymptotically. The first company with AGI will be about
| as much ahead as, say, the first company with electrical
| generators [1]. A lot of science fiction about a
| technological singularity assumes that AGI will discover
| and apply new physics to develop currently-believed-
| impossible inventions, but I don't consider that
| plausible myself. I believe that the discovery of new
| physics will be intellectually satisfying but generally
| inapplicable to industry, much like how solving the
| cosmological lithium problem will be career-defining for
| whoever does it but won't have any application to lithium
| batteries.
|
| https://en.wikipedia.org/wiki/Cosmological_lithium_problem
|
| [1] https://en.wikipedia.org/wiki/Siemens#1847_to_1901
| datadrivenangel wrote:
| I don't recall editing my message, but HN can be wonky
| sometimes. :)
|
| Nothing is truly exponential for long, but the logistic
| curve could be big enough to do almost anything if you
| get imaginative. Without new physics, there are still
| some places where we can do some amazing things with the
| equivalent of several trillion dollars of applied R&D,
| which AGI gets you.
| philipkglass wrote:
| I had to edit _my_ message just now because I was
| actually unsure if you edited. Sorry for any
| miscommunication.
| terribleperson wrote:
| This depends on what a hypothetical 'AGI' actually costs.
| If a real AGI is achieved, but it costs more per unit of
| work than a human does... it won't do anyone much good.
| fullstackchris wrote:
| Sure, but think of the Higgs... how long that took for just
| _one_ particle. You think an AGI, or even an ASI, is going
| to make an experimental effort like that go any faster?
| Dream on!
|
| It astounds me that people don't realize how much of this
| cutting-edge science stuff literally does NOT happen
| overnight, or even close to it; typically it takes on the
| order of decades!
| datadrivenangel wrote:
| Science takes decades, but there are many places where we
| could have more amazing things if we spent 10 times as
| much on applied R&D and manufacturing. It wouldn't happen
| overnight, but it will be transformative if people can
| get access to much more automated R&D. We've seen a
| proliferation in makers over the last few decades as
| access to information is easier, and with better tools
| individuals will be able to do even more.
|
| My point being that even if Science ends today, we still
| have a lot more engineering we can benefit from.
| richardw wrote:
| The first AGI will have such an advantage. It'll be the
| first thing that is smart and tireless, can do anything
| from continuously hacking enemy networks to trading
| across all investment classes, to basically taking over
| the news cycle on social media. It would print money and
| power.
| Terr_ wrote:
| One stratum in that assumption-heap to call out explicitly:
| assuming LLMs are an enabling route to AGI and not a dead
| end or a supplemental feature.
| api wrote:
| If AGI is possible then that too becomes a commodity and we
| experience a massive round of deflation in the cost of
| everything not intrinsically rare. Land, food, rare
| materials, energy, and anything requiring human labor is
| expensive and everything else is almost free.
|
| I don't see how OpenAI wouldn't crash and burn here. Given
| the history of models it would be at most a year before
| you'd have open AGI, then the horse is out of the barn and
| the horse begins to self-improve. Pretty soon the horse is
| a unicorn, then it's a Satyr, and so on.
|
| (I am a near-term AGI skeptic BTW, but I could be wrong.)
|
| OpenAI's valuation is a mixture of hype speculation and the
| "golden boy" cult around Sam Altman. In the latter sense
| it's similar to the golden boy cults around Elon Musk and
| (politically) Donald Trump. To some extent these cults work
| because they are self-fulfilling feedback loops: these
| people raise tons of capital (economic or political)
| because everyone knows they're going to raise tons of
| capital so they raise tons of capital.
| parpfish wrote:
| Well, AGI would make the brainy information-worker part of
| the economy obsolete. We'll still need the jobs that
| interact with the physical world for quite a while. So...
| all us HN types should get ready to work the mines or pick
| vegetables.
| throwup238 wrote:
| If we hit true AGI, physical labor won't be far behind
| the knowledge workers. The first thing industrial
| manufacturers will do is turn it towards designing
| robotics, automating the design of factories, and
| researching better electromechanical components like
| synthetic muscle to replace human dexterity.
|
| IMO we're going to hit the point where AI can work on
| designing automation to replace physical labor before we
| hit true AGI, much like we're seeing with coding.
| hdjjhhvvhga wrote:
| > If AGI is possible, then the entire human economy stops
| making sense as far as money goes
|
| I heard people on HN saying this (even without the money
| condition) and I fail to grasp the reasoning behind it.
| Suppose in a few years Altman announces a model, say o11,
| that is supposedly AGI, and in several benchmarks it hits
| over 90%. I don't believe it's possible with LLMs because
| of their inherent limitations but let's assume it can solve
| general tasks in a way similar to an average human.
|
| Now, how come that "the entire human economy stops making
| sense"? In order to eat, we need farmers, we need
| construction workers, shops etc. As for white collar
| workers, you will need a whole range of people to maintain
| and further develop this AGI. So IMHO the opposite is true:
| the human economy will work exactly as before, but the job
| market will continue to evolve, with people using AGI in a
| similar way to how they use LLMs now, but probably with
| greater confidence. (Or not.)
| datadrivenangel wrote:
| Why do we work? Ultimately, we work to live.* If the
| value of our labor is determined by scarcity, then what
| happens when productivity goes nearly infinite and the
| scarcity goes away? We still have needs and wants, but
| the current market will be completely inverted.
| exe34 wrote:
| If you think about all the people trying to automate away
| farming, construction, transport/delivery - these people
| doing the automation themselves get automated out first,
| and the automation figures out how to do the rest. So a
| fully robotic economy is not far off, if you can achieve
| AGI.
| SmooL wrote:
| The thinking goes:
|
| - any job that can be done on a computer is immediately
| outsourced to AI, since the AI is smarter and cheaper than
| humans
|
| - humanoid robots are built that are cheap to produce, using
| tech advances that the AI discovered
|
| - any job that can be done by a human is immediately
| outsourced to a robot, since the robot is
| better/faster/stronger/cheaper than humans
| UltraSane wrote:
| If AGI is invented and the inventor tries to keep it secret
| then everyone in the world will be trying to steal it. And
| funding to independently create it would become effectively
| unlimited once it has been proven possible, much like with
| nuclear weapons.
| robertlagrant wrote:
| > If AGI is possible, then the entire human economy stops
| making sense as far as money goes,
|
| What does this mean in terms of making me coffee or
| building houses?
| com2kid wrote:
| If we can simulate a full human intelligence at a
| reasonable speed, we can simulate 100 of them and ask the
| AGI to figure out how to make itself 10x faster.
|
| Rinse and repeat.
|
| That is exponential take off.
|
| At the point where you have an army of AIs running at
| 1000x human speed it can just ask it to design the
| mechanisms for and write the code to make robots that
| automate any possible physical task.
| GOD_Over_Djinn wrote:
| This sounds like magic, not science.
| EMIRELADERO wrote:
| What do you mean by this? Is there any fundamental
| property of intelligence, physicality, or the universe,
| that you think wouldn't let this work?
| fullstackchris wrote:
| Not OP, but yes. Electron size vs band gap, computing costs
| (in terms of electricity), other raw materials needed for
| that energy, etc... sigh... it's physics, always physics...
| What fundamental property of physics do you think would let
| a vertical take-off in intelligence occur?
| datadrivenangel wrote:
| If you look at the rate of mathematical operations
| conducted, we're already going hard vertical. Physics and
| material limitations will slow that eventually as we
| reach a marginal return on converting the planet to
| computer chips, but we're in the singularity as proxy
| measured by mathematical operations.
| edflsafoiewq wrote:
| There are about 8 billion human intelligences walking
| around right now and they've got no idea how to begin
| making even a stupid AGI, let alone a superhuman one.
| Where does the idea that 100 more are going to help come
| from?
| torginus wrote:
| Nothing, and the hilarious thing is that the AI figureheads
| admit that technology (as in, defined by new theorems
| produced and new code written) will do pathetically little
| to move the needle on human happiness.
|
| The guy running Anthropic thinks the future is in
| biotech, developing the cure to all diseases, eternal
| youth etc.
|
| Which is technology all right, but it's unclear to me how
| these chatbots (or other AI systems) are the quickest way
| to get there.
| Animats wrote:
| We may not need smarter AI. Just less stupid AI.
|
| The big problem with LLMs is that most of the time they act
| smart, and some of the time they do really, really dumb
| things and don't notice. It's not the ceiling that's the
| problem. It's the floor. Which is why, as the article
| points out, "agents" aren't very useful yet. You can't
| trust them to not screw up big-time.
| torginus wrote:
| I was thinking about how the economy actively makes less
| sense and gets more and more divorced from reality year
| after year, AI or not.
|
| It's the simple fact that the ability of assets to generate
| wealth has far outstripped the ability of individuals to
| earn money by working.
|
| Somehow real estate has become so expensive _everywhere_
| that owning a shitty apartment is impossible for the vast
| majority.
|
| When the world's population was exploding during the 20th
| century, housing prices were not a problem, yet somehow
| nowadays, it's impossible to build affordable housing to
| bring the prices down, though the population is stagnant or
| growing slowly.
|
| A company can be worth $1B if someone invests $10m in it
| for 1% stake - where did the remaining $990m come from?
| Likewise, the stock market is full of trillion-dollar
| companies whose valuations beggar all explanation,
| considering the sizes of the markets they are serving.
|
| The rich elites are using the wealth to control access to
| basic human needs (namely housing and healthcare) to
| squeeze the working population for every drop of money.
| Every wealth metric shows the 1% and the 1% of the 1%
| control successively larger portions of the economic pie.
| At this point money is ceasing to be a proxy for value and
| is becoming a tool for population control.
|
| And the weird thing is it didn't use to be nearly this bad
| even a decade ago, and we can only guess how bad it will
| get in a decade, AGI or not.
|
| Anyway, I don't want to turn this into a fully-written
| manifesto, but I have trouble expressing these ideas in a
| concise manner.
| nyarlathotep_ wrote:
| > And the weird thing is it didn't use to be nearly this
| bad even a decade ago, and we can only guess how bad it
| will get in a decade, AGI or not.
|
| The last 5 years have reflected a substantial decline in
| QOL in the states; you don't even have to to look back
| that far.
|
| The coronacircus money-printing really accelerated the
| decline.
| ac29 wrote:
| > Somehow real estate has become so expensive everywhere
| that owning a shitty apartment is impossible for the vast
| majority.
|
| Approximately two-thirds of homes in the US are owner-
| occupied.
| orangecat wrote:
| _Somehow real estate has become so expensive everywhere
| that owning a shitty apartment is impossible for the vast
| majority._
|
| That's to be expected when governments forbid people from
| building housing. The only thing I find surprising is
| when people blame this on "capitalism".
| throwpoaster wrote:
| 157 billion implies about a 1% chance at dominating a 1.5
| trillion market. Seems reasonable.
| asqueella wrote:
| 10%, no?
| airstrike wrote:
| that's 10% and who's to say that market is worth 1.5
| trillion to begin with
| cloverich wrote:
| Market cap of apple, google, facebook.
| criddell wrote:
| > what justifies this
|
| People are buying shares at $x because they believe they
| will be able to sell them for more later. I don't think
| there's a whole lot more to it than that.
| epicureanideal wrote:
| And of course, as processors improve this becomes more and more
| the case.
| refulgentis wrote:
| Been in the Mac ecosystem since 2008, love it, but there is,
| and always has been, a tendency to talk about inevitabilities
| from scaling bespoke, extremely expensive configurations, and
| with LLMs, there's heavy eliding of what the user experience
| is, beyond noting response generation speed in tokens/s.
|
| They run on a laptop, yes - you might squeeze up to 10
| tokens/sec out of a kinda-sorta GPT-4 if you paid $5K-plus
| for an Apple laptop in the last 18 months.
|
| And that's after you spent _2 minutes_ watching a 1000-token*
| prompt prefill at 10 tokens/sec.
|
| Usually it'd be obvious this'd trickle down, things always do,
| right?
|
| But...Apple infamously has been stuck on 8GB of RAM in even
| $1500 base models for years. I have no idea why, but my
| intuition is that RAM was ~doubling capacity at the same
| cost every 3 years until the early 2010s, then mostly
| stalled out post-2015.
|
| And regardless of any of the above, this absolutely _melts_
| your battery. Like, your 16 hr battery life becomes 40 minutes,
| no exaggeration.
|
| I don't know why prefill (loading in your prompt) is so slow
| for local LLMs, but it is. I assume if you have a bunch of
| servers there's some caching you can do that works across all
| prompts.
|
| I expect the local LLM community to be roughly the same size it
| is today 5 years from now.
|
| * ~3 pages / ~750 words; what I expect is a conservative
| average for prompt size when coding
| lowercased wrote:
| I have a 2023 mbp, and I get about 100-150 tok/sec locally
| with lmstudio.
| datadrivenangel wrote:
| Which models?
| refulgentis wrote:
| For context, I got an M2 Max MBP with 64 GB shared RAM,
| bought March 2023 for $5-6K.
|
|     Llama 3.2 1.0B  - 650 t/s
|     Phi 3.5 3.8B    -  60 t/s
|     Llama 3.1 8.0B  -  37 t/s
|     Mixtral 14.0B   -  24 t/s
|
| Full GPU acceleration, using llama.cpp, just like LM
| Studio.
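|
| If anyone wants to reproduce numbers like these, here's a
| minimal sketch with the llama-cpp-python bindings (the model
| path and prompt are placeholders):
|
|     import time
|     from llama_cpp import Llama
|
|     # n_gpu_layers=-1 offloads every layer to Metal/GPU.
|     llm = Llama(model_path="llama-3.1-8b-instruct.Q4_K_M.gguf",
|                 n_gpu_layers=-1, verbose=False)
|
|     t0 = time.perf_counter()
|     out = llm("Explain TCP slow start in one paragraph.",
|               max_tokens=256)
|     dt = time.perf_counter() - t0
|
|     n = out["usage"]["completion_tokens"]
|     print(f"{n} tokens in {dt:.1f}s = {n / dt:.1f} tok/s")
|
| Note this lumps prefill and generation into one timing, which
| is exactly the prefill cost complained about above.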
| lowercased wrote:
| hugging-quants/llama-3.2-1b-instruct-q8_0-gguf - 100-150
| tok/sec
|
| second-state/llama-2-7b-chat-gguf net me around ~35
| tok/sec
|
| lmstudio-community/granite-3.1-8b-instruct-GGUF - ~50
| tok/sec
|
| MBP M3 Max, 64g. - $3k
| refulgentis wrote:
| I'm not sure if you're pointing out any / all of these:
|
| #1. It is possible to get an arbitrarily fast
| tokens/second number, given you can pick model size.
|
| #2. Llama 1B is roughly GPT-4.
|
| #3. Given Llama 1B runs at 100 tokens/sec, and given
| performance at a given model size has continued to
| improve over the past 2 years, we can assume there will
| eventually be a GPT-4 quality model at 1B.
|
| On my end:
|
| #1. Agreed.
|
| #2. Vehemently disagree.
|
| #3. TL;DR: I don't expect that, at least, the trend line
| isn't steep enough for me to expect that in the next
| decade.
| mjburgess wrote:
| I don't think openai's valuation comes from a data center bet
| -- rather, I'd suppose, investors think it has a first-mover
| advantage on model quality that it can (maybe?) attract some
| buy-out interest or otherwise use in yet-to-be-specified
| product lines.
|
| However, it has been clear for a long time that Meta is just
| demolishing any competitor's moats, driving the whole
| megacorp AI competition to razor-thin margins.
|
| It's a very welcome strategy from a consumer pov, but -- it
| has to be said -- genius from a business pov. By deciding
| that no one will win, Meta can prevent anyone leapfrogging
| them at a relatively cheap price.
| hyperpape wrote:
| This seems like a non-sequitur unless you're assuming something
| about the amount that people use models.
|
| Most web servers can run some number of QPS on a developer
| laptop, but AWS is a big business, because there are a heck of
| a lot of QPS across all the servers.
| thinkingemote wrote:
| Most of the laptops these models can run on today are
| comparable to the high end of dedicated bare-metal servers.
| Most shared VM servers are way below these laptops. Most
| people buying a new laptop today won't be able to run them,
| and most devs getting a website up on a server won't be able
| to run them.
|
| This means that the definitions of "laptop" and "server" are
| dependent on use. We should instead talk about RAM, GPU and
| CPU speed, which is more useful and informative but less
| engaging than "my laptop".
| m3kw9 wrote:
| The best models are always out of reach on desktops. You can
| have ok models but AGI will come in a datacenter first
| neom wrote:
| "learned out about" - is that an Australian phraseology by
| chance? Sounds Australian or British of some manner.
| user982 wrote:
| You can find out, you can learn about, but you can't learn out
| about.
| simonw wrote:
| That was a very dumb typo in my title!
| neom wrote:
| I figured as much, although I wondered if you were going for
| the kinda "he learn out about not pissing people off real
| sharpish" kinda tone I've heard in Scotland before, but
| wasn't sure. Big fan btw, happy new years Simon! :)
| mjburgess wrote:
| Good ear -- the use of 'out' as an abbreviation of anything is
| a britishism.
|
| Nowt, owt, -- nothing, anything
| JaDogg wrote:
| I think LLM web applications need a big red warning (non-
| interactive, I don't want more cookie dialogs) like on
| cigarettes:
|
| > LLM-generated content needs to be verified.
| becquerel wrote:
| Every LLM web app I have used has a disclaimer along these
| lines prominently featured in the UI. Maybe the disclaimer
| isn't bright red with gifs of flashing alarms, but the warnings
| are there for the people who would pay attention to them in the
| first place.
| minimaxir wrote:
| Unfortunately, even after 2 years of ChatGPT and countless
| news stories about it, people still don't realize that LLMs
| can be wrong.
|
| Maybe there should be a bright red flashing disclaimer at
| this point.
| Der_Einzige wrote:
| RE: Slop:
|
| Getting slop generations from an LLM is a choice. There are
| so many tricks to make models genuinely creative at the
| sampler level alone.
|
| https://github.com/sam-paech/antislop-sampler
|
| https://openreview.net/forum?id=FBkpCyujtS
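|
| The crudest version of the idea - hard-banning overused
| strings at generation time - is a few lines with Hugging Face
| transformers' bad_words_ids. (This is not the antislop
| sampler itself; the model and phrase list are just examples.)
|
|     from transformers import AutoModelForCausalLM, AutoTokenizer
|
|     tok = AutoTokenizer.from_pretrained("gpt2")
|     model = AutoModelForCausalLM.from_pretrained("gpt2")
|
|     # Token sequences we never want sampled.
|     banned = [tok(p, add_special_tokens=False).input_ids
|               for p in [" rich tapestry", " delve into"]]
|
|     ids = tok("The history of the city is",
|               return_tensors="pt").input_ids
|     out = model.generate(ids, max_new_tokens=40, do_sample=True,
|                          bad_words_ids=banned,
|                          pad_token_id=tok.eos_token_id)
|     print(tok.decode(out[0], skip_special_tokens=True))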
| simonw wrote:
| It doesn't matter how good the generated text is: it is still
| slop if the recipient didn't request it and no human has
| reviewed it.
| Der_Einzige wrote:
| By that definition machine to machine communication that
| happens "organically" (like how humans do it, where they
| sometimes strike up conversations unprompted with each other)
| is "slop".
|
| You're not seeing how the future of the world will develop.
| simonw wrote:
| If you ask me to read an unguided conversation between two
| LLMs then yes, I'd consider that slop.
|
| Some people might _like_ slop.
| minimaxir wrote:
| The rise of the famous obvious Facebook AI slop indicates
| that some demographics love it.
| orbital-decay wrote:
| This won't solve anything. There's a myriad of sampling
| strategies, and they all have the same issue: samplers are
| dumb. They have no access to the semantics of what they're
| sampling. As a result, things like min-p or XTC will either
| overshoot or undershoot as they can't differentiate between the
| situations. For the same reason, samplers like DRY can't solve
| repetition issues.
|
| Slop is over-representation of the model's stereotypes and a
| lack of prediction variety in cases that need it. Modern
| models are insufficiently random when randomness is required.
| It's not just specific words or idioms, it's _concepts_ at
| very different abstraction levels, from words to sentence
| patterns to entire literary devices. You can't fix issues
| that appear at the latent level by working with tokens. The
| antislop link you give seems particularly misguided, trying
| to solve an NLP task programmatically.
|
| Research like [1] suggests algorithms like PPO as one of the
| possible culprits in the lack of variety, as they can filter
| out entire token trajectories. Another possible reason is
| training on outputs from the previous models and insufficient
| filtering of web scraping results.
|
| And of course, prediction variety != creativity, although it's
| certainly a factor. Creativity is an ill-defined term like many
| in these discussions.
|
| [1] https://arxiv.org/abs/2406.05587
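|
| (For concreteness, min-p - one of the samplers mentioned
| above - keeps any token whose probability is at least some
| fraction of the top token's probability. An illustrative
| numpy sketch; note it only ever sees this probability vector,
| never the semantics behind it, which is the "dumb" part.)
|
|     import numpy as np
|
|     def min_p_sample(logits, min_p=0.1,
|                      rng=np.random.default_rng()):
|         # Softmax over the vocabulary.
|         probs = np.exp(logits - logits.max())
|         probs /= probs.sum()
|         # Drop tokens below min_p * the top token's probability.
|         probs[probs < min_p * probs.max()] = 0.0
|         probs /= probs.sum()
|         return rng.choice(len(probs), p=probs)
|
|     logits = np.array([2.0, 1.5, 0.2, -1.0])  # toy 4-token vocab
|     print(min_p_sample(logits))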
| draw_down wrote:
| I agree the criticism is poor; it's often very lazy. There are
| currently a lot of dog-brain "wrap an LLM around it" products,
| which are worthy of scorn. Much of the lazy criticism is pointing
| at such products and therefore writing off the whole endeavor.
|
| But that doesn't necessarily reflect the potential of the
| underlying technology, which is developing rapidly. Websites were
| goofy and pointless until Amazon came around (or Yahoo or
| whatever you prefer).
|
| I guess potential isn't very exciting or interesting on its own.
| dtquad wrote:
| What is the current status of pushing "reasoning" down into
| latent/neural space? It seems like a waste of tokens to let a
| model converse with itself, especially when this internal
| monologue often has very little to do with the final output,
| so it's not useful as a log of how the final output was
| derived.
| dmd wrote:
| See https://news.ycombinator.com/item?id=42555320
| adsharma wrote:
| In spite of all this progress, I can't find LLMs that solve
| simple tasks like:
|
| Here is my resume. Make it look nice (some design hints).
|
| They can spit out HTML and CSS, but not a Google Doc.
|
| On the other hand, Google results are dominated by SEO spam. You
| can probably find one usable result on page 10.
|
| The problem is not technology. It's a business model that can
| support the humans feeding data into the LLM.
| Alex-Programs wrote:
| Why would they be able to output a Google doc? It's a
| proprietary format. The closest thing would be rich text format
| to copy paste.
| vikramkr wrote:
| That proprietary format is owned by a company associated
| with folks who won two Nobel Prizes for AI-related work this
| year, the employer (at the time) of the researchers who
| wrote the "Attention Is All You Need" paper, and the owner
| of a search engine with access to, like, all the data.
| Doesn't seem unreasonable lol
| logicchains wrote:
| They can spit out LaTeX, and a PDF from that is going to look
| much nicer than a Google doc (and display the same everywhere).
| As an added bonus, the recruiter can't randomly rewrite parts
| of it (at least not so easily).
| nox101 wrote:
| The recruiter isn't going to print out your resume. They're
| going to read it on their computer or iPad or phone.
| trenchgun wrote:
| For sure they will read a pdf and not a google doc.
| Gooblebrai wrote:
| > They can spit out HTML and CSS, but not a Google Doc.
|
| Wow. At this stage, I think people are just searching for
| excuses to complain about anything that the LLM does NOT do.
| henning wrote:
| Spookily good at writing code? LLMs frequently hallucinate broken
| nonsense shit when I use them.
|
| Recognize what they do well (generate simple code in popular
| languages) while acknowledging where they are weak (non-trivial
| algorithms, any novel code situation the LLM hasn't seen before,
| less popular languages).
| simonw wrote:
| Did you try learning HOW to get good code out of them?
|
| As with all things LLM, there's a whole lot of undocumented
| and underappreciated depth to getting decent results.
|
| Code hallucinations are also the least damaging type of
| hallucinations, because you get fact checking for free: if you
| run the code and get an error you know there's a problem.
|
| A lot of the time I find pasting that error message back into
| the LLM gets me a revision that fixes the problem.
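|
| That loop can even be automated. A rough sketch (ask_llm is a
| placeholder for whatever model call you actually use):
|
|     import subprocess, sys, tempfile
|
|     def ask_llm(prompt: str) -> str:
|         raise NotImplementedError  # plug in your model here
|
|     def generate_and_fix(task: str, max_rounds: int = 3) -> str:
|         code = ask_llm(task)
|         for _ in range(max_rounds):
|             with tempfile.NamedTemporaryFile(
|                     "w", suffix=".py", delete=False) as f:
|                 f.write(code)
|             run = subprocess.run([sys.executable, f.name],
|                                  capture_output=True, text=True,
|                                  timeout=30)
|             if run.returncode == 0:
|                 return code  # ran cleanly; still review the logic!
|             # Paste the error straight back, as described above.
|             code = ask_llm(f"This code:\n{code}\n"
|                            f"failed with:\n{run.stderr}\nFix it.")
|         return code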
| lolinder wrote:
| > Code hallucinations are also the least damaging type of
| hallucinations, because you get fact checking for free: if
| you run the code and get an error you know there's a problem.
|
| This is great when the error is a thrown exception, but less
| great when the error is a subtle logic bug that only strikes
| in some subset of cases. For trivial code that only you will
| ever run this is probably not a big deal--you'll just fix it
| later when you see it--but for code that must run unattended
| in business-critical cases it's a totally different story.
|
| I've personally seen a dramatic increase in sloppy logic that
| _looks_ right coming from previously-reliable programmers as
| they 've adopted LLMs. This isn't an imaginary threat, it's
| something I now have to actively think about in code reviews.
| simonw wrote:
| Yeah, the other skill you need to develop to make the most
| of AI-assisted programming is _really good_ manual QA.
| lolinder wrote:
| Have you found that to be a good trade-off for large-
| scale projects?
|
| Where I'm at right now with LLMs is that I find them to
| be very helpful for greenfield personal projects.
| Eliminating the blank canvas problem is huge for my
| productivity on side projects, and they excel at getting
| projects scaffolded and off the ground.
|
| But as one of the lead engineers working on a million+
| line, 10+ year-old codebase, I've yet to see any
| substantial benefit come from myself or anyone else using
| LLMs to generate code. For every story where someone
| found time saved, we have a near miss where flawed code
| almost made it in or (more commonly) someone eventually
| deciding it was a waste of time to try because the model
| just wasn't getting it.
|
| Getting better at manual QA would help, but given the
| number of times where we just give up in the end I'm not
| sure that would be worth the trade-off over just
| discouraging the use of LLMs altogether.
|
| Have you found these things to actually work on large,
| old codebases given the right context? Or has your
| success likewise been mostly on small things?
| simonw wrote:
| I use them successfully on larger projects all the time.
|
| "Here's some example JavaScript code that sends an email
| through the SendGrid REST API. Write me a python function
| for sending an email that accepts an email address,
| subject, path to a Jinja template and a dictionary of
| template context. It should return true or false for if
| the email was sent without errors, and log any error
| messages to stderr"
|
| That prompt is equally effective for a project that's 500
| lines or 5,000,000 lines of code.
|
| I also use them for code spelunking - you can pipe quite
| a lot of code into Gemini and ask questions like "which
| modules handle incoming API request validation?" - that's
| why I built https://github.com/simonw/files-to-prompt
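|
| For illustration, the kind of function that SendGrid prompt
| tends to produce looks roughly like this (an unreviewed
| sketch against the SendGrid v3 REST API; the from address is
| a placeholder):
|
|     import sys
|     import requests
|     from jinja2 import Template
|
|     def send_email(api_key, to_email, subject,
|                    template_path, context) -> bool:
|         """Render a Jinja template and send it via SendGrid."""
|         with open(template_path) as f:
|             body = Template(f.read()).render(**context)
|         resp = requests.post(
|             "https://api.sendgrid.com/v3/mail/send",
|             headers={"Authorization": f"Bearer {api_key}"},
|             json={
|                 "personalizations": [{"to": [{"email": to_email}]}],
|                 "from": {"email": "noreply@example.com"},
|                 "subject": subject,
|                 "content": [{"type": "text/html", "value": body}],
|             },
|         )
|         if not resp.ok:
|             print(f"SendGrid error {resp.status_code}: {resp.text}",
|                   file=sys.stderr)
|             return False
|         return True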
| gre wrote:
| I had some success converting a react app with classes to
| use hooks instead. Also asking it to handle edge cases,
| like spaces in a filename in a bash script--this fixes
| some easy problems that might have come up. The corollary
| here is that pointing out specific problems or mentioning
| the right jargon will produce better code than just
| asking for the basic task.
|
| It's very bad at Factor but pretty good at naming things,
| sometimes requiring some extra prompting. [generate 25
| possible names for this variable...]
| switch007 wrote:
| QA are going to be told to use AI too
|
| (Seems every job is fair game according to CTOs. Well,
| except theirs)
| polishdude20 wrote:
| When they spit out these subtle bugs, are you prompting the
| LLM to watch out for that particular bug? I wonder if it
| just needs a bit more guidance in more explicit terms.
| lolinder wrote:
| At a certain point it becomes more work to prompt the LLM
| with each and every edge case than it is to just write
| the dang code.
|
| I work out what the edge cases are by writing and
| rewriting the code. It's in the process of shaping it
| that I see where things might go wrong. If an LLM can't
| do that on its own it isn't of much value for anything
| complicated.
| joelanman wrote:
| > if you run the code and get an error you know there's a
| problem.
|
| well, sometimes - other times it'll be wrong with no error,
| or insecure, or inaccessible, and so on
| xyzsparetimexyz wrote:
| Is there more to getting "good" at them than just copying
| error messages back in? Like, how do I get them to reason
| about e.g. whether a data structure compression method makes
| sense?
| AnimalMuppet wrote:
| > Did you try learning HOW to get good code out of them?
|
| That is at least somewhat a valid point. Good workers know
| how to get the best out of their tools. And yet, _good_ tools
| accommodate how their users work, instead of expecting the
| user to accommodate how the tool works.
|
| One could also say that programmers were sold a misleading
| bill of goods about how LLMs would work. From what they were
| told, they shouldn't _have_ to learn how to get the best out
| of LLMs - LLMs were AI, on the way to AGI, and would just
| give you everything you needed from a simple prompt.
| simonw wrote:
| Yeah, that's one of the biggest misconceptions I've been
| trying to push back against.
|
| LLMs are power-user tools. They're nowhere near as easy to
| use as they look (or as their marketing would have you
| believe).
|
| Learning to get great results out of them takes a
| significant amount of work.
| henning wrote:
| Like all AI simps, your blanket response to pointing out
| flaws is to tell me to do more prompt engineering and then
| dismiss the issue entirely. In the time it takes me to coax
| the model to do the thing I was told it knows how to do, I
| could just do the task myself. Your examples of LLM code
| generation are simple, easy to specify, self-contained
| applications that are not representative of software you can
| actually build a business on. Please do something your
| beloved LLMs can't and come up with an original idea.
| minimaxir wrote:
| > not representative of software you can actually build a
| business on
|
| The only people pushing that you can BUILD AN APP WITHOUT
| WRITING A LINE OF CODE are the Twitter AI hypesters. Simon
| doesn't assert anything of the sort.
|
| LLMs are more-than-sufficient for code snippets and small
| self-contained apps, but they are indeed far from replacing
| software engineers.
| phantompeace wrote:
| Like all stubborn anti-AI know-it-alls, you sound like
| you've tried a couple of times to do something and have
| decided to tar all LLMs with the same brush.
|
| What models have you tried, and what are you trying to do
| with them? Give us an example prompt too, so we can see how
| you're coaxing it and rule out a skill issue.
|
| And a big strength LLMs have is summarizing things - I'd
| like to see you summarize the latest 10 arxiv papers
| relating to prompt engineering and produce a report geared
| towards non-techies. And do this every 30 mins please. Also
| produce social media threads with that info. Is this a task
| you could do yourself, better than LLMs?
| henning wrote:
| Due to unexpected capacity constraints, Claude is unable
| to reply to this message.
| voidhorse wrote:
| > And a big strength LLMs have is summarizing things -
| I'd like to see you summarize the latest 10 arxiv papers
| relating to prompt engineering and produce a report
| geared towards non-techies. And do this every 30 mins
| please. Also produce social media threads with that info.
| Is this a task you could do yourself, better than LLMs?
|
| Right, but this is the part that is silly and sort of
| disingenuous and I think built upon a weird understanding
| of value and productivity.
|
| Doing _more_ constantly isn 't inherently valuable. If
| one human writes a magnificently crafted summary of those
| papers _once_ and it is promulgated across channels
| effectively, this is both better and more economical than
| having an LLM compute one (slightly incorrect) summary
| for each individual on demand. In fact, all the LLM does
| in this case is increase the amount of possible lower
| quality noise in the space. The one edge an LLM might
| have at this stage is to generate a summary that accounts
| for more recent information, thereby getting around the
| inevitable gradual "out of dateness" of human-authored
| summaries at time T, but even then, this is not great if
| the trade-off is to pollute the space with a bunch of
| ever-so-slightly different variants of the same text.
| It's such a weird, warped idea of what productivity is;
| it's basically the lazy middle-manager's idea of what it
| means to be productive. We need to remember that not all
| processes are reducible to their outputs--sometimes the
| process is the point, not the immediate output (e.g.
| education).
| th0ma5 wrote:
| Simon gets one thing working for one task and assumes
| everyone can do the same for everything. The trick is that
| he has no idea how the failures happen or how to maintain
| actual working systems.
| k2xl wrote:
| Something not mentioned is AI-generated music. Suno's
| development this year is impressive. It's unclear what this
| will mean for music artists over the next few years.
| simonw wrote:
| Yeah, this year I decided to just focus on LLMs - I didn't
| touch on any of the image or music generation advances either.
| I haven't been following those closely enough to have
| particularly useful things to say about them.
| fullstackchris wrote:
| Very clear; I like buying music produced by people who play
| instruments.
| antirez wrote:
| About "people still thinking LLMs are quite useless", I still
| believe that the problem is that most people are exposed to
| ChatGPT 4o that at this point for my use case (programming /
| design partner) is basically a useless toy. And I guess that in
| tech many folks try LLMs for the same use cases. Try Claude
| Sonnet 3.5 (not Haiku!) and tell me if, while still flawed, is
| not helpful.
|
| But there is more: a key thing with LLMs is that their
| ability to help, as a tool, changes vastly based on your
| communication ability. The prompt is king: it makes those
| models 10x better than they are with a lazy one-liner
| question. Drop your files in the context window; ask very
| precise questions explaining the background. They work great
| to explore what is at the borders of your knowledge. They
| are also great at doing boring tasks for which you can
| provide perfect guidance (but that would still take you
| hours). The best LLMs out there (in my case _just_ Claude
| Sonnet 3.5, I must admit) are able to accelerate you.
| wslh wrote:
| Right, in simpler terms: The measure of LLMs success is how
| effectively they help you achieve your goal faster.
| antirez wrote:
| Exactly, and right now the LLM acceleration effect is _a
| tool_, not "give me the final solution". Even people who
| can't code, using LLMs to build applications from scratch,
| still have this tool mindset. This is why they can use them
| effectively: they don't stop at the first failed solution;
| they provide hints to the LLM, test the code, try to figure
| out what the problem is (also with the LLM's help), and so
| forth. It's a matter of mindset.
| hdjjhhvvhga wrote:
| While Claude Sonnet is superior to 4o for most of my use
| cases, there are still occasionally some specific tasks
| where 4o performs slightly better.
| antirez wrote:
| Probably. But statistically to work with 4o is a lose of time
| for me. LLMs is like an investment: you write the prompts,
| you "work" with them. If the LLM is too weak, this is a lose
| of time. You need to have a return on the investment that is
| positive. With ChatGPT 4o / o1 most of the times for me the
| investment of time has almost zero return. Before Claude
| Sonnet 3.5 I already had a ChatGPT PRO account but never used
| it for coding since it was most of the times useless if not
| for throw away scripts that I didn't want to do myself or as
| a stack overflow replacement for trivial stuff. Now it's
| different.
| airstrike wrote:
| This mirrors my experience 100%. I'm not even sure why I
| still pay for OpenAI at this point. Claude 3.5 is just
| incredibly superior. And I totally agree on the point about
| dropping in context and asking very specific questions.
| I've had Claude pinpoint a bug in a 2k LOC module that I
| was struggling to find the cause for. After wasting a lot
| of time on it on my own, I thought "what the heck, maybe
| Claude can figure it out" and it did. It's objectively
| useful, even if flawed sometimes.
| d0mine wrote:
| why "lose of time" instead of "loss of time" Is it a typo
| or fingerprinting?
| tootie wrote:
| Like what? Claude has become my go-to, but I find that it's
| wrong often enough that I really can't trust it for anything.
| If it says something, I have to go dig through its citations
| very carefully.
| brookst wrote:
| I'm surprised you only have one use case. I use LLMs to
| research travel, adjust recipes, check biographies and book
| reviews, and many many more things.
| minimaxir wrote:
| > Claude Sonnet 3.5 (not Haiku!)
|
| A very big surprise is just _how_ much better Sonnet 3.5 is
| than Haiku. Even the confusingly-more-expensive-Haiku-variant
| Haiku 3.5, which is more recent than Sonnet 3.5, is still much
| worse.
| mhh__ wrote:
| Hopefully things have narrowed, but you can see from the trends
| data just how few people use Claude relative to ChatGPT (the
| API may be a different story).
| minimaxir wrote:
| Brand awareness is a hell of a drug.
| mhh__ wrote:
| Indeed, although I find myself reaching for o1 more than
| Claude for matters other than programming, solely because
| it has better LaTeX (...)
| dxbydt wrote:
| > best LLMs are able to accelerate you
|
| https://www2.math.upenn.edu/~ghrist/preprints/LAEF.pdf - this
| math textbook was written in just 55 days!
|
| Paraphrasing the acknowledgements -
|
| ...Begun November 4, 2024, published December 28, 2024.
|
| ...assisted by Claude 3.5 sonnet, trained on my previous
| books...
|
| ...puzzles co-created by the author and Claude
|
| ...GPT-4o and -o1 were useful in latex configurations...doing
| proof-reading.
|
| ...Gemini Experimental 1206 was an especially good proof-reader
|
| ...Exercises were generated with the help of Claude and may
| have errors.
|
| ...project was impossible without the creative labors of Claude
|
| The obvious comparison is to the classic Strang
| https://math.mit.edu/~gs/everyone/ which took several _years_
| to conceptualize, write, peer review, revise and publish.
|
| Ok maybe Strang isn't your cup of tea, :%s/Strang/Halmos/g ,
| :%s/Strang/Lipschutz/g, :%s/Strang/Hefferon/g,
| :%s/Strang/Larson/g ...
|
| Working through the exercises in this new LLMbook, I'm
| thinking...maybe this isn't going to stand the test of time.
| Maybe acceleration is not so hot after all.
| datadrivenangel wrote:
| Going faster isn't good if the quality drops enough that
| overall productivity decreases... Infinite slop is only a
| good thing for pigs.
| cruffle_duffle wrote:
| Just use ChatGPT to summarize its own output. It's like
| running your JPEG back through the JPEG compressor again!
| kianN wrote:
| ^ This perfectly encapsulates the story I see every time
| someone digs into the details of any LLM-generated or
| LLM-assisted content that has any level of complexity.
|
| Great on the surface, but lacking any depth, cohesion, or
| substance.
| pton_xd wrote:
| "The story of linear algebra begins with systems of
| equations, each line describing a constraint or boundary
| traced upon abstract space. These simplest mathematical
| models of limitation -- each equation binding variables in
| measured proportion -- conjoin to shape the realm of possible
| solutions. When several such constraints act in concert,
| their collaboration yields three possible fates: no solution
| survives their collective force; exactly one point satisfies
| all bounds; or infinite possibilities trace curves and planes
| through the space of satisfaction. This trichotomy -- of
| emptiness, uniqueness, and infinity -- echoes through all of
| linear algebra, appearing in increasingly sophisticated forms
| as our understanding deepens."
|
| Maybe I'm not the target audience, but... that really doesn't
| make me interested in continuing to read.
| jpc0 wrote:
| Even putting it here is annoying to me... those are a lot
| of words saying nothing that I just spent time reading.
|
| I'm agreeing with you.
| mooreds wrote:
| I started a book about CIAM (customer identity and access
| management) using Claude to help outline a chapter. I'd edit
| and refine the outline to make sure it covered everything.
|
| Then I'd have Claude create text. I'd then edit/refine each
| chapter's text.
|
| Wow, was it unpleasant. It was kinda cool to see all the
| words put together, but editing the output was a slog.
|
| It's bad enough editing your own writing, but for some reason
| this was even worse.
| ninth_ant wrote:
| I think a lot of the confusion is in how we approach LLMs.
| Perhaps stemming from the over-broad term "AI".
|
| There are certain classes of problems that LLMs are good at.
| Accurately regurgitating all accumulated world knowledge ever
| is not one, so don't ask a language model to diagnose your
| medical condition or choose a political candidate.
|
| But _do_ ask them to perform suitable tasks for a language
| model! Every day, by automation, I feed the hourly weather
| forecast to my home ollama server and it builds me a nice,
| readable, concise weather report (sketch below). It's super
| cool!
|
| There are lots of cases like this where you can give an LLM
| reliable data and ask it to do a language related task and it
| will do an excellent job of it.
|
| If nothing else it's an extremely useful computer-human
| interface.
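|
| A rough sketch of that daily job, assuming Ollama's local HTTP
| API on its default port (the model name, prompt wording, and
| forecast fetcher here are placeholders, not my exact setup):
|
|     import requests
|
|     def fetch_hourly_forecast() -> str:
|         # Placeholder: pull this from your weather API of choice.
|         return "06:00 2C overcast, 09:00 4C light rain, ..."
|
|     prompt = ("Rewrite this hourly forecast as a short, "
|               "readable morning weather report:\n\n"
|               + fetch_hourly_forecast())
|
|     # Ollama's generate endpoint returns one JSON object when
|     # streaming is disabled; the text lives in "response".
|     resp = requests.post(
|         "http://localhost:11434/api/generate",
|         json={"model": "llama3", "prompt": prompt,
|               "stream": False},
|         timeout=120,
|     )
|     print(resp.json()["response"])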
| rrix2 wrote:
| > Every day, by automation, I feed the hourly weather forecast
| to my home ollama server and it builds me a nice, readable,
| concise weather report.
|
| Not to dissuade you from a thing you find useful, but are you
| aware that the National Weather Service produces an Area
| Forecast Discussion product in each local NWS office, daily or
| more often, that accomplishes this with human meteorologists
| and a clickable jargon glossary?
|
| https://forecast.weather.gov/product.php?site=SEW&issuedby=S.
| ..
| ninth_ant wrote:
| Doesn't dissuade me at all; that's a really neat service.
| I'm not American though, and even if my own country had a
| similar service I'd still enjoy tuning the results to focus
| on what I'm interested in. And it was just an example of the
| kinds of computer-human interfaces that are newly possible
| with this technology.
|
| Anytime you have data and want it explained in a casual way
| -- and it's not mission critical to be extremely precise --
| LLMs are going to be a good option to consider.
|
| More useful AGI-like behaviours may be enabled by combining
| LLMs with other technologies down the line, but we
| shouldn't try to pretend that LLMs can do everything nor
| are they useless.
| pixl97 wrote:
| >don't ask a language model to diagnose your medical
| condition
|
| Honestly they are very decent at it if you give them accurate
| information from which to make a diagnosis. The typical
| problem people have is being unable to feed accurate
| information to the model. They'll cut out parts they don't
| want to think about, or not put the full test results in for
| consideration.
| 1oooqooq wrote:
| yeah, they save as much time as finding a template with a good
| old search and using it.
| uludag wrote:
| I don't think people finding LLMs useless is a good
| representation of the general sentiment though. I feel that
| more than anything, people are annoyed at LLM slop. Someone
| uses an LLM too much to write code, they create "slop," which
| ends up making things worse.
| gre wrote:
| Yes but then they can prompt it to golf the code and most of
| the slop goes away. This sometimes breaks the code.
| antirez wrote:
| Unfortunately, complex tools will be misused by part of the
| population. There is no easy escape from that in modernity.
| Look at the Internet itself.
| cruffle_duffle wrote:
| To get the most out of them you have to provide context. Treat
| these models like some kind of eager-beaver junior engineer who
| wants to jump in and write code without asking questions. Force
| it to ask questions (e.g.: "do not write code yet; please
| restate my requirements to make sure we are in alignment. Are
| there any extra bits of context or information that would help?
| I will tell you when to write code").
|
| If your model / chat app has the ability to always inject some
| kind of pre-prompt, make sure to add something like "please do
| not jump to writing code. If this were a coding interview and
| you jumped to writing code without asking questions and
| clarifying requirements, you'd fail".
|
| At the top of all your source files, include a comment with the
| file name and path. If you have a project on one of these
| services, add an artifact that is the directory tree ("tree
| --gitignore" is my go-to). This helps "unaided" chats get a
| sense of which documents they are looking at.
|
| And also, it's a professional bullshitter so don't trust it
| with large scale code changes that rely on some language /
| library feature you don't have personal experience with. It can
| send you down a path where the entire assumption that something
| was possible turns out to be false.
|
| Does it seem like a lot of work? Yes. Am I actually more
| productive with the tool than without? Probably. But it sure as
| shit isn't "free" in terms of time spent providing context. I
| think the more I use these models, the more I get a sense of
| what they are good at and what is going to be a waste of time.
|
| Long story short, prompting is everything. These things aren't
| mind readers (and worse, they forget everything in each new
| session).
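|
| For what it's worth, the pre-prompt idea is a one-liner if you
| use the API. A minimal sketch with the Anthropic Python SDK
| (the model id and wording here are assumptions; tune to taste):
|
|     import anthropic
|
|     client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY
|
|     # Assumed pre-prompt wording -- the "don't jump to code"
|     # instruction described above.
|     SYSTEM = ("Do not jump straight to writing code. First "
|               "restate my requirements and ask clarifying "
|               "questions; wait until I say 'write the code'.")
|
|     msg = client.messages.create(
|         model="claude-3-5-sonnet-latest",
|         max_tokens=1024,
|         system=SYSTEM,  # injected into every exchange
|         messages=[{"role": "user",
|                    "content": "Add retry logic to my fetcher."}],
|     )
|     print(msg.content[0].text)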
| jsheard wrote:
| I swear these goalposts keep getting moved, I remember being
| told that GPT3.5 is a useless toy but the paid GPT4 is
| lifechanging, and now that GPT4 is free I'm told that it's a
| useless toy but paid o1 or paid Sonnet are lifechanging.
| Looking forward to o1 and Sonnet becoming useless toys, unlike
| the lifechanging o3.
| aetherson wrote:
| You will also be dismayed to hear that a 2011 iPhone is no
| longer state-of-the-art, and indeed can't run most modern
| apps.
| jpc0 wrote:
| GPT4 is a 13-year-old technology? Compared to o1 and Sonnet
| 3.5?
|
| If someone told me an iPhone 4 is terrible but an iPhone 5
| would definitely serve my needs, and then when I get an iPhone
| 5 they say the same of the 6, do you really want me to believe
| them a second time? Then a third time? Then a fourth? In the
| meantime my time and money are wasted?
| scubbo wrote:
| Holy false-equivalency, Batman! The definitions of "useless
| toy / lifechanging tool" are _not_ changing over time (or,
| at least, not over the timescale being explored here),
| whereas the expectations and requirements of processing
| power of a phone are.
| qsort wrote:
| I believe it's more frustration directed at the mismatch
| between marketing and reality, combined with the general _well
| deserved_ growing hatred for SV culture and, more broadly,
| software engineers. The sentiment would be completely different
| if the entire industry marketed these tools as the helpful
| assistants they are rather than the second coming of Christ
| they aren't. This distinction is hard to make on "fast food"
| forums like this one.
|
| If you aren't a coder, it's hard to find much utility in
| "Google, but it burns a tree whenever you make an API call, and
| everything it tells you might be wrong". I for one have never
| used it for anything else. It just hasn't ever come up.
|
| It's great at cheating on homework; kids love GPTs. It's great
| at cheating in general, in interviews for instance. Or at
| ruining Christmas: after this year's LLM debacle it's unclear
| if we'll have another edition of Advent of Code. None of this
| is the technology's fault, of course; you could say the same
| about the Internet, phones or what have you. But it's hardly a
| point in favor either.
|
| And if you are a coder, models like Claude actually do help
| you, but you have to monitor their output and thoroughly test
| whatever comes out of them, a far cry from the promises of
| complete automation and insane productivity gains.
|
| If you are only a consumer of this technology, like the vast
| majority of us here, there isn't that much of an upside in
| being an early adopter. I'll sit and wait, slowly integrating
| new technology in my workflow if and when it makes sense to do
| so.
|
| Happy new year, I guess.
| duped wrote:
| > Try Claude Sonnet 3.5 (not Haiku!) and tell me if, while
| still flawed, it is not helpful.
|
| It's not _as_ helpful as Google was ten years ago. It's more
| helpful than Google today, because Google search has slowly
| been corrupted by garbage SEO and other LLM spam, including
| their own suggestions.
| ChicagoDave wrote:
| Claude Sonnet 3.5 can write whole React applications with
| proper contextual clues and some minor iterations. Google has
| never coded for you.
|
| I've written two large applications and about a dozen smaller
| ones using Claude as an assistant.
|
| I'm a terrible front-end developer, and almost none of that
| work would have been possible without Claude. The API and AWS
| deployment were sped up tremendously.
|
| I've created unit tests and I've read through the resulting
| code and it's very clean. One of my core pre-prompt
| requirements has always been to follow domain-driven design
| principles, something a novice would never understand.
|
| I also start with design principles and a checklist that
| Claude is excellent at providing.
|
| My only complaint is you only have a 3-4 hour window before
| you're cut off for a few hours.
|
| And needing an enterprise agreement to have a walled garden
| for proprietary purposes.
|
| I was not a fan in Q1. Q2 improved. Q3 was a massive leap
| forward.
| duped wrote:
| I've never really used Claude for writing code, becuase I'm
| not really bottlenecked by that problem. I have used it
| quite a bit for asking questions about what code to write
| and it's almost always wrong (usually in subtle ways that
| would trick someone with little experience).
|
| Maybe it was overtrained on react sources, but for me it's
| pretty useless.
|
| The big annoyance for me is it just makes up APIs that
| don't exist. While that's useful for suggesting to me what
| APIs I should add to my own code, it's really pointless if
| I ask a question like "using libfoo how do I bar" and it
| tells me "call the doBar() function" which does not exist.
| bdangubic wrote:
| comparing google to claude 3.5 is like comparing tesla s
| plaid with a horse
| emptiestplace wrote:
| What a hilariously absurd statement. You might want to
| actually try it.
| swalsh wrote:
| I'm a big believer in Claude. I've accomplished some huge
| productivity gains by leveraging it. That said, I can see
| places where the models are strong and weak. If you're doing
| React or Python, these models are incredible. With C# and C++
| they're not terrible. Rust, though, it's not great. If your
| experience is exclusively trying to use them to write Rust, it
| doesn't matter if you're using o1, Claude or anything else.
| They're just not great at it yet.
| mvkel wrote:
| I'm surprised at the description that it's "useless" as a
| programming / design partner. Even if it doesn't make "elegant"
| code (whatever that means), it's the difference between an app
| existing at all, or not.
|
| I built and shipped a Swift app to the App Store, currently
| generating $10,200 in MRR, exclusively using LLMs.
|
| I wouldn't describe myself as a programmer, and didn't plan to
| ever build an app, mostly because in the attempts I made, I'd
| get stuck and couldn't google my way out.
|
| LLMs are the great un-stickers. For that reason alone, they
| are incredibly useful.
| raydev wrote:
| Which service/LLM performed the best for you?
| egometry wrote:
| To the un-sticking point: it's also great at letting people
| ask questions without being perceived as dumb.
|
| Tragically, admitting ignorance, even with the desire to
| learn, often has negative social repercussions.
| simonw wrote:
| Asking "stupid" questions without fear of judgement is
| legit one of my favorite personal applications of LLMs.
| HarHarVeryFunny wrote:
| Did you need a Mac for that, or is it possible to use Linux
| to develop a Swift app targeting iOS?
|
| Would you mind sharing which app you released?
| FooBarWidget wrote:
| Why do people have such narrow views on what makes LLMs useful?
| I use them for basically everything.
|
| My son throws an irrational tantrum at the amusement park and
| I can't figure out why he's acting that way (he won't tell me,
| or he doesn't know himself either) or what I should do? I feed
| Claude all the facts of what happened that day and ask for
| advice.
| Even if I don't agree with the advice, at the very least the
| analysis helps me understand/hypothesize what's going on with
| him. Sure beats having to wait until Monday to call up
| professionals. And in my experience, those professionals don't
| do a better job of giving me advice than Claude does.
|
| It's the weekend, my wife is sick, the general practitioner is
| closed, the emergency weekend line has 35 people in the queue,
| and I want some quick, half-assed medical guidance that, while
| I know it might not be 100% reliable, is still better than
| nothing for the next 2 hours? I feed all the symptoms and facts
| to Claude/ChatGPT and it does an okay job a lot of the time.
|
| I've been visiting a Traditional Chinese Medicine (TCM)
| practitioner for a week now and my symptoms are indeed
| subsiding. But the TCM paradigm and concepts are so different
| from western medicine's paradigms and concepts that I can't
| understand the doctor's explanations at all. Again, Claude does
| a reasonable job of explaining to me what's going on, or why it
| works, from a western medicine point of view.
|
| Want to write a novel? Brainstorm ideas with GPT-4o.
|
| I had a debate with a friend's child over the correct spelling
| of a Dutch word ("instabiel" vs "onstabiel"). Google results
| were not very clear. ChatGPT explained it clearly.
|
| Just where is this "useless" idea coming from? Do people not
| have a life outside of coding?
| krapp wrote:
| Yes people have lives outside of coding, but most people are
| able to manage without having AI software intercede in as
| much of their lives as possible.
|
| It seems like you trust AI more than people and prefer it to
| direct human interaction. That seems to be satisfying a need
| for you that most people don't have.
| claar wrote:
| Why do you postulate that "most people don't have" this
| need? I also use AI non-stop throughout my day for similar
| uses.
|
| This feels identical to when I was an early "smart phone"
| user w/my palm pilot. People would condescend saying they
| didn't understand why I was "on it all the time". A decade
| or two later, I'm the one trying to get others to put down
| their phones during meetings.
|
| My take? Those who aren't using AI continually currently
| are simply later adopters of AI. Give it a few years - or
| at most a decade - and the idea of NOT asking 100+ AI
| queries per day (or per hour) will seem positively quaint.
| krapp wrote:
| >Those who aren't using AI continually currently are
| simply later adopters of AI. Give it a few years - or at
| most a decade - and the idea of NOT asking 100+ AI
| queries per day (or per hour) will seem positively
| quaint.
|
| I don't think you're wrong, I just think a future in
| which it's all but physically and socially impossible to
| have a single thought or communication not mediated by
| software is fucking terrifying.
| FooBarWidget wrote:
| When I'm done working, have chased my children into properly
| finishing their dinner, helped my son with homework, and put
| them to bed, it's already 9+ PM -- the only time of the day
| when I have free time. Just which human besides my wife can I
| talk to at that point? What if she doesn't have a clue either?
| All the professionals are only open when I'm working. A lot of
| the issues happen during the weekend, when professionals are
| closed. I don't want to disturb friends during the evening, and
| it's not like they have the expertise I need anyway.
|
| LLMs are infinitely patient, don't think I am dumb for
| asking certain things, consider all the information I feed
| them, are available whenever I need them, have a wide range
| of expertise, and are dirt cheap compared to professionals.
|
| That they might hallucinate is not a blocker most of the time.
| If the information I require is critical, I can always double-
| check with my own research or with professionals (in which case
| the LLM has already primed me with a basic mental model, so I
| can ask quick, short, targeted questions, which saves both of
| us time, and saves me money). For everything else (such as my
| curiosity about why TCM works, or the correct spelling of a
| word), LLMs are good enough.
| jiggawatts wrote:
| At the risk of sounding impolite or critical of your personal
| choices: this, right here, is the problem!
|
| You don't understand how medicine works, at any level.
|
| Yet you turn to a machine for advice, _and take it at face
| value_.
|
| I say these things confidently, because I _do_ understand
| medicine well enough _not_ to seek my own answers. Recently I
| went to a doctor for a serious condition and _every_ notion I
| had was wrong. Provably wrong!
|
| I see the same behaviour in junior developers that simply
| copy-paste in whatever they see in StackOverflow or whatever
| they got out of ChatGPT with a terrible prompt, no context,
| and _no understanding_ on their part of the suitability of
| the answer.
|
| This is why I and many others still consider AIs mostly
| useless. The human in the loop is still _the_ critical
| element. Replace the human with someone that thinks that
| powdered rhino horn will give them erections, and the utility
| of the AI drops to near zero. Worse, it can _multiply_ bad
| tendencies and bad ideas.
|
| I'm sure someone somewhere is asking DeepSeek how best to get
| endangered animals parts on the black market.
| FooBarWidget wrote:
| No. Where do you read that I take it at face value? I
| literally said that I expect Claude to give me "half-assed"
| medical guidance. I merely said that that is still better
| than having _no_ clue for the next 2 hours while I wait on
| the phone with 35 people in front of me, which is
| _completely different_ from "taking medical advice at
| face value".
|
| So I am curious about how TCM works. So what if an LLM
| hallucinates there? I am not writing papers on TCM or
| advising governments on TCM policy. I still follow the
| doctor's instructions at the end of the day.
|
| For anything really critical I already double check with
| professionals.
|
| You are letting perfect be the enemy of good. Half-assed tax
| advice with some hallucinations from an LLM is still useful,
| because it primes me with some basic knowledge. When I later
| double-check the whole thing with a professional, I will
| already know what questions to ask and what direction I need
| to explore, which saves time and money compared to going in
| with a blank slate.
|
| There is absolutely nothing wrong with using LLMs when you
| know their limits and how to mitigate them.
|
| So what if every notion you learned about medicine from
| LLMs is wrong? You learn why it's wrong, then next time you
| prompt and double-check better, until you learn how to use
| LLMs for that field in the least hallucination-prone way.
| Your experience also doesn't match mine: the advice I get
| usually contains useful elements that I discuss with
| doctors. Plus, doctors can make mistakes too.
|
| Stop letting perfect be the enemy of good.
| karmakaze wrote:
| We're at the "computers play chess badly" stage. Then we'll hit
| the Deep Thought (1988) and Deep Blue (1995-1997) stages, but
| still saying that solving Go won't happen for 50+ years and
| that humans will continue to be better than computers.
|
| The date/time that divides my world into before/after is
| AlphaGo v Lee Sedol game 3 (2016). From that time forward, I
| don't dismiss out of hand speculations of how soon we can have
| intelligent machines. Ray Kurzweil's date of 2045 is as good as
| any (and better than most) for an estimate. Like Moore's Law
| (and related laws), it's not about _how_ but about the
| historical pace of advancements crossing a fairly static point
| of human capability.
|
| Application coding requires much less intelligence than
| playing Go at these high levels. The main differences are
| concise representation and clear final-outcome scoring. LLMs
| deal quite well with the fuzziness of human communications.
| There may be a few more pegs to place, but _when_ seems,
| predictably, unknown.
| kromem wrote:
| Both new Sonnet and Haiku have a masking overhead.
|
| Using a few messages to get them out of "I aim to be direct" AI
| assistant mode gets much better overall results for the rest of
| the chat.
|
| Haiku is actually incredibly good at high level systems
| thinking. Somehow when they moved to a smaller model the
| "human-like" parts fell away but the logical parts remained at
| a similar level.
|
| Like if you were taking meeting notes from a business strategy
| meeting and wanted insights, use Haiku over Sonnet, and thank
| me later.
| nntwozz wrote:
| I think John Gruber summed it up nicely:
|
| https://daringfireball.net/2024/12/openai_unimaginable
|
| OpenAI's board now stating "We once again need to raise more
| capital than we'd imagined" less than three months after raising
| another $6.6 billion at a valuation of $157 billion sounds
| alarmingly like a Ponzi scheme -- an argument akin to "Trust us,
| we can maintain our lead, and all it will take is a never-ending
| stream of infinite investment."
| hdjjhhvvhga wrote:
| What is funny is that their "lead" is just because of inertia -
| they were the first to make an LLM publicly available. But they
| are no longer leaders so their attempts at getting more and
| more money only prove Altman's skills at convincing people to
| give him money.
| jppope wrote:
| yeah but in business there are really only 2 skills, right?
| Convincing people to give you money, and giving them something
| back that's worth more than the money they gave you.
| klipt wrote:
| For repeated business you want to give them something that
| costs you less than what they pay, but is worth more to
| them than what they pay. Ie creating economic value.
| lumost wrote:
| They are still in the lead, and I'd be willing to bet that
| they have 10x the DAU on chat.com/chatgpt.com of all other
| providers combined. Barring massive innovation on small sub-
| 10B models, we are all likely to need remote inference from
| large server farms for the foreseeable future. Even if local
| inference is possible, it's unlikely to be desirable from a
| power perspective in the next 3 years. I am not going to buy
| a 4xB200 instance for myself.
|
| Whether they offer the best model or not may not matter if
| you need a PhD in <subject> to differentiate the response
| quality between LLMs.
| scary-size wrote:
| Not sure about 10x DAUs. Google flicked the switch on
| Gemini and it surfaced in pretty much every GSuite app
| overnight.
| Peacefulz wrote:
| Requiring that Gemini take over the job that Google
| Assistant did when installing the Gemini APK really
| rubbed me the wrong way. I get it. I just don't like that
| it was required for use.
| brokencode wrote:
| Same with Microsoft and all their Copilots, which are
| built on OpenAI. Not to mention all the other companies
| using OpenAI since it's still the best.
| theferalrobot wrote:
| Which models perform better than 4o or o1 for your use cases?
|
| In my limited tests (primarily code) nothing from Llama or
| Gemini has come close; Claude I'm not so sure about.
| torginus wrote:
| How good is the best model of your choice at doing
| architecture work for complex and nontrivial apps?
|
| I have been bashing my head against the wall over the
| course of the past few days trying to create my (quite
| complex) dream app.
|
| Most of the LLM coding I've done involved writing code to
| interface with existing libs or services, and the LLMs are
| great at that.
|
| I'm hung up on architecture questions that are unique to my
| app and definitely not something you can google.
| fullstackchris wrote:
| Don't wanna be that typical hackernews guy but I couldn't
| resist... if your app is "quite complex" there is probably a
| way, or ways, you can break it down into much simpler parts.
| Easier for you AND the LLM. It always comes back to
| architecture and composition ;)
| torginus wrote:
| I don't want to be mean, but that bit of eastern wisdom
| you dispensed sounds incredibly like what a management
| consultant would say.
| belter wrote:
| Their best hope now is to hire John Carmack :-)
| jsheard wrote:
| According to the internal projections that The Information
| acquired recently, they're expecting to lose $14 billion in
| 2026, so that record-breaking funding round won't even buy them
| 6 months of runway at that point, even by their own probably
| optimistic estimates.
| cactusfrog wrote:
| Every waste of money is not a Ponzi scheme.
| ffsm8 wrote:
| I agree; the core aspect of a Ponzi scheme is that it
| redistributes the newly invested funds to previous investors,
| making it highly profitable for anyone joining early and
| incentivising early joiners to recruit new investors.
|
| This just doesn't hold true for OpenAI.
| jacobgkau wrote:
| Doesn't it hold true for investment in AI (or potentially
| any other industry that experiences a boom) in general?
|
| Anyone who bought in at the ground floor is now rich.
| Anyone who buys in now is incentivized to try to keep
| getting more people to buy in, so their investment will give
| a return regardless of whether actual value is being created.
| dartos wrote:
| In effect, kind of.
|
| The money being invested does not go directly to
| investors.
|
| It goes to the cost of R&D, which in turn increases the
| value of openai shares, then the early investors can sell
| those shares to realize those gains.
|
| The difference between that and a ponzi is that the
| investment creates value which is reflected in the share
| price.
|
| No value is created in a Ponzi scheme.
|
| The actual dollar worth of the value generated is what
| people speculate on.
| zekica wrote:
| Only part of OpenAI's valuation corresponds to created
| value. Most of it is still a Ponzi-like scheme.
| dartos wrote:
| I have no love for openai, but they did make the fastest
| growing product of all time. There's value in being the
| ones to do that.
|
| I do agree it's a very very thin line.
| wslh wrote:
| Not every, but wasting money is one of the tricks of
| corruption.
| DavidSJ wrote:
| > Every waste of money is not a Ponzi scheme.
|
| Using this as an opportunity to grind an axe (not your fault,
| cactusfrog!): I find it clearer when people write "not every
| X is a Y" than "every X is not a Y", which could be (and
| would be, literally) interpreted to mean the same thing as
| "no X is a Y".
| sowbug wrote:
| Simon has mentioned in multiple articles how cool it is to use
| 64GB DRAM for GPU tasks on his MacBook. I agree it's cool, but I
| don't understand why it is remarkable. Is Apple doing something
| special with DRAM that other hardware manufacturers haven't
| figured out? Assuming data centers are hoovering up nearly all
| the world's RAM manufacturing capacity, how is Apple still
| managing to ship machines with DRAM that performs close enough
| for Simon's needs to VRAM? Is this just a temporary blip, and PC
| manufacturers in 2025 will be catching up and shipping mini PCs
| that have 64GB RAM ceilings with similar memory performance? What
| gives?
| post-it wrote:
| Apple designs its own chips, so the RAM and CPU sit on the same
| package and can talk at very high speeds. This is not the case
| for most PCs, where RAM is connected externally.
| com2kid wrote:
| Apple uses HBM, basically RAM on the same die as the CPU. It
| has a lot more memory bandwidth than typically PC dram, but
| still less than many GPUs. (Although the highest end macs have
| bandwidth that is in the same ballpark as GPUs)
| jsheard wrote:
| Apple does not use HBM, they use LPDDR. The way they use it
| is similar in principle to HBM (on-package, very wide bus)
| but it's not the same thing.
| karmakaze wrote:
| Right so Apple uses high-bandwidth memory, but not HBM.
| justincormack wrote:
| It's not HBM, which GPUs tend to use, but it is on-package,
| with a wider interface than other PCs.
| minimaxir wrote:
| LLMs run on the GPU, and the unified memory of Apple silicon
| means that the 64 GB can be used by the GPU.
|
| Consumer GPUs top out at 24 GB VRAM.
| karolist wrote:
| llama.cpp can run LLMs on the CPU, and an iGPU can also use
| system memory; that's not the novel thing. The novel thing is
| that LLM inference is mostly memory-bandwidth bound. The memory
| bandwidth of a custom-built PC with really fast DDR5 RAM is
| around 100GB/s; Nvidia consumer GPUs at the top end are around
| 1TB/s, with mid-range GPUs at around half that. The M1 Max has
| 400GB/s and the M1 Ultra 800GB/s, and you can have Apple
| Silicon Macs with up to 192GB of 800GB/s memory usable by the
| GPU. This means much faster inference than CPU + system memory
| due to bandwidth, and it's more affordable than building a
| multi-GPU system to match the memory amount.
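|
| The back-of-the-envelope math: decoding is memory bound, so a
| rough upper limit on tokens/sec is bandwidth divided by model
| size, since every weight is read once per token. A sketch (all
| numbers are rough assumptions):
|
|     def max_tokens_per_sec(bandwidth_gb_s, params_b,
|                            bytes_per_param=0.5):
|         # bytes_per_param=0.5 assumes 4-bit quantization
|         model_gb = params_b * bytes_per_param
|         return bandwidth_gb_s / model_gb
|
|     for name, bw in [("fast DDR5 PC", 100), ("mid GPU", 500),
|                      ("M1 Ultra", 800), ("top GPU", 1000)]:
|         print(f"{name}: ~{max_tokens_per_sec(bw, 70):.1f}"
|               " tok/s for a 70B model at 4-bit")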
| dekhn wrote:
| It'd be really nice to have good memory bandwidth usage
| metrics collected from a wide range of devices while doing
| inference.
|
| For example, how close does it get to the peak, and what's
| the median bandwidth during inference? And is that
| bandwidth, rather than some other clever optimization
| elsewhere, actually providing the Mac's performance?
|
| Personally, I don't develop HPC stuff on a laptop - I am
| much more interested in what a modern PC with Intel or AMD
| and Nvidia can do when maxed out. But it's certainly
| interesting to see that some of Apple's arch decisions have
| worked out well for local LLMs.
| viccis wrote:
| I didn't realize "agent" designs were that ambiguously defined.
| Every AI engineer I've talked to uses it to mean a design that
| combines several separate LLM prompts (or even models) to solve
| problems in multiple stages.
| simonw wrote:
| I'll add that one to the list. Surprisingly it doesn't closely
| match most of the 211 definitions I've collected already!
|
| The closest in that collection is "A division of
| responsibilities between LLMs that results in some sort of
| flow?" -
| https://lite.datasette.io/?json=https://gist.github.com/simo...
| xnx wrote:
| This sounds like ensemble chain of thought.
| datadrivenangel wrote:
| If the investors ask, those same AI engineers will probably
| allow the answer to be much more ambiguous.
| submeta wrote:
| Thank you Simon for the excellent work you do! I've learned a
| lot from you and enjoy reading everything you write. Keep it
| up, and happy new year.
| xnx wrote:
| Double-checking: I don't think I saw anything about video
| generation. I'm not sure if that falls under the "LLM"
| umbrella. It came very late in the year, but the Google Veo 2
| limited testing results are astounding. There are at least a
| half-dozen other services where you can pay to generate video.
| baobabKoodaa wrote:
| Video generation was covered in OP
| xnx wrote:
| I've been surprised that ChatGPT has hung on as long as it has.
| Maybe 2025 is the year Microsoft pushes harder for their brand of
| LLM.
| switch007 wrote:
| I've watched juniors take their output as gospel, applying
| absolutely zero thinking, and getting confused when I suggest
| looking at the reference manual instead
|
| I've had PMs believe it can replace all writing of tickets and
| thinking about the feature, creating completely incomprehensible
| descriptions and acceptance criteria
|
| I've had Slack messages and emails from people with zero
| sincerity and classic LLM style and the bs that entails
|
| I've had them totally confidently reply with absolute nonsense
| about many technical topics
|
| I'm grouchy and already over LLMs
| dartos wrote:
| > There's a flipside to this too: a lot of better informed people
| have sworn off LLMs entirely because they can't see how anyone
| could benefit from a tool with so many flaws. The key skill in
| getting the most out of LLMs is learning to work with tech that
| is both inherently unreliable and incredibly powerful at the same
| time. This is a decidedly non-obvious skill to acquire!
|
| I wish the author had qualified this more. How does one develop
| that skill?
|
| What makes LLMs so powerful on a day-to-day basis without a
| large RAG system around them?
|
| Personally, I try LLMs every now and then, but haven't seen any
| indication of their usefulness for my day to day outside of being
| a smarter auto complete.
| lumost wrote:
| When I started my career in 2010, googling was a semi-serious
| skill. All of the little things that we know how to do now, such
| as ignoring certain sites, lingering on others, and iteratively
| refining our search queries, were not universally known at the
| time. Experienced engineers often relied on encyclopedic
| knowledge of their environment or on "reading the manual".
|
| In my experience, LLM tools are the same, you ask for something
| basic initially and then iteratively refine the query either
| via dialog or a new prompt until you get what you are looking
| for or hit the end of the LLM's capability. Knowing when you've
| reached the latter is critically important.
| o11c wrote:
| The problems with that skill are that:
|
| * Most existing LLM interfaces are very bad at _editing_
| history, instead focusing entirely on _appending_ to history.
| You can sort of ignore this for one-shot, and this can be
| properly fixed with additional custom tools, but ...
|
| * By the time you refine your input enough to patch over all
| the errors in the LLM's output for your sensible input,
| you're bigger than the LLM can actually handle (much smaller
| than the alleged context window), so it starts randomly
| ignoring significant chunks of what you wrote (unlike
| context-window problems, the ignored parts can be _anywhere_
| in the input).
| simonw wrote:
| Yeah, a key thing to understand about LLMs is that managing
| the context is _everything_. You need to know when to wipe
| the slate by starting a new chat session and then pasting
| across a subset of the previous conversation.
|
| A lot of my most complex LLM interactions take place across
| multiple sessions - and in some cases I'll even move the
| project from Claude 3.5 Sonnet to OpenAI o1 (or vice versa)
| to help get out of a rut.
|
| It's infuriatingly difficult to explain why I decide to do
| that though!
| grimgrin wrote:
| I bought in early to typingmind, a great web-based frontend.
| It's good for editing context and for switching from, say,
| Gemini to Claude. This is a very normal flow for me, and
| whatever tool you use should enable it.
|
| It's also nice to interact with an LLM in vim, as the context
| is the buffer.
|
| Obviously Simon's llm tool rules. I've wrapped it for vim.
| dartos wrote:
| What kinds of things do you do with these LLMs?
|
| I feel like I'm good at understanding context. I've been
| working in AI startups over the last 2 years. Currently
| at an AI search startup.
|
| Managing context for info retrieval is the name of the
| game.
|
| But for my personal use as a developer, they've caused me
| much headache.
|
| Answers that are subtly wrong in such a way that it took
| me a week to realize my initial assumption based on the
| LLM response was totally bunk.
|
| This happened twice. With the yjs library, it gave me
| half-incorrect information that led me to misimplement
| the sync protocol. Granted, it's a fairly new library.
|
| And again with the web history api. It said that the
| history stack only exists until a page reload. The
| examples it gave me ran as it described, but that isn't
| how the history api works.
|
| I lost a week of time because of that assumption.
|
| I've been hesitant to dive back in since then. I ask
| questions every now and again, but I jump off much faster
| now if I even think it may be wrong.
| perrygeo wrote:
| There's a similar dynamic in building reliable distributed
| systems on top of an unreliable network. The parts are prone to
| failure but the system can keep on working.
|
| The tricky problem with LLMs is identifying failures - if
| you're asking the question, it's implied that you don't have
| enough context to assess whether it's a hallucination or a good
| recommendation! One approach is to build ensembles of agents
| that can check each other's work, but that's a resource-
| intensive solution.
| volgminar wrote:
| The quote you pulled reeks of desperation.
|
| If you have a product then sell me on it, because you'll only
| sound desperate trying to convince me to use your product by
| telling me why I have an aversion to it.
|
| Does simonw actually like probabilistic computing? Does simonw
| eagerly follow Dave Ackley's T2Tile project? [
| https://t2tile.com ]
|
| Or is simonw just using "unreliable" computing in an attempt at
| a holier than thou framing to talk, yet again, about a subset
| of a subset of machine learning research?
| simonw wrote:
| I hadn't heard of T2Tile. The intro video
| https://www.youtube.com/watch?v=jreRFxN6wuM is from 5 years
| ago so it predates even GPT-3.
|
| Do you know if any of the ideas from that project have
| crossed over into LLM world yet?
| duck wrote:
| Do you know who Simon is?
| simonw wrote:
| One of the things I find most frustrating about LLMs is how
| hard it is to teach other people how to use them!
|
| I'd love to figure this out. I've written more about them than
| most people at this point, and my goal has always been to help
| people learn what they can and cannot do - but distilling that
| down to a concise set of lessons continues to defeat me.
|
| The only way to really get to grips with them is to use them, a
| lot. You need to try things that fail, and other things that
| work, and build up an intuition about their strengths and
| weaknesses.
|
| The problem with intuition is it's really hard to download that
| into someone else's head.
|
| I share a _ton_ of chat conversations to show how I use them -
| https://simonwillison.net/tags/tools/ and
| https://simonwillison.net/tags/ai-assisted-programming/ have a
| bunch of links to my exported Claude transcripts.
| bjt wrote:
| Thank you for doing this work, though.
|
| My first stab at trying ChatGPT last year was asking it to
| write some Rust code to do audio processing. It was not a
| happy experience. I stepped back and didn't play with LLMs at
| all for a while after that. Reading your posts has helped me
| keep tabs on the state of the art and decide to jump back in
| (though with different/easier problems this time).
| swalsh wrote:
| It's amazing this is still an opinion in 2025. I now ask devs
| how they use AI as part of their workflows when I interview.
| It's a standard skill I expect my guys to have.
| BeetleB wrote:
| Just curious, but what AI related skills do you expect them
| to have?
| dartos wrote:
| I feel bad for your team.
|
| Let people work how they want. I wouldn't not hire someone on
| the basis of them not using a language server.
|
| The creator of the Odin language famously doesn't use one.
| He says that he, specifically, is faster without one.
| BeetleB wrote:
| I think most tech folks struggle with it because they treat
| LLMs as computer programs, and their experience is that SW
| should be extremely reliable - imagine using a calculator that
| was wrong 5% of the time - no one would accept that!
|
| Instead, think of an LLM as the equivalent of giving a human a
| menial task. You _know_ that they're not 100% reliable, and so
| you give them only tasks that you can quickly verify and
| correct.
|
| Abstract that out a bit further, and realize that most managers
| don't expect their reports to be 100% reliable.
|
| Don't use LLMs where accuracy is paramount. Use it to automate
| away tedious stuff. Examples for me:
|
| Cleaning up speech recognition. I use a traditional voice
| recognition tool to transcribe, and then have GPT clean it up.
| I've tried voice recognition tools for dictation on and off for
| over a decade, and always gave up because even a 95% accuracy
| is a pain to clean up. But now, I route the output to GPT
| automatically. It still has issues, but I now often go
| paragraphs before I have to correct anything. For personal
| notes, I mostly don't even bother checking its accuracy - I do
| it only when dictating things others will look at.
|
| And then add embellishments to that. I was dictating out a
| recipe I needed to send to someone. I told GPT up front to
| write any number that appears next to an ingredient as a
| numeral (i.e. 3 instead of "three"). Did a great job - didn't
| need to correct anything.
|
| And then there are always the "I could do this myself but I
| didn't have time so I gave it to GPT" category. I was giving a
| presentation that involved graphs (nodes, edges, etc). I was on
| a tight deadline and didn't want to figure out how to draw
| graphs. So I made a tabular representation of my graph, gave it
| to GPT, and asked it to write graphviz code to make that graph.
| It did it perfectly (correct nodes and edges, too!)
|
| Sure, if I had time, I'd go learn graphviz myself. But I
| wouldn't have. The chances I'll need graphviz again in the next
| few years is virtually 0.
|
| I've actually used LLMs to do quick reformatting of data a few
| times. You just have to be careful that you can verify the
| output _quickly_. If it's a long table, then don't use LLMs
| for this.
|
| Another example: I have a custom note taking tool. It's just
| for me. For convenience, I also made an HTML export. Wouldn't
| it be great if it automatically made alt text for each image I
| have in my notes? I would just need to send it to the LLM and
| get the text. It's fractions of a cent per image! The current
| services are a lot more accurate at image recognition than I
| need them to be for this purpose!
|
| Oh, and then of course, having it write Bash scripts and CSS
| for me :-) (not a frontend developer - I've learned CSS in the
| past, but it's quicker to verify whatever it throws at me than
| Google it).
|
| Any time you have a task and lament "Oh, this is likely easy,
| but I just don't have the time" consider how you could make an
| LLM do it.
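|
| The dictation cleanup step, for instance, is only a few lines.
| A sketch with the OpenAI Python SDK (the model id and prompt
| are assumptions; `raw` would come from your speech recognizer):
|
|     from openai import OpenAI
|
|     client = OpenAI()  # reads OPENAI_API_KEY
|
|     # Placeholder transcript from the voice recognition tool.
|     raw = "add three cups flour two eggs uh mix until smooth"
|
|     resp = client.chat.completions.create(
|         model="gpt-4o-mini",
|         messages=[
|             {"role": "system",
|              "content": "Clean up this dictated text: fix "
|                         "punctuation and casing, write numbers "
|                         "next to ingredients as numerals, and "
|                         "change nothing else."},
|             {"role": "user", "content": raw},
|         ],
|     )
|     print(resp.choices[0].message.content)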
| alexashka wrote:
| I wonder what the author of this post thinks of human generated
| slop.
|
| For example if someone just takes random information about a
| topic, organizes it in chronological order and adds empty
| opinions and preferences to it and does that for years on end -
| what do you call that?
| dash2 wrote:
| Look, when are these models going to not just talk to me, but do
| stuff for me? If they're so clever, why can't I tell one to buy
| chocolates and send them to my wife? Meanwhile, they can
| allegedly solve frontier maths problems. What's the holdup to
| models that go online and perform simple tasks?
| icelancer wrote:
| The last mile problem remains undefeated.
| gs17 wrote:
| > why can't I tell one to buy chocolates and send them to my
| wife?
|
| I'm pretty sure that's been possible for a while. There was an
| example where Claude's computer use feature ordered pizza for
| the dev team through DoorDash:
| https://x.com/alexalbert__/status/1848777260503077146?lang=e...
|
| I don't think the released version of the feature can do it,
| but it should be possible with today's tech.
| th0ma5 wrote:
| More dishonest magical thinking. I wish this guy would learn how
| systems work and stop flooding the field with mystical nonsense
| unless he really is trying to make people think LLMs are
| worthless, then I guess he should be honest about it instead of
| subversive.
| simonw wrote:
| Which bit was dishonest magical thinking?
|
| In case you're interested, here's a summarized list (thanks,
| Claude) of the negative/critical things I said about LLMs and
| the companies that build them in this post:
| https://gist.github.com/simonw/73f47184879de4c39469fe38dbf35...
| jdlshore wrote:
| I read the article and thought it was well done and level-
| headed. What exactly did you think was mystical or magical
| thinking?
| orsenthil wrote:
| One of the best-written summaries of LLMs for the year 2024.
|
| We have all silently started to notice slop; hopefully we can
| recognize it more easily and prevent it.
|
| Test-Driven Development (integration tests or functional tests
| specifically) for Prompt-Driven Development seems like the way
| to go.
|
| Thank you, Simon.
| legendofbrando wrote:
| @simonw you've been awesome all year; loved this recap and look
| forward to more next year
| Havoc wrote:
| Great summary of highlights. Don't agree with all, but I think
| it's a very sound attempt at a year in review summary
|
| >LLM prices crashed
|
| This one has me a little spooked. The white knight on this front
| (DS) has both announced price increases and has had staff
| poached. There is still the Gemini free tier, which is ofc
| basically impossible to beat (solid & functionally
| unlimited/free), but it's Google, so I'm reluctant to trust it.
|
| Seriously worried about seeing a regression on pricing in first
| half of 2025. Especially with the OAI $200 price anchoring.
|
| >"Agents" still haven't really happened yet
|
| Think that's largely because it's a poorly defined concept and
| a true "agent" implies some sort of pseudo-AGI autonomy. This
| is a definition/expectation issue rather than a technical one,
| in my mind
|
| >LLMs somehow got even harder to use
|
| I don't think that's 100% right. An explosion of options is not
| the same as harder to use. And the guidance for noobs is still
| pretty much the same as always (llama.cpp or one of the common
| frontends like text-generation-webui). It's become harder to
| tell what is good, but not harder to get going.
|
| ----
|
| One key theme I think is missing is just how hard it has become
| to tell what is "good" for the average user. There is so much
| benchmark shenanigans going on that it's just impossible to tell.
| I'm literally at the "I'm just going to build my own testing
| framework" stage. Not because I can do better technically (I
| can't)...but because I can gear it towards things I care about
| and I can be confident my DIY sample hasn't been gamed.
| simonw wrote:
| The biggest reason I'm not worried about prices going back up
| again is Llama. The Llama 3 models are _really good_ , and
| because they are open weight there are a growing number of API
| providers competing to provide access to them.
|
| These companies are incentivized to figure out fast and
| efficient hosting for the models. They don't need to train any
| models themselves, their value is added entirely in continuing
| to drive the price of inference down.
|
| Groq and Cerebras are particularly interesting here because WOW
| they serve Llama fast.
| macawfish wrote:
| Large concept models are really exciting
| voidhorse wrote:
| Great write up! Unfortunately, I think this article accurately
| reflects how we've made little progress on the most important
| aspects of LLM hype and use: the social ones.
|
| A small number of people with lots of power are essentially
| deciding to go all in on this technology presumably because
| significant gains will mean the long term reduction of human
| labor needs, and thus human labor power. As the article mentions,
| this also comes at huge expenditure and environmental impact,
| which is already a very important domain in crisis that we've
| neglected. The whole thing becomes especially laughable when
| you consider that many people are still using these tools to
| perform tasks that could be performed, with a margin of more
| effort, using existing deterministic tools. Instead we are now
| opting for a computationally more expensive solution that has
| a higher margin of error.
|
| I get that making technical progress in this area is interesting,
| but I really think the lower level workers and researchers
| exploring the space need to be more emphatic about thinking about
| socioeconomic impact. Some will argue that this is analogous to
| any other technological change and markets will adjust to account
| for new tool use, but I am not so sure about this one. If the
| technology is really as groundbreaking as everyone wants us to
| believe then logically we might be facing a situation that isn't
| as easy to adapt to, and I guarantee those with power will not
| "give a little back" to the disenfranchised masses out of the
| goodness of their hearts.
|
| This doesn't even raise all the problems these tools create when
| it comes to establishing coherent viewpoints and truth in
| ostensibly democratic societies, which is another massive can of
| worms.
| bwhiting2356 wrote:
| Some amount of LLM gullibility may be needed. Let's say I have a
| RAG use case for internal documents about how my business works.
| I need the LLM to accept what I'm telling it about my business as
| the truth without questioning it. If I got responses like "this
| return policy is not correct", LLMs would fail at my use case.
| calebm wrote:
| I love your breadth-first approach of having an outline at the
| top.
| simonw wrote:
| I wrote custom software for that!
| https://tools.simonwillison.net/render-markdown - If you paste
| in some Markdown with ## section headings in it the output will
| start with a <ul> list of links to those headings.
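|
| The core of it is just a few lines. This isn't my exact
| implementation (the real thing runs in the browser), but a
| Python sketch of the idea:
|
|     import re
|
|     def heading_index(markdown: str) -> str:
|         # Collect ## headings, emit a <ul> of anchor links.
|         items = []
|         for m in re.finditer(r"^## +(.+)$", markdown,
|                              re.MULTILINE):
|             title = m.group(1).strip()
|             slug = re.sub(r"[^a-z0-9]+", "-",
|                           title.lower()).strip("-")
|             items.append(f'<li><a href="#{slug}">'
|                          f'{title}</a></li>')
|         return "<ul>\n" + "\n".join(items) + "\n</ul>"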
| nektro wrote:
| i learned this industry has lower morals and standards for
| excellence than i ever previously expected
___________________________________________________________________
(page generated 2024-12-31 23:00 UTC)