[HN Gopher] Large language models, small labor market effects [pdf]
___________________________________________________________________
Large language models, small labor market effects [pdf]
Author : luu
Score : 80 points
Date : 2025-04-25 08:15 UTC (14 hours ago)
(HTM) web link (bfi.uchicago.edu)
(TXT) w3m dump (bfi.uchicago.edu)
| mediaman wrote:
| Great read. One of the interesting insights from it is how
| difficult good application of AI is.
|
| A lot of companies are just "deploying a chatbot" and some of the
| results from this study show that this doesn't work very well. My
| experience is similar: deploying simple chatbots to the
| enterprise doesn't do a lot.
|
| For things to get better, two things are required, neither of
| which are easy:
|
| - Integration into existing systems. You have to build data lakes
| or similar systems that allow the AI to use data and information
| broadly across an enterprise. For example, for an AI tool to be
| useful in accounting, it's going to need high quality data access
| to the company's POs, issued invoices, receivers, GL data, vendor
| invoices, and so on. But many systems are old, have dodgy or
| nonexistent APIs, and data is held in various bureaucratic
| fiefdoms. This work is hard and doesn't scale that well.
|
| - Knowledge of specific workflows. It's better when these tools
| are built with specific workflows in mind that are designed
| around specific people's jobs. This can start looking less like
| pure AI and more like a mix of traditional software with some AI
| capabilities. My experience is that I sell software as "AI
| solutions," but often I feel a lot of the value created is
| because it's replacing bad processes (either terrible older
| software, or attempting to do collaborative work via
| spreadsheet), and the AI tastefully sprinkled throughout may not
| be the primary value driver.
|
| Knowledge of specific workflows also requires really good product
| design. High empathy, ability to understand what's not being
| said, ability to understand how to create an overall process
| value stream from many different people's narrower viewpoints,
| etc. This is also hard.
|
| Moreover, this is deceiving because for some types of work
| (coding, ideating around marketing copy) you really don't need
| that much scaffolding at all because the capabilities are latent
| in the AI, and layering stuff on top mostly gets in the way.
|
| My experience is that this type of work is a narrow slice of the
| total amount of work to be done, though, which is why I'd agree
| with the overall direction this study is suggesting that creating
| actual measurable major economic value with AI is going to be a
| long-term slog, and that we'll probably gradually stop calling it
| AI in the process as we acclimate to it and it starts being used
| as a tool within software processes.
| ladeez wrote:
| The pivot to cloud had a decade of warmup before the how-to was
| normalized into existing standards.
|
| In the lead-up, a lot of the same naysaying we now see about AI
| was everywhere. AI can be compressed into less logic on a chip,
| bootstrapped from models. It will require less of the state-
| management tooling software dev relies on now. We're slowly
| being trained to accept a downturn in software jobs. No need to
| generate the code that makes up an electrical state when we can
| just tune the hardware to that state from an abstract model,
| deterministically. Energy-based models are the futuuuuuure.
|
| https://www.chipstrat.com/p/jensen-were-with-you-but-were-no...
|
| A lot of the same naysaying happened about Dungeons and Dragons
| and comic books in the past too. Life carried on.
|
| Functional illiterates fetishize semantics, come to view their
| special literacy as key to the future of humanity. Tale as old
| as time.
| aerhardt wrote:
| > how difficult good application of AI is.
|
| The only interesting application I've identified thus far in my
| domain in Enterprise IT (I don't do consumer-facing stuff like
| chatbots) is in replacing tasks that previously would've been
| done by NLP: mainly extraction, synthesis, classification. I am
| currently working on a long-neglected dataset that needs a massive
| remodel and I think that would've taken a lot of manual
| intervention and a mix of different NLP models to whip into
| shape in the past, but with LLMs we might be able to pull it
| off with far fewer resources.
|
| Mind you, at the scale of the customer I am currently working
| with, this task also would've never been done in the first
| place - so it's not replacing anyone.
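|
| As a rough illustration of what such an LLM-based classification
| call might look like (illustrative only, not from any real
| system; the model name, label set, and prompt are assumptions):
|
|     from openai import OpenAI
|
|     client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
|     # Hypothetical label set for the kind of records being remodeled
|     LABELS = ["invoice", "purchase_order", "receipt", "other"]
|
|     def classify(text: str) -> str:
|         """Ask the model to pick exactly one label for a free-text record."""
|         resp = client.chat.completions.create(
|             model="gpt-4o-mini",  # assumed model
|             messages=[
|                 {"role": "system",
|                  "content": "Classify the document into one of: "
|                             + ", ".join(LABELS)
|                             + ". Reply with the label only."},
|                 {"role": "user", "content": text},
|             ],
|         )
|         answer = resp.choices[0].message.content.strip().lower()
|         return answer if answer in LABELS else "other"
|
| The same prompt-and-parse pattern can cover extraction and
| synthesis by changing the instruction, which is how one general
| model can stand in for several task-specific NLP models.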
|
| > This can start looking less like pure AI and more like a mix
| of traditional software with some AI capabilities
|
| Yes, the other use case I'm seeing is in peppering already
| existing workflow integrations with a bit of LLM magic here and
| there. But why would I rework a workflow that's already
| implemented in Zapier, n8n or Python, well understood, and
| totally reliable?
|
| > Knowledge of specific workflows also requires really good
| product design. High empathy, ability to understand what's not
| being said, ability to understand how to create an overall
| process value stream from many different peoples' narrower
| viewpoints, etc. This is also hard.
|
| > My experience is that this type of work is a narrow slice of
| the total amount of work to be done
|
| Reading you I get the sense we are on the same page on a lot of
| things, and I am pretty sure if we worked together we'd get along
| fine. I'm struggling a bit with the LLM delulus as of late so
| it's a breath of fresh air to read people out there who get it.
| PaulHoule wrote:
| As I see it, three-letter organizations have been using
| frameworks like Apache UIMA to build information extraction
| pipelines that are manual at worst and hybrid at best. Before
| BERT the models we had for this sucked, only useful for
| certain things, and usually requiring training sets of 20,000
| or so examples.
|
| Today the range of things for which the models are tolerable
| to "great" has greatly expanded. In arXiv papers you tend to
| see people getting tepid results with 500 examples; I get
| better results with 5000 examples and diminishing returns
| past 15k.
|
| For a lot of people it begins and ends with "prompt
| engineering" of commercial decoder models, and evaluation
| isn't even an afterthought. For information extraction,
| classification and such, though, you often get good results
| with encoder models (e.g. BERT) put together with serious
| eval, calibration and model selection. The system still looks
| like the old systems if your problem is hard and has to be
| done in a scalable way, but sometimes you can make something
| that "just works" without trying too hard, keeping your
| train/eval data in a spreadsheet.
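|
| A minimal sketch of that encoder-plus-eval setup (illustrative
| only, not anyone's production pipeline; the model name, file
| name, and column names are assumptions). It keeps the labeled
| data in a CSV, holds out an eval split, and calibrates the
| classifier's probabilities:
|
|     import pandas as pd
|     from sentence_transformers import SentenceTransformer
|     from sklearn.calibration import CalibratedClassifierCV
|     from sklearn.linear_model import LogisticRegression
|     from sklearn.metrics import classification_report
|     from sklearn.model_selection import train_test_split
|
|     df = pd.read_csv("labels.csv")  # assumed columns: "text", "label"
|     X_train, X_eval, y_train, y_eval = train_test_split(
|         df["text"].tolist(), df["label"].tolist(),
|         test_size=0.2, random_state=0)
|
|     # A small BERT-family encoder; embeddings stay frozen here, with
|     # a linear head trained on top (fine-tuning is another option).
|     encoder = SentenceTransformer("all-MiniLM-L6-v2")
|     E_train = encoder.encode(X_train)
|     E_eval = encoder.encode(X_eval)
|
|     # Calibrated probabilities matter if downstream steps threshold them.
|     clf = CalibratedClassifierCV(LogisticRegression(max_iter=1000), cv=5)
|     clf.fit(E_train, y_train)
|     print(classification_report(y_eval, clf.predict(E_eval)))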
| AlexCoventry wrote:
| I think when the costs and latencies of reasoning models like
| o1-pro, o3 and o4-mini-high come down, chatbots are going to be
| much more effective for technical support. They're quite
| reliable and knowledgeable, in my experience.
| treis wrote:
| LLM chatbots are a step forward for customer support. Well,
| ours started hallucinating a support phone number that, while a
| real number, is not our number. Lots of people started calling
| it, which was a bad time for everyone - especially the person
| whose number it actually is. So maybe two steps forward and
| occasionally one back.
| stego-tech wrote:
| > Integration into existing systems.
|
| Integration alone isn't enough. Organizations let their data go
| stale, because keeping it updated is a political task instead
| of a technical one. Feeding an AI stale data effectively
| renders it useless, because it doesn't have the presence of
| mind to ask for assistance when it encounters an issue, or to
| ask colleagues if this process is still correct even though the
| expected data doesn't "fit".
|
| Automations - including AI - _require_ clean, up-to-date data
| in order to function effectively. Orgs who slap in a chatbot
| and call it a day don't understand the assignment.
| jaxtracks wrote:
| Interesting study! Far too early in the adoption lifecycle for
| any conclusions I think, especially given that the data is from
| Denmark, which tends to have a far less hype-driven business
| culture than the US, going by my bit of experience working in
| both. Anecdotally, I've seen a couple of AI hiring freezes in the
| states (some from LLM integrations I've built) that I'm fairly
| sure will be reversed when management gets a more realistic sense
| of capabilities, and my general sense is that the Danes I've
| worked with would be far less likely to overestimate the value of
| these tools.
| sottol wrote:
| I agree on the "far too early" part. But imo we can probably
| say more about the impact in a year, not 5-10 years. It does
| show that some of the randomized controlled trials that showed
| large labor-force impacts and productivity gains are probably
| only applicable to a small subsection of the workforce.
|
| It also looks like the second survey was sent out in June 2024
| - so the data is 10 months old at this point, another reason
| why it might be early.
|
| That said, the latest round of models are the first I've
| started using more extensively.
|
| The paper does address the fact that Denmark is not the US, but
| supposedly not that different:
|
| "First, Danish workers have been at the forefront of Generative
| AI adoption, with take-up rates comparable to those in the
| United States (Bick, Blandin and Deming, 2025; Humlum and
| Vestergaard, 2025; RISJ, 2024).
|
| Second, Denmark's labor market is highly flexible, with low
| hiring and firing costs and decentralized wage bargaining--
| similar to that of the U.S.--which allows firms and workers to
| adjust hours and earnings in response to technological change
| (Botero et al., 2004; Dahl, Le Maire and Munch, 2013). In
| particular, most workers in our sample engage in annual
| negotiations with their employers, providing regular
| opportunities to adjust earnings and hours in response to AI
| chatbot adoption during the study period."
| meta_ai_x wrote:
| It's incredibly hard to model complex non-linear systems. So,
| while I applaud the researchers for providing some data points,
| these things provide ZERO value for current/future decision
| making.
|
| Chatbots were absolute garbage before ChatGPT, while post-
| ChatGPT everything changed. So there is going to be a tipping-
| point event in labor market effects, and past single-variable
| "data analysis" will not provide anything to predict the event
| or its effects.
| Legend2440 wrote:
| Seems premature, like measuring the economic impact of the
| internet in 1985.
|
| LLMs are more tech demo than product right now, and it could take
| many years for their full impact to become apparent.
| amarcheschi wrote:
| I wouldn't call it premature when LLM companies' CEOs have been
| proposing AI agents to replace workers - and similar things
| that I find debatable - by about the 2nd half of the twenties.
| I mean, a cold shower might eventually happen for a lot of AI-
| based companies.
| frankfrank13 wrote:
| > cold shower might eventually happen for a lot of Ai based
| companies
|
| undoubtedly.
|
| The economic impact of some _actually_ useful tools (Cursor,
| Claude) is propping up hundreds of billions of dollars in
| funding for, idk, "AI for <pick an industry>" or "replace
| your <job title> with our AI tool".
| dehrmann wrote:
| The most recent example is the Anthropic CEO:
|
| > I think we will be there in three to six months, where AI
| is writing 90% of the code. And then, in 12 months, we may be
| in a world where AI is writing essentially all of the code
|
| https://www.businessinsider.com/anthropic-ceo-
| ai-90-percent-...
|
| This seems either wildly optimistic or to come with a giant
| asterisk: AI will write it by token prediction, and then a
| human will have to double-check and refine it.
| amarcheschi wrote:
| I'm honestly slightly appalled by what we might miss by not
| reading the docs and just letting AI code. I'm attending a
| course where we have to analyze medical datasets using up
| to ~200 GB of RAM. Calculations can take some time. A simple
| skim through the library docs (or even asking the chatbot) can
| tell you that one of the longest calls can be approximated,
| and the approximation takes about a third of the time of
| another solver. And yet, none of my colleagues thought about
| either looking at the docs or asking the chatbot. Because it
| was working. And of course the chatbot was using the solver
| that was "standard" but that you probably don't need
| for prototyping.
|
| Again: we had some parts of one of 3 datasets split across ~40
| files, and we had to manipulate and save them before doing
| anything else. A colleague asked ChatGPT to write the code
| to do it and it was single-threaded, and not feasible. I
| pulled up htop and, upon seeing it was using only one core,
| suggested she ask ChatGPT to make the conversion run on
| different files in parallel, and we basically went from
| absolutely slow to quite fast. But that presupposes that the
| person using the code knows what's going on, why, and what is
| not going on - and when it is possible to do something
| different. Using it without asking yourself more about the
| context is a terrible use imho, but it's absolutely the
| direction I see we're headed towards and I'm not a fan of it.
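|
| A minimal sketch of that kind of fix (illustrative only; the
| directory layout, file format, and conversion step are
| assumptions). It uses processes rather than threads, since
| per-file conversion in Python is typically CPU-bound:
|
|     from concurrent.futures import ProcessPoolExecutor
|     from pathlib import Path
|
|     def convert_one(path: Path) -> str:
|         """Placeholder for the per-file manipulate-and-save step."""
|         # real code would read `path`, transform it, and write the result
|         return path.name
|
|     if __name__ == "__main__":
|         files = sorted(Path("dataset_parts").glob("*.csv"))  # assumed layout
|         # One worker process per core by default, so the ~40 files are
|         # spread across cores instead of converted one after another.
|         with ProcessPoolExecutor() as pool:
|             for name in pool.map(convert_one, files):
|                 print("finished", name)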
| agumonkey wrote:
| I anticipate a control issue, where agents can produce code
| faster than people can analyze it, and beyond applications with
| small visible surfaces, nobody will be able to check what
| is going on.
|
| I've seen people who struggle to manipulate boolean tables of 3
| variables in their head trying to generate complete web
| applications; it will work for linear duties (input ->
| processing -> storage), but I highly doubt they will be able
| to understand anything with 2nd-order effects.
| layoric wrote:
| A big difference here is the sheer scale of investment. In
| 1985, the internet was running on the dreams of a few. The
| sheer depth of investment in "AI" today is hard to fathom,
| and it is being injected into everything regardless of what
| customers want.
| trod1234 wrote:
| We seriously live in the world of Anathem now, where apparently
| most people need a specialized expert to cut through plausible
| generated misinformation.
|
| This is the second similar study I've seen today on HN that
| seems partly generated by AI and fails rigorous methodology,
| while drawing unfounded conclusions that seemingly fuel a
| narrative.
|
| The study fails to account for a number of elements which nullify
| the conclusions as a whole.
|
| AI chatbot tasks are by their nature communication tasks
| involving a third party (the customer). When the chatbot fails
| to direct, or loops coercively - a task computers really can't
| do well - customers get enraged, because it results in crazy-
| making behavior. The chatbot in such cases imposes a time cost,
| with all the elements needed to call it torture: isolation,
| cognitive dissonance, coercion with perceived/real loss, and
| lack of agency. There is little if any differentiation between
| the tasks measured. Emotions Kill [1].
|
| This results in outcomes where there is no change, or even
| higher demand for workers, just to calm that person down, and
| this is true regardless of occupation. In other words, the CSR
| becomes the punching bag for verbal hostility, receiving calls
| or communications from irrationally enraged customers after
| the AI has had its first chance to wind them up.
|
| It is a stochastic environment, and very few conclusions can
| actually be supported because they seem to follow reasoning along
| a null hypothesis.
|
| The surveys use Denmark as an example (being part of the EU),
| but it's unclear if they properly take into account company
| policies against submitting certain private data to a US-based
| LLM given the risks related to GDPR. They say the surveys were
| sent directly to workers who are already employed, but the
| study makes no measure of displaced workers, nor of overall job
| reductions, which historically is how such changes are adopted,
| misleading the non-domain-expert reader.
|
| The paper does not appear to be sound, and given that it relies
| solely on a difference-in-differences (DiD) approach without
| specifying alternatives, it may be pushing a pre-fabricated
| narrative that AI won't disrupt the workforce when the study
| doesn't actually support that in any meaningful, rational way.
|
| This isn't how you do good science. Overgeneralizing is a
| fallacy, and while some computation is being done to limit that,
| it doesn't touch on what you don't know, because what you don't
| know hasn't been quantified (i.e. the streetlight effect) [1].
|
| To understand this, the layman and expert alike must always pay
| attention to what they don't know. The video below touches on
| _some_ of the issues without requiring technical expertise. [1]
|
| [1][Talk] Survival Heuristics: My Favorite Techniques for
| Avoiding Intelligence Traps - SANS CTI Summit 2018
|
| https://www.youtube.com/watch?v=kNv2PlqmsAc
___________________________________________________________________
(page generated 2025-04-25 23:00 UTC)