[HN Gopher] Large language models, small labor market effects [pdf]
       ___________________________________________________________________
        
       Large language models, small labor market effects [pdf]
        
       Author : luu
       Score  : 80 points
       Date   : 2025-04-25 08:15 UTC (14 hours ago)
        
 (HTM) web link (bfi.uchicago.edu)
 (TXT) w3m dump (bfi.uchicago.edu)
        
       | mediaman wrote:
       | Great read. One of the interesting insights from it is how
       | difficult good application of AI is.
       | 
       | A lot of companies are just "deploying a chatbot" and some of the
       | results from this study show that this doesn't work very well. My
       | experience is similar: deploying simple chatbots to the
       | enterprise doesn't do a lot.
       | 
        | For things to get better, two things are required, neither of
        | which is easy:
       | 
        | - Integration into existing systems. You have to build data
        | lakes or similar systems that allow the AI to use data and
        | information broadly across an enterprise. For example, for an
        | AI tool to be useful in accounting, it's going to need high-
        | quality access to the company's POs, issued invoices,
        | receivers, GL data, vendor invoices, and so on. But many
        | systems are old, have dodgy or nonexistent APIs, and data is
        | held in various bureaucratic fiefdoms. This work is hard and
        | doesn't scale that well (see the sketch after this list).
       | 
        | - Knowledge of specific workflows. These tools work better
        | when they're built with specific workflows in mind, designed
        | around specific people's jobs. This can start looking less
        | like pure AI and more like a mix of traditional software with
        | some AI capabilities. My experience is that I sell software as
        | "AI solutions," but often I feel a lot of the value created
        | comes from replacing bad processes (either terrible older
        | software, or attempts to do collaborative work via
        | spreadsheet), and the AI tastefully sprinkled throughout may
        | not be the primary value driver.
       | 
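        | A minimal sketch of the first point - the names
        | (fetch_open_invoices, the ERP endpoint) are hypothetical, and
        | this only shows the shape of exposing one internal system to
        | a model as a callable tool:
        | 
        |     import requests
        | 
        |     # Hypothetical wrapper around a legacy ERP's invoice API.
        |     def fetch_open_invoices(vendor_id: str) -> list[dict]:
        |         resp = requests.get(
        |             "https://erp.internal/api/invoices",  # assumed endpoint
        |             params={"vendor": vendor_id, "status": "open"},
        |             timeout=30,
        |         )
        |         resp.raise_for_status()
        |         return resp.json()
        | 
        |     # Tool schema handed to the model, so "ask the AI about a
        |     # vendor" bottoms out in clean, current ERP data rather
        |     # than whatever the chatbot can guess.
        |     INVOICE_TOOL = {
        |         "name": "fetch_open_invoices",
        |         "description": "List open invoices for a vendor.",
        |         "parameters": {
        |             "type": "object",
        |             "properties": {"vendor_id": {"type": "string"}},
        |             "required": ["vendor_id"],
        |         },
        |     }
        | 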
        | Knowledge of specific workflows also requires really good
        | product design: high empathy, the ability to understand
        | what's not being said, and the ability to create an overall
        | process value stream from many different people's narrower
        | viewpoints. This is also hard.
       | 
        | Moreover, this is deceptive, because for some types of work
        | (coding, ideating around marketing copy) you really don't
        | need much scaffolding at all: the capabilities are latent in
        | the AI, and layering stuff on top mostly gets in the way.
       | 
        | My experience is that this type of work is a narrow slice of
        | the total amount of work to be done, though, which is why I'd
        | agree with the overall direction of this study: creating
        | actual, measurable, major economic value with AI is going to
        | be a long-term slog, and we'll probably gradually stop
        | calling it AI in the process as we acclimate to it and it
        | starts being used as a tool within software processes.
        
         | ladeez wrote:
          | The pivot to cloud had a decade of warmup before the HOWTO
          | was normalized into existing standards.
          | 
          | In the lead-up, a lot of the same naysaying we now see
          | about AI was everywhere. AI can be compressed into less
          | logic on a chip, bootstrapped from models, requiring less
          | of the state-management tooling software development relies
          | on now. We're slowly being trained to accept a downturn in
          | software jobs. No need to generate the code that makes up
          | an electrical state when we can just tune hardware to the
          | state from an abstract model deterministically. Energy-
          | based models are the futuuuuuure.
          | 
          | https://www.chipstrat.com/p/jensen-were-with-you-but-were-no...
          | 
          | There was a lot of the same naysaying about Dungeons and
          | Dragons and comic books in the past, too. Life carried on.
          | 
          | Functional illiterates fetishize semantics and come to view
          | their special literacy as the key to the future of
          | humanity. A tale as old as time.
        
         | aerhardt wrote:
         | > how difficult good application of AI is.
         | 
          | The only interesting application I've identified thus far
          | in my domain, enterprise IT (I don't do consumer-facing
          | stuff like chatbots), is replacing tasks that previously
          | would've been done with NLP: mainly extraction, synthesis,
          | and classification. I'm currently working on a long-
          | neglected dataset that needs a massive remodel; in the
          | past, whipping it into shape would've taken a lot of manual
          | intervention and a mix of different NLP models, but with
          | LLMs we might be able to pull it off with far fewer
          | resources.
          | 
          | Mind you, at the scale of the customer I'm currently
          | working with, this task would've never been done in the
          | first place - so it's not replacing anyone.
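          | 
          | A minimal sketch of that LLM-for-classification pattern -
          | the model name and label set are placeholders, using the
          | OpenAI Python client:
          | 
          |     from openai import OpenAI
          | 
          |     client = OpenAI()
          |     LABELS = ["invoice", "complaint", "order", "other"]  # toy taxonomy
          | 
          |     def classify(text: str) -> str:
          |         resp = client.chat.completions.create(
          |             model="gpt-4o-mini",  # placeholder model name
          |             messages=[
          |                 {"role": "system",
          |                  "content": "Classify the user's text into one of "
          |                             f"{LABELS}. Reply with the label only."},
          |                 {"role": "user", "content": text},
          |             ],
          |         )
          |         return resp.choices[0].message.content.strip()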
         | 
         | > This can start looking less like pure AI and more like a mix
         | of traditional software with some AI capabilities
         | 
          | Yes, the other use case I'm seeing is peppering existing
          | workflow integrations with a bit of LLM magic here and
          | there. But why would I rework a workflow that's already
          | implemented, well understood, and totally reliable in
          | Zapier, n8n, or Python?
         | 
          | > Knowledge of specific workflows also requires really good
          | product design: high empathy, the ability to understand
          | what's not being said, and the ability to create an overall
          | process value stream from many different people's narrower
          | viewpoints. This is also hard.
         | 
         | > My experience is that this type of work is a narrow slice of
         | the total amount of work to be done
         | 
          | Reading you, I get the sense we're on the same page about a
          | lot of things, and I'm pretty sure that if we worked
          | together we'd get along fine. I've been struggling a bit
          | with the LLM delulus as of late, so it's a breath of fresh
          | air to read people out there who get it.
        
           | PaulHoule wrote:
           | As I see it three letter organizations have been using
           | frameworks like Apache UIMA to build information extraction
           | pipelines that are manual at worst and hybrid at best. Before
           | BERT the models we had for this sucked, only useful for
           | certain things, and usually requiring training sets of 20,000
           | or so examples.
           | 
            | Today the range of things for which the models are
            | tolerable to "great" has greatly expanded. In arXiv
            | papers you tend to see people getting tepid results with
            | 500 examples; I get better results with 5,000 examples
            | and diminishing returns past 15k.
           | 
            | For a lot of people it begins and ends with "prompt
            | engineering" of commercial decoder models, and evaluation
            | isn't even an afterthought. For information extraction,
            | classification, and the like, though, you often get good
            | results with encoder models (e.g. BERT) put together with
            | serious eval, calibration, and model selection. The
            | system still looks like the old systems if your problem
            | is hard and has to be done in a scalable way, but
            | sometimes you can make something that "just works"
            | without trying too hard, keeping your train/eval data in
            | a spreadsheet.
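            | 
            | A toy sketch of that encoder-plus-eval setup - the model
            | name is just one off-the-shelf example, and real use
            | would swap in your own fine-tuned checkpoint and a proper
            | held-out set:
            | 
            |     from transformers import pipeline
            |     from sklearn.metrics import (classification_report,
            |                                  brier_score_loss)
            | 
            |     clf = pipeline("text-classification",
            |                    model="distilbert-base-uncased-finetuned-sst-2-english")
            | 
            |     texts = ["the invoice was never paid",
            |              "order arrived early, great"]
            |     gold  = ["NEGATIVE", "POSITIVE"]  # labels from a spreadsheet
            | 
            |     preds = clf(texts)  # [{"label": ..., "score": ...}, ...]
            |     print(classification_report(gold,
            |                                 [p["label"] for p in preds]))
            | 
            |     # Crude calibration check: do confidences track accuracy?
            |     p_pos = [p["score"] if p["label"] == "POSITIVE"
            |              else 1 - p["score"] for p in preds]
            |     y = [int(l == "POSITIVE") for l in gold]
            |     print("Brier score:", brier_score_loss(y, p_pos))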
        
         | AlexCoventry wrote:
          | I think that when the costs and latencies of reasoning
          | models like o1-pro, o3, and o4-mini-high come down,
          | chatbots are going to be much more effective for technical
          | support. They're quite reliable and knowledgeable, in my
          | experience.
        
         | treis wrote:
          | LLM chatbots are a step forward for customer support. Well,
          | ours started hallucinating a support phone number that,
          | while a real number, is not our number. Lots of people
          | started calling it, which was a bad time for everyone,
          | especially the person whose number it actually is. So maybe
          | two steps forward and occasionally one back.
        
         | stego-tech wrote:
         | > Integration into existing systems.
         | 
          | Integration alone isn't enough. Organizations let their
          | data go stale, because keeping it updated is a political
          | task rather than a technical one. Feeding an AI stale data
          | effectively renders it useless, because it doesn't have the
          | presence of mind to ask for assistance when it encounters
          | an issue, or to ask colleagues whether a process is still
          | correct even though the expected data doesn't "fit".
         | 
          | Automations - including AI - _require_ clean, up-to-date
          | data in order to function effectively. Orgs who slap in a
          | chatbot and call it a day don't understand the assignment.
        
       | jaxtracks wrote:
        | Interesting study! It's far too early in the adoption
        | lifecycle for any conclusions, I think, especially given that
        | the data is from Denmark, which tends to have a far less
        | hype-driven business culture than the US, going by my bit of
        | experience working in both. Anecdotally, I've seen a couple
        | of AI hiring freezes in the States (some from LLM
        | integrations I've built) that I'm fairly sure will be
        | reversed when management gets a more realistic sense of
        | capabilities, and my general sense is that the Danes I've
        | worked with would be far less likely to overestimate the
        | value of these tools.
        
         | sottol wrote:
         | I agree on the "far too early" part. But imo we can probably
         | say more about the impact in a year though, not 5-10 years. But
         | it does show that some of the randomized-controlled-trials that
         | showed large labor-force impact and productivity gains are
         | probably only applicable to a small sub-section of the work-
         | force.
         | 
         | It also looks like the second survey was sent out in June 2024
         | - so the data is 10 months old at this point, another reason
         | why this it might be early.
         | 
         | That said, the latest round of models are the first I've
         | started using more extensively.
         | 
         | The paper does address the fact that Denmark is not the US, but
         | supposedly not that different:
         | 
         | "First, Danish workers have been at the forefront of Generative
         | AI adoption, with take-up rates comparable to those in the
         | United States (Bick, Blandin and Deming, 2025; Humlum and
         | Vestergaard, 2025; RISJ, 2024).
         | 
         | Second, Denmark's labor market is highly flexible, with low
         | hiring and firing costs and decentralized wage bargaining--
         | similar to that of the U.S.--which allows firms and workers to
         | adjust hours and earnings in response to technological change
         | (Botero et al., 2004; Dahl, Le Maire and Munch, 2013). In
         | particular, most workers in our sample engage in annual
         | negotiations with their employers, providing regular
         | opportunities to adjust earnings and hours in response to AI
         | chatbot adoption during the study period."
        
       | meta_ai_x wrote:
        | It's incredibly hard to model complex non-linear systems.
        | So, while I applaud the researchers for providing some data
        | points, these things provide ZERO value for current or
        | future decision making.
        | 
        | Chatbots were absolute garbage before ChatGPT; post-ChatGPT,
        | everything changed. So there is going to be a tipping-point
        | event in labor market effects, and past single-variable
        | "data analysis" will not provide anything to predict the
        | event or its effects.
        
       | Legend2440 wrote:
       | Seems premature, like measuring the economic impact of the
       | internet in 1985.
       | 
       | LLMs are more tech demo than product right now, and it could take
       | many years for their full impact to become apparent.
        
         | amarcheschi wrote:
         | I wouldn't call "premature" when llm companies ceos have been
         | proposing ai agents for replacing workers - and similar things
         | that I find debatable - in about the 2nd half of the twenties.
         | I mean, a cold shower might eventually happen for a lot of Ai
         | based companies
        
           | frankfrank13 wrote:
            | > cold shower might eventually happen for a lot of
            | AI-based companies
           | 
           | undoubtedly.
           | 
            | The economic impact of some _actually_ useful tools
            | (Cursor, Claude) is propping up hundreds of billions of
            | dollars in funding for, idk, "AI for <pick an industry>"
            | or "replace your <job title> with our AI tool".
        
           | dehrmann wrote:
           | The most recent example is the Anthropic CEO:
           | 
           | > I think we will be there in three to six months, where AI
           | is writing 90% of the code. And then, in 12 months, we may be
           | in a world where AI is writing essentially all of the code
           | 
           | https://www.businessinsider.com/anthropic-ceo-
           | ai-90-percent-...
           | 
            | This seems either wildly optimistic or to come with a
            | giant asterisk: the AI will write it by token prediction,
            | and then a human will have to double-check and refine it.
        
             | amarcheschi wrote:
              | I'm honestly slightly appalled by what we might miss
              | by not reading the docs and just letting AI code. I'm
              | attending a course where we have to analyze medical
              | datasets using up to ~200 GB of RAM. Calculations can
              | take some time. A simple skim through the library docs
              | (or even asking the chatbot) tells you that one of the
              | longest calls can be approximated, taking about a third
              | of the time it takes with the standard solver. And yet
              | none of my colleagues thought of either reading the
              | docs or asking the chatbot, because the code was
              | working. And of course the chatbot was using the
              | "standard" solver, which you probably don't need for
              | prototyping.
             | 
              | Again: we had some parts of one of three datasets split
              | across ~40 files, and we had to manipulate and save
              | them before doing anything else. A colleague asked
              | ChatGPT to write the code to do it, and it was single-
              | threaded and not feasible. I pulled up htop and, upon
              | seeing that the job was using only one core, suggested
              | she ask ChatGPT to make the conversion run on different
              | files in different threads; we basically went from
              | absolutely slow to quite fast. But that presupposes
              | that the person using the code knows what's going on,
              | why, and what is not going on - and when it's possible
              | to do something different. Using it without asking
              | yourself more about the context is a terrible use imho,
              | but it's absolutely the direction I see us headed
              | towards, and I'm not a fan of it.
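              | 
              | A minimal sketch of that fix - convert_one is a
              | hypothetical stand-in for the per-file work, and I've
              | sketched a process pool rather than threads, since this
              | kind of conversion is usually CPU-bound in Python:
              | 
              |     from concurrent.futures import ProcessPoolExecutor
              |     from pathlib import Path
              | 
              |     def convert_one(path: Path) -> Path:
              |         ...  # read, reshape, and save one file (placeholder)
              |         return path
              | 
              |     files = sorted(Path("dataset").glob("part-*"))
              |     with ProcessPoolExecutor() as pool:
              |         # One file per worker, all cores busy instead of one.
              |         for done in pool.map(convert_one, files):
              |             print("converted", done)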
        
             | agumonkey wrote:
            | I anticipate a control issue, where agents can produce
            | code faster than people can analyze it; beyond
            | applications with small visible surfaces, nobody will be
            | able to check what is going on.
            | 
            | I've seen people who struggle to manipulate Boolean
            | tables of three variables in their head trying to
            | generate complete web applications. It will work for
            | linear duties (input -> processing -> storage), but I
            | highly doubt they will be able to understand anything
            | with second-order effects.
        
         | layoric wrote:
          | A big difference here is the sheer scale of investment. In
          | 1985, the internet was running on the dreams of a few. The
          | depth of investment in "AI" today is hard to fathom, and it
          | is being injected into everything regardless of what
          | customers want.
        
       | trod1234 wrote:
        | We seriously live in the world of Anathem now, where
        | apparently most people need a specialized expert to cut
        | through plausible generated misinformation.
        | 
        | This is the second similar study I've seen today on HN that
        | seems partly generated by AI, fails rigorous methodology, and
        | draws unfounded conclusions that seemingly fuel a narrative.
       | 
       | The study fails to account for a number of elements which nullify
       | the conclusions as a whole.
       | 
        | AI chatbot tasks are by their nature communication tasks
        | involving a third party (the customer). When the chatbot
        | fails to direct, or loops coercively - a task computers
        | really can't handle well - customers get enraged because it
        | results in crazy-making behavior. The chatbot in such cases
        | imposes a time cost, with all the elements needed to call it
        | torture: isolation, cognitive dissonance, coercion with
        | perceived or real loss, and lack of agency. There is little
        | if any differentiation between the tasks measured. Emotions
        | Kill [1].
       | 
        | This results in outcomes where there is no change, or even
        | higher demand for workers, just to calm that person down -
        | and this is true regardless of occupation. In other words,
        | the CSR receiving calls or communications becomes the
        | punching bag for the verbal hostility of irrationally enraged
        | customers after the AI has had its first chance to wind them
        | up.
       | 
       | It is a stochastic environment, and very few conclusions can
       | actually be supported because they seem to follow reasoning along
       | a null hypothesis.
       | 
        | The surveys use Denmark as an example (being part of the EU),
        | but it's unclear whether they properly take into account
        | company policies against submitting certain private data to a
        | US-based LLM, given the risks related to GDPR. They say the
        | surveys were sent directly to workers who are already
        | employed, but the study measures neither displaced workers
        | nor overall job reductions, which historically is how the
        | changes from integration are adopted, misleading the
        | non-domain-expert reader.
       | 
        | The paper does not appear to be sound, and given that it
        | relies solely on a difference-in-differences (DiD) approach
        | without specifying alternatives, it may be pushing a
        | prefabricated narrative that AI won't disrupt the workforce
        | when the study doesn't actually support that in any
        | meaningful, rational way.
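        | 
        | For the unfamiliar: DiD compares the change in an outcome
        | for adopters against the change for non-adopters over the
        | same period. A toy version of the regression - the column
        | names and input file are synthetic, using statsmodels:
        | 
        |     import pandas as pd
        |     import statsmodels.formula.api as smf
        | 
        |     # One row per worker-period: earnings, treated (adopted
        |     # an AI chatbot), post (after rollout).
        |     df = pd.read_csv("panel.csv")  # hypothetical panel data
        |     m = smf.ols("earnings ~ treated * post", data=df).fit()
        |     print(m.params["treated:post"])  # the DiD estimate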
       | 
        | This isn't how you do good science. Overgeneralizing is a
        | fallacy, and while some computation is being done to limit
        | that, it doesn't touch on what you don't know, because what
        | you don't know hasn't been quantified (i.e., the streetlight
        | effect) [1].
        | 
        | To understand this, layman and expert alike must always pay
        | attention to what they don't know. The video below touches on
        | _some_ of the issues without requiring technical expertise.
        | [1]
       | 
       | [1][Talk] Survival Heuristics: My Favorite Techniques for
       | Avoiding Intelligence Traps - SANS CTI Summit 2018
       | 
       | https://www.youtube.com/watch?v=kNv2PlqmsAc
        
       ___________________________________________________________________
       (page generated 2025-04-25 23:00 UTC)