[HN Gopher] Has LLM killed traditional NLP?
       ___________________________________________________________________
        
       Has LLM killed traditional NLP?
        
       Author : vietthangif
       Score  : 106 points
       Date   : 2025-01-15 07:26 UTC (3 days ago)
        
 (HTM) web link (medium.com)
 (TXT) w3m dump (medium.com)
        
       | oliwary wrote:
       | This article seems to be paywalled unfortunately. While LLMs are
       | very useful when the tasks are complex and/or there is not a lot
        | of training data, I still think traditional NLP pipelines have a
        | very important role to play. For example:
       | 
        | - Depending on the complexity of the task and the required
        | results, SVMs or BERT can be enough in many cases and consume far
        | fewer resources, especially if there is a lot of training data
        | available. Training these models on LLM outputs could also be an
        | interesting way to get there.
       | 
       | - When resources are constrained or latency is important.
       | 
        | - In some cases, the labeled data may contain classes that have
        | no semantic connection between them, so explaining the classes
        | to an LLM could be tricky.
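The distillation idea in the first bullet (label with an LLM, then train a cheap classifier on those labels) can be sketched in pure Python. Everything below is illustrative: the tiny labeled set stands in for LLM-generated labels, and a bag-of-words Naive Bayes stands in for the SVM/BERT models the comment mentions.

```python
from collections import Counter, defaultdict
import math

# Hypothetical labels, standing in for LLM-generated annotations
# that a small model is then distilled from.
llm_labeled = [
    ("20 bottles of ferric chloride", "product"),
    ("annual maintenance contract", "service"),
    ("5 kg copper wire", "product"),
    ("on-site installation support", "service"),
]

def train_nb(examples):
    """Train a tiny multinomial Naive Bayes over bag-of-words counts."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {w for c in word_counts.values() for w in c}
    return word_counts, label_counts, vocab

def classify(text, model):
    """Pick the label maximizing log P(label) + sum log P(word|label)."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best, best_lp = None, float("-inf")
    for label in label_counts:
        lp = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in text.lower().split():
            lp += math.log((word_counts[label][w] + 1) / denom)  # Laplace smoothing
        if lp > best_lp:
            best, best_lp = label, lp
    return best

model = train_nb(llm_labeled)
print(classify("10 bottles of sulfuric acid", model))  # → product
```

In practice the cheap model would be something like TF-IDF + LinearSVC or a distilled BERT, but the workflow is the same: LLM labels in, small fast classifier out.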
        
         | eminent101 wrote:
         | > This article seems to be paywalled unfortunately.
         | 
         | I am no fan of Medium paywalled articles but if it helps you,
         | here's the article on archive - https://archive.is/J53CE
        
         | 99catmaster wrote:
         | https://archive.is/J53CE
        
       | RancheroBeans wrote:
       | NLP is an important part of upcoming RAG frameworks like
       | Microsoft's LazyGraphRAG. So I think it's more like NLP is a tool
       | used when the time is right.
       | 
       | https://www.microsoft.com/en-us/research/blog/lazygraphrag-s...
        
         | politelemon wrote:
          | I could use some help understanding: is this a set of tools or
          | techniques for answering questions? The name made me think it
          | was related to creating embeddings, but it seems to be much
          | more?
        
       | michaelsbradley wrote:
       | https://archive.is/J53CE
        
       | axegon_ wrote:
       | No, it has not and will not in the foreseeable future. This is
       | one of my responsibilities at work. LLMs are not feasible when
       | you have a dataset of 10 million items that you need to classify
       | relatively fast and at a reasonable cost. LLMs are great at mid-
       | level complexity tasks given a reasonable volume of data - they
       | can take away the tedious job of figuring out what you are
       | looking at or even come up with some basic mapping. But anything
        | at large volumes? Nah. Real-life example: "is '20 bottles of
       | ferric chloride' a service or a product?"
       | 
       | One prompt? Fair. 10? Still ok. 100? You're pushing it. 10M - get
       | help.
        
         | diggan wrote:
         | So TLDR: You agree with the author, but not for the same
         | reasons?
        
         | vlovich123 wrote:
         | That's the argument the article makes but the reasoning is a
         | little questionable on a few fronts:
         | 
         | - It uses f16 for the data format whereas quantization can
         | reduce the memory burden without a meaningful drop in accuracy,
         | especially as compared with traditional NLP techniques.
         | 
          | - LLMs typically outperform OpenCV + NER on quality.
         | 
         | - You can choose to replace just part of the pipeline instead
         | of using the LLM for everything (e.g. using text-only 3B or 1B
         | models to replace the NER model while keeping OpenCV)
         | 
          | - The LLM compute cost per unit of quality per watt is
          | constantly decreasing. Meaning even if it's too expensive today,
          | the system you've spent time building, tuning and maintaining is
          | quickly becoming obsolete.
         | 
         | - Talking with new grads in NLP programs, all the focus is
         | basically on LLMs.
         | 
          | - The capability and quality you get per model size keep
          | increasing. That means your existing RAM and performance budget
          | keeps absorbing problems that previously seemed out of reach.
         | 
         | Now of course traditional techniques are valuable because they
         | can be an important tool in bringing down costs (fixed function
         | accelerator vs general purpose compute), but it's going to
         | become more niche and specialized with most tasks transitioning
         | to LLMs I think.
         | 
          | The "Bitter Lesson" essay is really relevant to these kinds of
          | discussions.
        
           | vlovich123 wrote:
            | Not an independent player, so it's obviously important to be
            | critical of pieces like this [1], but it's claiming a ~10x
            | drop in LLM inference cost every year. That lines up with the
            | technical papers I'm seeing that continually improve
            | performance, plus the related HW improvements.
           | 
           | That's obviously not sustainable indefinitely, but these
           | kinds of exponentials are precisely why people often make
           | incorrect conclusions on how long change will take to happen.
            | Just a reminder: CPU performance doubled every 18 months and
            | for 20 years kept upending software companies that weren't in
            | tune with this cycle (i.e. focusing on performance instead of
            | features). For example,
           | even if you're spending $10k/month for LLM vs $100/month to
           | process the 10M item, it can still be more beneficial to go
           | the LLM route as you can buy cheaper expertise to put
           | together your LLM pipeline than the NLP route to make up the
           | ~100k/year difference (assuming the performance otherwise
           | works and the improved quality and robustness of the LLM
           | solution isn't providing extra revenue to offset).
           | 
           | [1] https://a16z.com/llmflation-llm-inference-cost/
        
         | Kuinox wrote:
          | Prompt caching would lower the cost, and later similar tech
          | will lower inference costs further. You have fewer than 25
          | tokens; that's between $1-5.
         | 
         | There may be some use case but I'm not convinced with the one
         | you gave.
        
           | minimaxir wrote:
           | So there's a bit of an issue with prompt caching
           | implementations: for both OpenAI API and Claude's API, you
           | need a minimum of 1024 tokens to build the cache for whatever
           | reason. For simple problems, that can be hard to hit and may
           | require padding the system prompt a bit.
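A hedged sketch of the padding workaround minimaxir describes. The 1024-token minimum is per the comment; the 4-characters-per-token estimate is a crude stand-in for a real tokenizer (e.g. tiktoken), and the filler text is an invented placeholder.

```python
# Pad a short system prompt up to a provider's prompt-caching minimum.
CACHE_MIN_TOKENS = 1024

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token; use a real tokenizer in practice.
    return max(1, len(text) // 4)

def pad_system_prompt(prompt: str, min_tokens: int = CACHE_MIN_TOKENS) -> str:
    """Append inert filler so the cached prefix clears the minimum."""
    filler = " This padding exists only to reach the prompt-cache minimum."
    padded = prompt
    while estimate_tokens(padded) < min_tokens:
        padded += filler
    return padded

system = pad_system_prompt('answer with just "service" or "product"')
print(estimate_tokens(system) >= CACHE_MIN_TOKENS)  # True
```

Worth verifying that the filler doesn't change the model's behavior on your task before relying on this.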
        
         | bloomingkales wrote:
         | I suspect any solution like that will be wholesale thrown away
         | in a year or two. Unless the damn thing is going to make money
         | in the next 2-3 years, we are all mostly going to write
         | throwaway code.
         | 
          | Things are such an opportunity cost nowadays. It's like trying
         | to capture value out of a transient amorphous cloud, you can't
         | hold any of it in your hand but the phenomenon is clearly
         | occurring.
        
         | MasterScrat wrote:
         | Can you talk about the main non-LLM NLP tools you use? e.g.
         | BERT models?
         | 
         | > One prompt? Fair. 10? Still ok. 100? You're pushing it. 10M -
         | get help.
         | 
          | Assuming you _could_ do 10M+ LLM calls for this task at trivial
          | cost and time, would you do it? i.e. is the only thing keeping
          | you away from LLMs the fact that they're currently too
          | cumbersome to use?
        
         | scarface_74 wrote:
         | http://www.incompleteideas.net/IncIdeas/BitterLesson.html
        
         | devjab wrote:
         | While I agree with both you and the article I also think it'll
         | depend on more than just the volume of your data. We have quite
         | a lot of documents that we classify. It's around 10-100k a
         | month, some rather large others simple invoices. We used to
         | have a couple of AI specialists who handled the classification
         | with local NLP models, but when they left we had to find
          | alternatives. For us that was the AI services in the cloud we
          | use, and the result has been a document warehouse which is both
          | easier for the business to manage and a "pipeline" which is
          | much cheaper than having those AI specialists on the payroll.
         | 
         | I imagine this wouldn't be the case if we were to do more
         | classification projects, but we aren't. We did try to find
         | replacements first, but it was impossible for us to attract any
         | talent, which isn't too much of a surprise considering it's
         | mainly maintenance. Using external consultants for that
         | maintenance proved to be almost more expensive than having two
         | full time employees.
        
         | blindriver wrote:
         | That's sort of like asking a horse and buggy driver whether
         | automobiles are going to put them out of business.
         | 
            | I think for the most part, casual NLP is dead because of LLMs.
            | And LLM costs are going to plummet soon, so the large-scale
            | NLP you're talking about is probably dead within 5 years or
            | less. The fact that you can replace programmers with prompts
            | is huge in my opinion, so no one needs to learn an NLP API
            | anymore; just stuff it into a prompt. Once the cost of
            | powering LLMs drops below the cost of programmers, it's game
            | over.
        
           | dartos wrote:
           | > LLM costs
           | 
           | Inference costs, not training costs.
           | 
           | > The fact that you can replace programmers
           | 
            | You can't... not for any real project. For quick mockups,
            | they're serviceable.
           | 
           | > That's sort of like asking a horse and buggy driver whether
           | automobiles
           | 
           | Kind of an insult to OP, no? Horse and buggy drivers were not
           | highly educated experts in their field.
           | 
           | Maybe take the word of domain experts rather than AI company
           | marketing teams.
        
             | blindriver wrote:
             | > Maybe take the word of domain experts rather than AI
             | company marketing teams.
             | 
             | Appeal to authority is a well known logical fallacy.
             | 
             | I know how dead NLP is personally because I've never been
             | able to get NLP working but once ChatGPT came around, I was
             | able to classify texts extremely easily. It's
             | transformational.
             | 
              | I was able to get ChatGPT to classify posts based on how
              | political they were on a scale of 1 to 10 and which
              | political leaning they had, and then classify each person's
              | likely political affiliation.
             | 
             | All of this without needing to learn any APIs or anything
             | about NLPs. Sorry but given my experience, NLPs are dead in
             | the water right now, except in terms of cost. And cost will
             | go down exponentially as they always do. Right now I'm
              | waiting for the RTX 5090 so I can just do it myself with an
              | open-source LLM.
        
               | FridgeSeal wrote:
               | "I couldn't be bothered learning something, and now I
               | don't have to! Checkmate!"
               | 
                | While LLMs can have their uses, let's not get carried
               | away.
        
               | scarface_74 wrote:
               | That's true. I did avoid learning traditional NLP
               | techniques because for my use case - call centers - LLMs
               | do a much better job.
               | 
               | Context for the problem space:
               | 
               | https://dl.acm.org/doi/fullHtml/10.1145/3442381.3449870
        
               | dartos wrote:
               | > Appeal to authority is a well known logical fallacy.
               | 
               | I did not make an appeal to authority. I made an appeal
               | to expertise.
               | 
               | It's why you'd trust a doctor's medical opinion over a
               | child's.
               | 
                | I'm not saying "listen to this guy because they're the
                | captain of NLP." I'm saying listen because experts have
                | years of hands-on experience with things like getting NLP
                | working at all.
               | 
               | > I know how dead NLP is personally because I've never
               | been able to get NLP working
               | 
               | So you're not an expert in the field. Barely know
                | anything about it, but you're okay hand-waving away
                | expertise because you got a toy NLP demo working...
               | 
               | That's great, dude.
               | 
               | > I was able to get ChatGPT to classify posts based on
               | how political it was from a scale of 1 to 10
               | 
               | And I know you didn't compare the results against classic
               | NLP to see if there was any improvements because you
               | don't know how...
        
               | blindriver wrote:
               | > I did not make an appeal to authority. I made an appeal
               | to expertise.
               | 
               | Lol
               | 
               | > I'm saying listen because experts have spent years of
               | hands on experience with things like getting NLP working
               | at all.
               | 
               | "It is difficult to get a man to understand something,
               | when his salary depends on his not understanding it."
               | 
               | Upton Sinclair
               | 
               | > Barely know anything about it, but you're okay hand
               | waving away expertise bc you got a toy NLP Demo
               | working...
               | 
               | Yes that's my point. I don't know anything about
               | implementing an NLP but got something that works pretty
               | well using an LLM extremely quickly and easily.
               | 
               | > And I know you didn't compare the results against
               | classic NLP to see if there was any improvements because
               | you don't know NLP...
               | 
               | Do you cross reference all your Google searches to make
               | sure they are giving you the best results vs Bing and
               | DDG?
               | 
               | Do you cross reference the results from your NLP with
               | LLMs to see if there were any improvements?
        
               | dartos wrote:
               | > Lol
               | 
               | Great argument
               | 
               | > "It is difficult to get a man to understand something,
               | when his salary depends on his not understanding it."
               | 
               | NLP professionals are also LLM professionals. LLMs are
               | tools in an NLP toolkit. LLMs don't make the NLP
               | professional obsolete the way it makes handwritten spam
               | obsolete.
               | 
               | I was going to explain this further but you literally
               | wouldn't understand.
               | 
               | > Do you cross reference all your Google searches to make
               | sure they are giving you the best results vs Bing and
               | DDG?
               | 
               | ...Yes I do...
               | 
               | That's why I cancelled my kagi subscription. It was just
               | as good as DDG.
               | 
               | > Do you cross reference the results from your NLP with
               | LLMs to see if there were any improvements?
               | 
               | Yes I do... because I want to use the best tool for the
               | job. Not just the first one I was able to get working...
        
               | elicksaur wrote:
               | I haven't understood these types of uses. How do you
               | validate the score that the LLM gives?
        
               | blindriver wrote:
               | The same way you validate scores given by NLPs I assume.
               | You run various tests and look at the results and see if
               | they match what you would expect.
        
               | thaw13579 wrote:
               | Performance and cost are trade-offs though. You could
               | just as well say that LLMs are dead in the water, except
               | in terms of performance.
               | 
               | It does seem likely we'll soon have cheap enough LLM
               | inference to displace traditional NLP entirely, although
               | not quite yet.
        
               | vunderba wrote:
               | > NLPs are dead in the water right now, except in terms
               | of cost.
               | 
               | False.
               | 
                | With all due respect, the fact that you're referring to
                | natural language processing as "NLPs" makes me question
               | whether you have any experience or modest knowledge
               | around this topic, so it's rather bold of you to make
               | such sweeping generalizations.
               | 
               | It works for your use case because you're just _one
               | person_ running it on your home computer with consumer
               | hardware. Some of us have to run NLP related processing
               | (POS taggers, keyword extraction, etc) in a professional
               | environment at tremendous scale, and reaching for an LLM
               | would absolutely kill our performance.
        
               | gf000 wrote:
                | My understanding is that inference models can absolutely
                | scale down, we are only at the beginning of minimizing
                | them, and they are trivial to parallelize. That's not a
                | good combo to bet against: their price will quickly drop
                | while their performance and efficiency grow.
        
             | elwebmaster wrote:
             | Reply didn't say that the expert is uneducated, just that
             | their tool is obsolete. Better look at facts the way they
             | are, sugar coating doesn't serve anyone.
        
             | chaos_emergent wrote:
             | > Inference costs, not training costs.
             | 
             | Why does training cost matter if you have a general
             | intelligence that can do the task for you, that's getting
             | cheaper to run the task on?
             | 
             | > for quick mockups they're serviceable
             | 
             | I know multiple startups that use LLMs as their core bread-
             | and-butter intelligence platform instead of tuned but
             | traditional NLP models
             | 
             | > take the word of domain experts
             | 
             | I guess? I wouldn't call myself an expert by any means but
             | I've been working on NLP problems for about 5 years. Most
             | people I know in NLP-adjacent fields have converged around
             | LLMs being good for most (but obviously not all) problems.
             | 
             | > kind of an insult
             | 
             | Depends on whether you think OP intended to offend, ig
        
               | dartos wrote:
               | > Why does training cost matter if you have a general
               | intelligence that can do the task for you, that's getting
               | cheaper to run the task on?
               | 
               | Assuming we didn't need to train it ever again, it
               | wouldn't. But we don't have that, so...
               | 
               | > I know multiple startups that use LLMs as their core
               | bread-and-butter intelligence platform instead of tuned
               | but traditional NLP models
               | 
               | Okay? Did that system write itself entirely? Did it
               | replace the programmers that actually made it?
               | 
               | If so, they should pivot into a Devin competitor.
               | 
               | > Most people I know in NLP-adjacent fields have
               | converged around LLMs being good for most (but obviously
               | not all) problems.
               | 
                | Yeah, LLMs are quite good at common NLP tasks, but AFAIK
                | are not SOTA at any specific task.
               | 
               | Either way, LLMs obviously don't kill the need for the
               | NLP field.
        
           | otabdeveloper4 wrote:
           | > The fact that you can replace programmers with prompts
           | 
            | No, you can't. The only thing LLMs replace is internet
           | commentators.
        
             | blindriver wrote:
             | As I explained below, I avoided having to learn anything
             | about ML, PyTorch or any other APIs when trying to classify
             | posts based on how political they were and which
              | affiliation they had. That was holding me back, and it was
             | easily replaced by an llm and a prompt. Literally took me
             | minutes what would have taken days or weeks and the results
             | are more than good enough.
        
               | datadrivenangel wrote:
               | GPT 3.5 is more accurate at classifying tweets as liberal
               | than it is at identifying posts that are conservative.
               | 
               | If you're going for rough approximation, LLMs are great,
               | and good enough. More care and conventional ML methods
               | are appropriate as the stakes increase though.
        
               | alexwebb2 wrote:
               | GPT 3.5 has been very, very obsolete in terms of price-
               | per-performance for over a year. Bit of a straw man.
        
               | otabdeveloper4 wrote:
               | > what would have taken days or weeks
               | 
               | Nah, searching Stackoverflow and Github doesn't take
               | "weeks".
               | 
               | That said, due to how utterly broken internet search is
               | nowadays, using an LLM as a search engine proxy is
               | viable.
        
             | portaouflop wrote:
             | No you can't; LLMs are dog shit at internet banter, too
             | neutered
        
           | arandomhuman wrote:
           | >The fact that you can replace programmers with prompts
           | 
            | this is how you end up with thousands of lines of slop and
            | no idea how any of it functions.
        
         | alexwebb2 wrote:
         | I think your intuition on this might be lagging a fair bit
         | behind the current state of LLMs.
         | 
         | System message: answer with just "service" or "product"
         | 
         | User message (variable): 20 bottles of ferric chloride
         | 
         | Response: product
         | 
         | Model: OpenAI GPT-4o-mini
         | 
         | $0.075/1Mt batch input * 27 input tokens * 10M jobs = $20.25
         | 
         | $0.300/1Mt batch output * 1 output token * 10M jobs = $3.00
         | 
         | It's a sub-$25 job.
         | 
         | You'd need to be doing 20 times that volume every single day to
         | even start to justify hiring an NLP engineer instead.
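The cost arithmetic above, reproduced as a script (prices and token counts are taken from the comment):

```python
# Back-of-envelope batch cost for 10M classification jobs,
# at GPT-4o-mini batch-tier prices quoted in the comment.
INPUT_PRICE_PER_M = 0.075   # $ per 1M input tokens
OUTPUT_PRICE_PER_M = 0.300  # $ per 1M output tokens
JOBS = 10_000_000
INPUT_TOKENS, OUTPUT_TOKENS = 27, 1

input_cost = INPUT_PRICE_PER_M * INPUT_TOKENS * JOBS / 1_000_000
output_cost = OUTPUT_PRICE_PER_M * OUTPUT_TOKENS * JOBS / 1_000_000
print(f"${input_cost:.2f} + ${output_cost:.2f} = ${input_cost + output_cost:.2f}")
# → $20.25 + $3.00 = $23.25
```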
        
           | LeafItAlone wrote:
           | >You'd need to be doing 20 times that volume every single day
           | to even start to justify hiring an NLP engineer instead.
           | 
           | How much for the "prompt engineer"? Who is going to be doing
           | the work and validating the output?
        
             | alexwebb2 wrote:
             | All software engineers are (or can be) prompt engineers, at
             | least to the level of trivial jobs like this. It's just an
             | API call and a one-liner instruction. Odds are very good at
             | most companies that they have someone on staff who can
             | knock this out in short order. No specialized hiring
             | required.
        
               | otabdeveloper4 wrote:
               | > ..and validating the output?
               | 
               | You glossed over the meat of the question.
        
               | alexwebb2 wrote:
               | Your validation approach doesn't really change based on
               | the classification method (LLM vs NLP).
               | 
               | At that volume you're going to use automated tests with
               | known correct answers + random sampling for human
               | validation.
        
             | IanCal wrote:
             | Prompt engineering is less and less of an issue the simpler
             | the job is and the more powerful the model is. You also
             | don't need someone with deep nlp knowledge to measure and
             | understand the output.
        
               | LeafItAlone wrote:
               | >less and less of an issue the simpler the job
               | 
               | Correct, everything is easy and simple if you make it
               | simple and easy...
        
               | IanCal wrote:
                | Plenty of simple jobs required people with deeper
                | knowledge of AI in the past; now, for many business tasks,
                | you can skip a lot of that and use an LLM.
               | 
               | Simple things were not always easy. Many of them are,
               | now.
        
             | blindriver wrote:
             | You do not need a prompt engineer to create: "answer with
             | just "service" or "product""
             | 
             | Most classification prompts can be extremely easy and
              | intuitive. The idea that you have to hire a dedicated
              | prompt engineer is kind of funny. In fact, you might be
              | able to get the LLM itself to help revise the prompt.
        
           | elicksaur wrote:
           | How do you validate these classifications?
        
             | jeswin wrote:
              | Isn't it easier and cheaper to validate than to classify
              | (which requires expensive engineers)? I mean, the skill
              | involved is not as expensive; many companies do this at
              | scale.
        
             | scarface_74 wrote:
             | You need a domain expert either way. I mentioned in another
             | reply that one of my niches is implementing call centers
             | with Amazon Connect and Amazon Lex (the NLP engine).
             | 
             | https://news.ycombinator.com/item?id=42748189
             | 
             | I don't know the domain beforehand they are working in, I
             | do validation testing with them.
        
             | segmondy wrote:
             | The same way you validate it if you didn't use an LLM.
        
             | bugglebeetle wrote:
             | The same way you check performance for any problem like
             | this: by creating one or more manually-labeled test
             | datasets, randomly sampled from the target data and looking
             | at the resulting precision, recall, f-scores etc. LLMs
             | change pretty much nothing about evaluation for most NLP
             | tasks.
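A minimal sketch of the evaluation bugglebeetle describes: score model labels against a manually-labeled random sample. The gold/predicted labels here are invented toy data.

```python
# Compute precision, recall and F1 for one positive class,
# comparing model predictions against hand-labeled gold data.
def precision_recall_f1(gold, pred, positive="product"):
    tp = sum(1 for g, p in zip(gold, pred) if g == p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = ["product", "service", "product", "product", "service"]
pred = ["product", "product", "product", "service", "service"]
p, r, f = precision_recall_f1(gold, pred)
print(round(p, 2), round(r, 2), round(f, 2))  # 0.67 0.67 0.67
```

As the comment says, this evaluation loop is identical whether the predictions come from an LLM or a traditional classifier.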
        
           | axegon_ wrote:
           | Yeah... Let's talk time needed for 10M prompts and how that
           | fits into a daily pipeline. Enlighten us, please.
        
             | FloorEgg wrote:
             | Run them all in parallel with a cloud function in less than
             | a minute?
        
               | hnfong wrote:
               | Obviously all the LLM API providers have a rate limit.
               | Not a fan of GP's sarcastic tone, but I suppose many of
               | us would like to know roughly what that limit would be
               | for a small business using such APIs.
        
               | simonw wrote:
               | Surprisingly, DeepSeek doesn't have a rate limit:
               | https://api-docs.deepseek.com/quick_start/rate_limit
               | 
               | I've heard from people running 100+ prompts in parallel
               | against it.
        
               | jdietrich wrote:
               | The rate limits for Gemini 1.5 Flash are 2000 requests
               | per minute and 4 million tokens per minute. Higher limits
               | are available on request.
               | 
               | https://ai.google.dev/pricing#1_5flash
               | 
               | 4o-mini's rate limits scale based on your account
               | history, from 500RPM/200,000TPM to
               | 30,000RPM/150,000,000TPM.
               | 
               | https://platform.openai.com/docs/guides/rate-limits
        
               | rlt wrote:
               | Also can't you just combine multiple classification
               | requests into a single prompt?
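A sketch of what combining requests could look like. The model call is mocked, and the one-label-per-line response format is an assumption you'd have to enforce (and verify) via the prompt.

```python
# Pack several classification items into one prompt and parse
# a numbered, one-label-per-line response.
items = [
    "20 bottles of ferric chloride",
    "annual maintenance contract",
    "5 kg copper wire",
]

def build_batch_prompt(items):
    lines = [f"{i + 1}. {item}" for i, item in enumerate(items)]
    return ("For each numbered item answer with just 'service' or 'product', "
            "one per line, in order:\n" + "\n".join(lines))

def parse_batch_response(text, n):
    labels = [line.strip().lower() for line in text.strip().splitlines()]
    assert len(labels) == n, "model did not return one label per item"
    return labels

# Mocked model output, one label per line:
fake_response = "product\nservice\nproduct"
print(parse_batch_response(fake_response, len(items)))
# → ['product', 'service', 'product']
```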
        
               | axegon_ wrote:
               | Yes, how did I not think of throwing more money at cloud
               | providers on top of feeding open ai, when I could have
               | just code a simple binary classifier and run everything
               | on something as insignificant as an 8-th geh, quad core
               | i5....
        
           | simonw wrote:
           | You might be able to use an even cheaper model. Google Gemini
           | 1.5 Flash 8B is Input: $0.04 / Output: $0.15 per 1M tokens.
           | 
            | 17 input tokens and 2 output tokens * 10 million jobs =
            | 170,000,000 input tokens and 20,000,000 output tokens...
            | which at those rates costs a total of about $9.80:
            | https://tools.simonwillison.net/llm-prices
           | 
           | As for rate limits, https://ai.google.dev/pricing#1_5flash-8B
           | says 4,000 requests per minute and 4 million tokens per
           | minute - so you could run those 10 million jobs in about 2500
           | minutes or 42 hours. I imagine you could pull a trick like
           | sending 10 items in a single prompt to help speed that up,
           | but you'd have to test carefully to check the accuracy
           | effects of doing that.
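The rate-limit arithmetic from the comment as a quick check (10M requests at 4,000 requests/minute, with and without packing 10 items per prompt):

```python
import math

# Time to push 10M classification requests through a
# 4,000-requests-per-minute rate limit.
JOBS = 10_000_000
RPM = 4_000

minutes = math.ceil(JOBS / RPM)
print(minutes, round(minutes / 60, 1))  # 2500 41.7  (minutes, hours)

# Packing 10 items per prompt cuts request count tenfold.
batched_minutes = math.ceil(JOBS / 10 / RPM)
print(batched_minutes / 60)  # ≈4.17 hours
```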
        
           | w10-1 wrote:
           | The question is not average cost but marginal cost of quality
           | - same as voice recognition, which had relatively low uptake
           | even at ~2-4% error rates due to context switching costs for
           | error correction.
           | 
            | So you'd have to account for the work of catching the
            | residual 2-8%+ error rate from LLMs. I believe the premise is
            | that for NLP that's just incremental work, but for LLMs it
            | could be impossible to correct (i.e., the cost per next
            | percentage point of correction explodes), for lack of easily
            | controllable (or even understandable) models.
           | 
           | But it's most rational in business to focus on the easy
           | majority with lower costs, and ignore hard parts that don't
           | lead to dramatically larger TAM.
        
             | gf000 wrote:
              | I am absolutely not an expert in NLP, but I wouldn't be
              | surprised if for many kinds of problems LLMs had a far
              | _lower_ error rate than any NLP software.
              | 
              | Like, lemmatization is pretty damn dumb in NLP, while a
              | better LLM model will be orders of magnitude more
              | correct.
        
         | WhitneyLand wrote:
         | For context, 10M would cost ~$27.
         | 
         | Say Gemini Flash 8B, allowing ~28 tokens for prompt input at
          | $0.075/1M tokens, plus 2 output tokens at $0.30/1M. Works
          | out to about $0.0000027 per classification. Or in other
          | words, for 1 penny you could do this classification ~3,700
          | times.
        
         | segmondy wrote:
          | You are not pushing it at 100. I can classify whether "20
          | bottles of ferric chloride" is a service or a product in
          | probably 2 seconds with a 4090. Something most people don't
          | realize is that you can run multiple inferences in parallel.
          | So with something like a 4090 and some solid few-shot
          | examples, instead of having it classify one example at a
          | time, you can do 5 per prompt. We can probably run 100
          | parallel inferences at 5 at a time, for a rate of about 250
          | a second on a 4090. So in 11 hours I'll be done. I'm going
          | with a 7-8B model too. Some of the 1.5-3B models are great
          | and will run even faster. A competent developer who knows
          | Python and how to use an OpenAI-compatible API can put this
          | together in 10-15 minutes, with no data science, scikit-
          | learn, or other NLP toolchain experience.
          | 
          | So for personal, medium, or even large workloads, I think it
          | has killed it. The workload needs to be extremely large
          | before that changes. If you are classifying or segmenting
          | comments on a social media platform where you need to deal
          | with billions a day, then an LLM would be a very inefficient
          | approach, but for 90+% of use cases I think it wins.
         | 
         | I'm assuming you are going to run it locally because everyone
         | is paranoid about their data. It's even cheaper if you use a
         | cloud API.
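The parallel-inference setup described above can be sketched as follows. The endpoint call is stubbed out, since the exact server (vLLM, llama.cpp, etc.), model, and prompt are assumptions; only the batching and concurrency shape is the point:

```python
# Sketch of running many classification batches concurrently against a
# local OpenAI-compatible server. classify_batch is a placeholder to be
# replaced with a real chat-completion request.
from concurrent.futures import ThreadPoolExecutor

ITEMS = [f"line item {i}" for i in range(1_000)]  # stand-in for real records
BATCH = 5       # items per prompt, as suggested above
WORKERS = 100   # parallel in-flight requests

def batches(items, n):
    # Yield successive n-item chunks of the input list.
    for i in range(0, len(items), n):
        yield items[i:i + n]

def classify_batch(batch):
    # Placeholder: send the batch as one prompt to your local endpoint
    # and parse one label per input line from the response.
    return ["product"] * len(batch)

with ThreadPoolExecutor(max_workers=WORKERS) as pool:
    labels = [label
              for chunk in pool.map(classify_batch, batches(ITEMS, BATCH))
              for label in chunk]

print(len(labels))  # one label per input item
```

`pool.map` preserves input order, so labels line up with records without extra bookkeeping.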
        
           | mikeocool wrote:
            | If you have to classify user input as they're inputting
            | it to provide a response -- so it can't be batched -- 2
            | seconds could potentially be really slow.
           | 
           | Though LLMs sure have made creating training data to train
           | old school models for those cases a lot easier.
        
           | WildGreenLeave wrote:
            | Correct me if I'm wrong, but if you run multiple
            | inferences at the same time on the same GPU, you will need
            | to load multiple models into VRAM, and the models will
            | fight for resources, right? So running 10 parallel
            | inferences will slow everything down 5 times, right? Or am
            | I missing something?
        
             | aeternum wrote:
             | No, the key is to use the full context window so you
             | structure the prompt as something like: For each line
             | below, repeat the line, add a comma then output whether it
             | most closely represents a product or service:
             | 
             | 20 bottles of ferric chloride
             | 
             | salesforce
             | 
             | ...
        
               | e12e wrote:
               | Appreciate the concrete advice in this response. Thank
               | you.
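A minimal sketch of that batched prompt plus the mechanical parsing it enables (the model reply shown is an assumed well-behaved response, not real output):

```python
# Build one prompt holding many items, ask for "line, label" pairs, and
# parse the reply mechanically. The reply string below is a hypothetical
# well-behaved model response used to exercise the parser.
INSTRUCTION = (
    "For each line below, repeat the line, add a comma, then output "
    "whether it most closely represents a product or a service:\n\n"
)

def build_prompt(items):
    return INSTRUCTION + "\n".join(items)

def parse_response(text):
    # Split on the LAST comma of each line so commas inside the item
    # text don't break parsing.
    labels = {}
    for line in text.strip().splitlines():
        item, _, label = line.rpartition(",")
        labels[item.strip()] = label.strip().lower()
    return labels

prompt = build_prompt(["20 bottles of ferric chloride", "salesforce"])
reply = "20 bottles of ferric chloride, product\nsalesforce, service"
print(parse_response(reply))
```

Having the model repeat each line before its label makes dropped or reordered items detectable, which matters when you push batch sizes up.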
        
             | Palmik wrote:
              | Inference for a single example is memory-bound. By doing
              | batch inference, you can interleave computation with
              | memory loads without losing much speed (up until you
              | cross the compute-bound threshold).
        
             | bavell wrote:
              | You will most likely be using the same model, so there's
              | just one copy to load into VRAM.
        
           | axegon_ wrote:
            | FFS... "Lots of writers, few readers". Read again and do
            | the math: 2 seconds, multiplied by 10 million records
            | which contain this, as well as "alarm installation in two
            | locations" and a whole bunch of other crap with little to
            | no repetition (<2%), and where does that get you? 2 *
            | 10,000,000 = 20,000,000 SECONDS!!!! A day has 86,400
            | seconds (24 * 3,600 = 86,400). The data pipeline needs to
            | finish in <24 hours. Everyone needs to get this into their
            | heads somehow: LLMs are not a silver bullet. They will not
            | cure cancer anytime soon, nor will they be effective or
            | cheap enough to run at massive scale. And I don't mean
            | cheap as in "oh, just get an openai subscription hurr
            | durr". Throwing money mindlessly into something is never
            | an effective way to solve a problem.
        
             | gbnwl wrote:
             | Why are you using 2 seconds? The commenter you are
             | responding to hypothesized being able to do 250/s based on
             | "100 parallel inference at 5 at a time". Not speaking to
             | the validity of that, but find it strange that you ran with
             | the 2 seconds number after seemingly having stopped reading
             | after that line, while yourself lamenting people don't read
             | and telling them to "read again".
        
               | jazzyjackson wrote:
               | OP said 2 seconds as if that wasn't an eternity...
        
               | gbnwl wrote:
               | But then they said 250/second when running multiple
               | inference? Again I don't know if their assertions about
               | running multiple inference are correct but why focus on
               | the wrong number instead of addressing the actual claim?
        
               | axegon_ wrote:
                | Ok, let me dumb it down for you: you have a cockroach
                | in your bathroom and you want to kill it. You have an
                | RPG and you have a slipper. Are you gonna use the RPG
                | or are you going to use the slipper? Even if your
                | bathroom is somehow fine after getting shot with an
                | RPG, isn't it overkill? If you can code and train a
                | binary classifier in 2 hours that uses nearly zero
                | resources and gives you good enough results (in my
                | case, way above what my targets were), without having
                | to use a ton of resources, libraries, RAG, hardware,
                | and hell, even electricity? I mean, how hard is this
                | to comprehend, really?
               | 
               | https://deviq.com/antipatterns/shiny-toy
        
             | why_only_15 wrote:
             | Assuming the 10M records is ~2000M input tokens + 200M
             | output tokens, this would cost $300 to classify using
             | llama-3.3-70b[1]. If using llama lets you do this in say
             | one day instead of two days for a traditional NLP pipeline,
             | it's worthwhile.
             | 
              | [1]: https://openrouter.ai/meta-llama/llama-3.3-70b-instruct
        
         | simonw wrote:
         | What NLP approaches are you using to solve the "is '20 bottles
         | of ferric chloride' a service or a product?" problem?
        
         | crystal_revenge wrote:
         | > LLMs are not feasible when you have a dataset of 10 million
         | items that you need to classify relatively fast and at a
         | reasonable cost.
         | 
         | What? That's simply not true.
         | 
         | Current embedding models are incredibly fast and cheap and
         | will, in the vast majority of NLP tasks, get you far better
         | results than any local set of features you can develop
         | yourself.
         | 
          | I've also done this at work numerous times, and have been
          | working on various NLP tasks for over a decade now. For all
          | future traditional NLP tasks, the first pass is going to be
          | to fetch LLM embeddings and stick a fairly simple
          | classification model on top.
         | 
         | > One prompt? Fair. 10? Still ok. 100? You're pushing it. 10M -
         | get help.
         | 
          | "Prompting" is _not_ how you use LLMs for classification
          | tasks. Sure, you can build 0-shot classifiers for some
          | tricky tasks, but if you're doing classification for
          | documents today and you're _not_ starting with an embedding
          | model, you're missing some easy gains.
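A minimal sketch of the embeddings-plus-simple-classifier pattern described above. The `get_embeddings` function here is an offline stand-in; in practice it would call a real embedding model or API:

```python
# Fetch embeddings for labeled texts, fit a simple linear classifier on
# top. The embedding function is a deterministic stand-in (averaged
# random per-token vectors) so the sketch runs offline.
import numpy as np
from sklearn.linear_model import LogisticRegression

def get_embeddings(texts, dim=64, seed=0):
    # Stand-in for an embedding model: replace with real API calls.
    rng = np.random.default_rng(seed)
    vocab, out = {}, []
    for t in texts:
        toks = t.lower().split()
        for tok in toks:
            if tok not in vocab:
                vocab[tok] = rng.normal(size=dim)
        out.append(np.mean([vocab[tok] for tok in toks], axis=0))
    return np.array(out)

texts = [
    "20 bottles of ferric chloride", "pallet of copper pipe",
    "box of nitrile gloves", "annual salesforce subscription",
    "alarm installation in two locations", "quarterly payroll processing",
]
labels = ["product", "product", "product", "service", "service", "service"]

X = get_embeddings(texts)                        # (6, 64) feature matrix
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print(clf.predict(X[:1])[0])
```

With real embeddings, the classifier head stays this small; only the feature extraction changes, and it can be computed once and cached for 10M items.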
        
         | gf000 wrote:
         | Why not just run a local LLM for practically free? You can even
         | trivially parallelize it with multiple instances.
         | 
         | I would believe that many NLP problems can be easily solved
         | even by smaller LLM models.
        
         | sireat wrote:
         | So what would you use to classify whether a document is a
         | critique or something else in 1M documents in a non-English
         | language?
         | 
         | This is a real problem I am dealing with at a library project.
         | 
         | Each document is between 100 to 10k tokens.
         | 
         | Most top (read most expensive) LLMs available in OpenRouter
         | work great, it is the cost (and speed) that is the issue.
         | 
         | If I could come up with something locally runnable that would
         | be fantastic.
         | 
         | Presumably BERT based classifiers would work if I had one
         | properly trained for the language.
        
           | rahimnathwani wrote:
            | I guess you've already seen
            | https://huggingface.co/collections/answerdotai/modernbert-67...
        
       | scarface_74 wrote:
       | For my use case, definitely.
       | 
       | I have worked on AWS Connect (online call center) and Amazon Lex
       | (the backing NLP engine) projects.
       | 
        | Before LLMs, it was a tedious process of trying to figure out
        | all of the different "utterances" that people could say and
        | the various languages you had to support. With LLMs, it's just
        | prompting.
       | 
       | https://chatgpt.com/share/678bab08-f3a0-8010-82e0-32cff9c0b4...
       | 
       | I used something like this using Amazon Bedrock and a Lambda hook
       | for Amazon Lex. Of course it wasn't booking a flight. It was
       | another system
       | 
        | The above is a simplified version. In the real world, I gave
        | it a list of intents (book flights, reserve a room, rent a
        | car) and the properties - "slots" - I needed for each intent.
        
         | gtirloni wrote:
          | How about the costs?
        
           | scarface_74 wrote:
           | We measure savings in terms of call deflections. Clients we
           | work with say that each time a customer talks to an agent it
           | costs $2-$5. That's not even taking into account call
           | abandonments
        
             | IanCal wrote:
              | My baseline when advising people is that if anyone you
              | pay needs to read the output, or you are directly
              | replacing any kind of work, then even frontier LLM
              | inference costs are irrelevant. Of course you need to
              | work out whether that's truly the case, but people worry
              | about the cost in places where it's just irrelevant. If
              | it's $2 when you get to an agent, each case that's
              | avoided there could pay for around a million words
              | read/generated. That's expensive compared to most API
              | calls but irrelevant when counting human costs.
        
         | elicksaur wrote:
         | Thank you for sharing an actual prompt thread. So much of the
         | LLM debate is washed in biases, and it is very helpful to share
         | concrete examples of outputs.
        
           | scarface_74 wrote:
           | The "cordele GA" example surprised me. I was expecting to get
           | a value of "null" for the airport code since I knew that city
           | had a population of 12K and no airport within its
           | metropolitan statistical area. It returned an airport that
           | was close.
           | 
           | Having world knowledge is a godsend. I also just tried a
           | prompt with "Alpharetta, GA" a city north of Atlanta and it
           | returned ATL. An NLP could never do that without a lot more
           | work.
        
         | LeafItAlone wrote:
         | That's a great example and I understand it was intentionally
         | simple but highlighted how LLMs need care with use. Not that
         | this example is very related to NLP:
         | 
         | My prompt: `<<I want a flight from portland to cuba after
         | easter>>`
         | 
         | The response: ``` { "origin": ["PDX"], "destination": ["HAV"],
         | "date": "2025-04-01", "departure_time": null, "preferences":
         | null } ```
         | 
         | Of course I meant Portland Maine (PWM), there is more than one
         | airport option in Cuba than HAV, and it got the date wrong,
         | since Easter is April 20 this year.
        
           | scarface_74 wrote:
           | If the business stakeholders came out with that scenario, I
           | would modify the prompt like this. You would know the users
           | address if they had an account.
           | 
           | https://chatgpt.com/share/678c1708-639c-8010-a6be-9ce1055703.
           | ..
        
             | LeafItAlone wrote:
             | OK, but that only fixed one of the three issues.
        
               | scarface_74 wrote:
                | Well, the first one is easy. I mean, you _could_ give
                | it a list of holidays and dates. But for the rest, you
                | would just ask the user to confirm the information and
                | say "is this correct?" If they say "No", ask them
                | which part isn't correct and let them correct it.
               | 
               | I would definitely assume someone wanted to leave from an
               | airport close by if they didn't say anything.
               | 
               | You don't want the prompt to grow too much. But you do
               | have analytics that you can use to improve your prompt.
               | 
                | In the case of Connect, you define your logic using a
                | GUI flowchart builder called a contact flow.
               | 
               | BTW: with my new prompt, it did assume the correct
               | airport "<<I want to go to Cuba after Easter>>"
        
               | LeafItAlone wrote:
               | Sure, all the problems are "easy" once you identify them.
               | As with most products. But the majority of Show HN posts
               | here relying on LLMs that I see don't account for simple
               | things like my example. Flights finders in particular
               | have been pretty bad.
               | 
               | >BTW: with my new prompt, it did assume the correct
               | airport "<<I want to go to Cuba after Easter>>"
               | 
               | Not really. It chose the airport you put basically in the
               | prompt. But I don't live in MA, I live closer to PDX. And
               | it didn't suggest the multiple other Cuba airports. So
               | you'll end up with a lot of guiding rules.
        
               | scarface_74 wrote:
                | A human, if you said "Portland", would first assume
                | you meant PDX, unless they looked up your address, in
                | which case they would assume Maine.
               | 
               | Just like if I said I wanted to fly to Albany, they would
               | think I meant New York and not my parents city in south
               | GA (ABY) which only has three commercial flights a day.
               | 
               | Even with a human agent, you ask for confirmation.
               | 
               | Also, I ask to speak to people on the ground - in this
               | case it would be CSRs - to break it.
               | 
               | That's another reason I think "side projects" are useless
               | and they don't have any merit on resumes. I want them to
               | talk about real world implementations.
        
         | vpribish wrote:
         | link is a 404, sadly. what did it say before?
        
           | scarface_74 wrote:
            | The link works for me, even in incognito mode.
           | 
           | The prompt:
           | 
           |  _you are a chatbot that helps users book flights. Please
           | extract the origin city, destination city, travel date, and
           | any additional preferences (e.g., time of day, class of
           | service). If any of the details are missing, make the value
           | "null". If the date is relative (e.g., "tomorrow", "next
           | week"), convert it to a specific date.
           | 
           | User Input: "<User's Query>"
           | 
           | Output (JSON format): { "origin": list of airport codes
           | "destination": list of airport codes, "date": "<Extracted
           | Date>", "departure_time": "<Extracted Departure Time (if
           | applicable)>", "preferences": "<Any additional preferences
           | like class of service (optional)>" }
           | 
           | The users request will be surrounded by <<>>
           | 
           | Always return JSON with any missing properties having a value
           | of null. Always return English. Return a list of airport
           | codes for the city. For instance New York has two airports
           | give both
           | 
           | Always return responses in English_
        
       | DebtDeflation wrote:
       | The question seems malformed to me.
       | 
       | Text classification, clustering, named entity recognition, etc.
       | are NLP tasks. LLMs can perform these tasks. ML models that are
       | not LLMs (or even not deep learning models) can also perform
       | these tasks. Is the author perhaps asking if the concept of a
       | "completion" has replaced all of these tasks?
       | 
       | When I hear "traditional NLP" I think not of the above types of
       | tasks but rather the methodology employed for performing them.
       | For example, building a pipeline to do stemming/lemmatization,
       | part of speech tagging, coreference resolution, etc. before the
       | text gets fed to a classifier model. This was SOTA 10 years ago
       | but I don't think many people are still doing it today.
        
       | retinaros wrote:
        | can't read the article. do they consider BERT as an LLM?
        | there are tasks still in NLP where BERT is better than a
        | GPT-like model
        
         | selimthegrim wrote:
         | Like?
        
           | Olfurm wrote:
            | Like named entity recognition or relation extraction.
            | Check https://github.com/urchade/GLiNER
        
       | darepublic wrote:
        | I remember using the open NLP library from Stanford around
        | 2016. It would do part-of-speech tagging of the words in a
        | sentence (labelling the words with their grammatical
        | function). It was pretty good, but it reliably failed on
        | certain words where context determined the tag. When GPT-3
        | came out, the first thing I tested it on was part-of-speech
        | tagging, in particular those sentences open NLP had trouble
        | with. It aced everything; I was impressed.
        
       | leobg wrote:
       | There are AI bros that will call an LLM to do what you could do
       | with a regex. I've seen people do the chunking for RAG using an
       | LLM...
        
         | tossandthrow wrote:
          | If you think of chunking as "take x characters", then using
          | LLMs is a poor idea.
          | 
          | But syntactic chunking also works really poorly for any
          | serious application, as you lose basically all context.
          | 
          |  _Semantic_ chunking, however, is a task you absolutely
          | would use LLMs for.
        
       | freefaler wrote:
       | If archive links aren't working this works:
       | 
       | https://freedium.cfd/https://medium.com/altitudehq/is-tradit...
        
       | vedant wrote:
       | The title of this article feels like "has electricity killed oil
       | lamps"?
        
       | itissid wrote:
        | LLM design/use has only about as much to do with engineering
        | as building a plane has to do with actually flying it.
        | 
        | Every business is kind of a unicorn in its problems; NLP is a
        | small part of it. Even if an LLM did perform cheaply enough to
        | do NLP, how would you replace parts like: 1. An evaluation
        | system that uses calibration (human labels), 2. Ground-truth
        | collection (human + sometimes semi-automated), 3. QA testing
        | by end users?
        | 
        | Even if LLMs made it easier to do NLP, there are correlations
        | with the above, which means your NLP process is influenced so
        | much that you still need an engineer. And if you have an
        | engineer who does only NLP and nothing else, you are quite
        | hyper-specialized, to the extent that you are only building
        | planes: 0.01% of the engineering work out there.
        
       | thangalin wrote:
       | I created an NLP library to help curl straight quotes into curly
       | quotes. Last I checked, LLMs struggled to curl the following
       | straight quotation marks:                   ''E's got a 'ittle
       | box 'n a big 'un,' she said, 'wit' th' 'ittle 'un 'bout 2'x6".
       | An' no, y'ain't cryin' on th' "soap box" to me no mo, y'hear.
       | 'Cause it 'tweren't ever a spec o' fun!' I says to my frien'.
       | 
       | The library is integrated into my Markdown editor, KeenWrite
       | (https://keenwrite.com/), to correctly curl quotation marks into
       | entities before passing them over to ConTeXt for typesetting.
       | While there are other ways to indicate opening and closing
       | quotation marks, none are as natural to type in plain text as
        | straight quotes. I would not trust an LLM to curl quotation
        | marks accurately.
       | 
       | For the curious, you can try it at:
       | 
       | https://whitemagicsoftware.com/keenquotes/
       | 
       | If you find any edge cases that don't work, do let me know. The
       | library correctly curls my entire novel. There are a few edge
       | cases that are completely ambiguous, however, that require
       | semantic knowledge (part-of-speech tagging), which I haven't
       | added. PoS tagging would be a heavy operation that could prevent
       | real-time quote curling for little practical gain.
       | 
       | The lexer, parser, and test cases are all open source.
       | 
       | https://gitlab.com/DaveJarvis/KeenQuotes/-/tree/main/src/mai...
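For contrast, a deliberately naive regex-based curler shows why the problem warrants a real lexer/parser: it handles the easy cases but fails on leading-apostrophe forms like 'Cause, on inch/foot marks like 2'x6", and on the nested quotes in the sample sentence above. This is a sketch, not KeenQuotes' actual algorithm:

```python
# Naive straight-to-curly quote conversion with regexes. Fine for plain
# prose; breaks on leading apostrophes, unit marks, and nested quotes.
import re

def curl(text):
    text = re.sub(r"(\w)'(\w)", "\\1\u2019\\2", text)  # contractions: it's
    text = re.sub(r'"(?=\S)', "\u201c", text)          # opening double quote
    text = re.sub(r'(?<=\S)"', "\u201d", text)         # closing double quote
    text = re.sub(r"'(?=\S)", "\u2018", text)          # opening single quote
    text = re.sub(r"(?<=\S)'", "\u2019", text)         # closing single quote
    return text

print(curl('She said, "it\'s a \'little\' box."'))
```

Feed it the 'ittle-box sentence from the comment above and the leading apostrophes come out as opening single quotes, which is exactly the ambiguity a grammar-aware pass resolves.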
        
         | jcheng wrote:
          | Great example. I just tried it with a few LLMs and got
          | horrible results. GPT-4o got a ton of them wrong, o1 got
          | them all correct AFAICT but took 1m50s to do so, and Claude
          | 3.5 Sonnet said "Here's the text with straight quotes
          | converted to curly quotes" but then returned the text with
          | all the straight quotes intact.
         | 
         | I'm very surprised all three models didn't nail it immediately.
        
         | gf000 wrote:
          | I would be interested in how well even a smaller LLM would
          | work after fine-tuning. Besides the overhead of an LLM, I
          | would assume they would do a much better job in the edge
          | cases (where contextual understanding is required).
        
       | derbaum wrote:
       | One of the things I'm still struggling with when using LLMs over
       | NLP is classification against a large corpus of data. If I get a
       | new text and I want to find the most similar text out of a
       | million others, semantically speaking, how would I do this with
       | an LLM? Apart from choosing certain pre-defined categories (such
       | as "friendly", "political", ...) and then letting the LLM rate
       | each text on each category, I can't see a simple solution yet
       | except using embeddings (which I think could just be done using
       | BERT and does not count as LLM usage?).
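For the "most similar text out of a million" case, the usual embeddings-only answer can be sketched directly: embed the corpus once up front, then each new text costs one embedding call plus a matrix-vector product. Random unit vectors stand in for real embeddings here, and 100k rows for the 1M corpus:

```python
# Nearest-neighbour lookup over precomputed, unit-normalised embeddings.
# Cosine similarity reduces to a dot product, so scoring the whole
# corpus is a single matrix-vector multiply.
import numpy as np

rng = np.random.default_rng(42)
corpus = rng.normal(size=(100_000, 64)).astype(np.float32)
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)  # unit vectors

# A query that is a lightly perturbed copy of document 123.
query = corpus[123] + 0.01 * rng.normal(size=64)
query /= np.linalg.norm(query)

scores = corpus @ query        # cosine similarity, one dot per document
best = int(np.argmax(scores))
print(best)                    # index of the most similar document
```

At real scale you would swap the brute-force `argmax` for an approximate index (FAISS, HNSW, etc.), but the pipeline shape stays the same.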
        
         | thaumasiotes wrote:
         | Take two documents.
         | 
         | Feed one through an LLM, one word at a time, and keep track of
         | words that experience greatly inflated probabilities of
         | occurrence, compared to baseline English. "For" is probably
         | going to maintain a level of likelihood close to baseline.
         | "Engine" is not.
         | 
         | Do the same thing for the other one.
         | 
         | See how much overlap you get.
        
           | derbaum wrote:
           | Wouldn't a simple comparison of the word frequency in my text
           | against a list of usual word frequencies do the trick here
           | without an LLM? Sort of a BM25?
        
         | macNchz wrote:
         | I've used embeddings to define clusters, then passed sampled
         | documents from each cluster to an LLM to create labels for each
         | grouping. I had pretty impressive results from this approach
         | when creating a category/subcategory labels for a collection of
         | texts I worked on recently.
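A sketch of that cluster-then-label workflow. The embeddings are synthetic blobs and the LLM labeling call is stubbed, so only the pipeline shape is real:

```python
# Cluster document embeddings with k-means, sample a few documents per
# cluster, and hand each sample to an LLM to name the cluster. The
# embeddings are synthetic and label_cluster is a placeholder.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(7)
# Stand-in for real document embeddings: three well-separated blobs.
emb = np.vstack([rng.normal(loc=c, scale=0.1, size=(50, 32))
                 for c in (-2.0, 0.0, 2.0)])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(emb)

def label_cluster(sample_indices):
    # Placeholder: format the sampled documents into a prompt and ask
    # an LLM for a short category name.
    return f"cluster-with-{len(sample_indices)}-samples"

for k in range(3):
    members = np.flatnonzero(km.labels_ == k)
    sample = rng.choice(members, size=5, replace=False)
    print(k, label_cluster(sample))
```

Sampling per cluster keeps the LLM cost proportional to the number of clusters rather than the number of documents, which is what makes this cheap at corpus scale.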
        
           | derbaum wrote:
           | That's interesting, it sounds a bit like those cluster graph
           | visualisation techniques. Unfortunately, my texts seem to
           | fall into clusters that really don't match the ones that I
           | had hoped to get out of these methods. I guess it's just a
           | matter of fine-tuning now.
        
       | vletal wrote:
       | The idea that we can solve "language" by breaking down and
       | understanding sentences is naive and funny with the benefit of
       | hindsight, is it not?
       | 
       | An equivalently funny attitude seems to be the "natural language
       | will replace programming languages". Let's see how that one will
       | work out when the hype is over.
        
       ___________________________________________________________________
       (page generated 2025-01-18 23:01 UTC)