[HN Gopher] Has LLM killed traditional NLP?
___________________________________________________________________
Has LLM killed traditional NLP?
Author : vietthangif
Score : 106 points
Date : 2025-01-15 07:26 UTC (3 days ago)
(HTM) web link (medium.com)
(TXT) w3m dump (medium.com)
| oliwary wrote:
| This article seems to be paywalled unfortunately. While LLMs are
| very useful when the tasks are complex and/or there is not a lot
| of training data, I still think traditional NLP pipelines have a
| very important role to play, including when:
|
| - Depending on the complexity of the task and the required
| results, SVMs or BERT can be enough in many cases and take much
| lower resources, especially if there is a lot of training data
| available. Training these models with LLM outputs could also be
| an interesting approach to achieve this.
|
| - When resources are constrained or latency is important.
|
| - In some cases, there may be labeled data in certain classes
| that have no semantic connection between them, e.g. explaining
| the class to LLMs could be tricky.
| eminent101 wrote:
| > This article seems to be paywalled unfortunately.
|
| I am no fan of Medium paywalled articles but if it helps you,
| here's the article on archive - https://archive.is/J53CE
| 99catmaster wrote:
| https://archive.is/J53CE
| RancheroBeans wrote:
| NLP is an important part of upcoming RAG frameworks like
| Microsoft's LazyGraphRAG. So I think it's more like NLP is a tool
| used when the time is right.
|
| https://www.microsoft.com/en-us/research/blog/lazygraphrag-s...
| politelemon wrote:
| I could use some help understanding, is this a set of tools or
| techniques to answer questions? The name made me think it's
| related to creating embeddings but it seems much more?
| michaelsbradley wrote:
| https://archive.is/J53CE
| axegon_ wrote:
| No, it has not and will not in the foreseeable future. This is
| one of my responsibilities at work. LLMs are not feasible when
| you have a dataset of 10 million items that you need to classify
| relatively fast and at a reasonable cost. LLMs are great at mid-
| level complexity tasks given a reasonable volume of data - they
| can take away the tedious job of figuring out what you are
| looking at or even come up with some basic mapping. But anything
| at large volumes.. Na. Real life example: "is '20 bottles of
| ferric chloride' a service or a product?"
|
| One prompt? Fair. 10? Still ok. 100? You're pushing it. 10M - get
| help.
| diggan wrote:
| So TLDR: You agree with the author, but not for the same
| reasons?
| vlovich123 wrote:
| That's the argument the article makes but the reasoning is a
| little questionable on a few fronts:
|
| - It uses f16 for the data format whereas quantization can
| reduce the memory burden without a meaningful drop in accuracy,
| especially as compared with traditional NLP techniques.
|
| - The quality of LLMs typically outperforms OpenCV + NER.
|
| - You can choose to replace just part of the pipeline instead
| of using the LLM for everything (e.g. using text-only 3B or 1B
| models to replace the NER model while keeping OpenCV)
|
| - The (LLM compute / quality) / watt is constantly decreasing.
| Meaning even if it's too expensive today, the system you've
| spent time building, tuning and maintaining today is quickly
| becoming obsolete.
|
| - Talking with new grads in NLP programs, all the focus is
| basically on LLMs.
|
| - The capability + quality out of models / size of model keeps
| increasing. That means your existing RAM & performance budget
| keeps absorbing problems that seemed previously out of reach
|
| Now of course traditional techniques are valuable because they
| can be an important tool in bringing down costs (fixed function
| accelerator vs general purpose compute), but it's going to
| become more niche and specialized with most tasks transitioning
| to LLMs I think.
|
| The "bitter lesson" paper is really relevant to these kinds of
| discussions.
| vlovich123 wrote:
| Not an independent player so obviously important to be
| critical of papers like this [1], but it's claiming a ~10x
| drop in LLM inference cost every year. This lines up with the
| technical papers I'm seeing that are continually improving
| performance + the related HW improvements.
|
| That's obviously not sustainable indefinitely, but these
| kinds of exponentials are precisely why people often make
| incorrect conclusions on how long change will take to happen.
| Just a reminder: CPUs got 2x faster every 18 months and, for
| 20 years, kept upending software companies that weren't in
| tune with this cycle (i.e. focusing on performance instead of
| features). For example,
| even if you're spending $10k/month for LLM vs $100/month to
| process the 10M item, it can still be more beneficial to go
| the LLM route as you can buy cheaper expertise to put
| together your LLM pipeline than the NLP route to make up the
| ~100k/year difference (assuming the performance otherwise
| works and the improved quality and robustness of the LLM
| solution isn't providing extra revenue to offset).
|
| [1] https://a16z.com/llmflation-llm-inference-cost/
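|
| A rough sketch of the exponential argument above, assuming the ~10x
| yearly price drop claimed in [1] keeps holding and the traditional
| pipeline's bill stays flat. The function names and the flat-cost
| assumption are illustrative only, not taken from the article:

```python
import math

def years_to_parity(llm_monthly, nlp_monthly, yearly_drop=10.0):
    """Years until the LLM bill falls to the NLP bill, if LLM cost
    shrinks by `yearly_drop`x per year and the NLP cost stays flat."""
    if llm_monthly <= nlp_monthly:
        return 0.0
    return math.log(llm_monthly / nlp_monthly) / math.log(yearly_drop)

def llm_cost_after(llm_monthly, years, yearly_drop=10.0):
    """Projected monthly LLM bill after `years` years (assumed trend)."""
    return llm_monthly / (yearly_drop ** years)

# Figures from the comment: $10k/month for the LLM vs $100/month.
print(years_to_parity(10_000, 100))
print(llm_cost_after(10_000, 1))
```

| Under those assumptions a 100x price gap closes in two years, which
| is the point about exponentials; if the trend slows, so does parity.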
| Kuinox wrote:
| Prompt caching would lower the cost, later similar tech would
| lower the inference cost too. You have less than 25 tokens,
| that's between $1 and $5.
|
| There may be some use case but I'm not convinced with the one
| you gave.
| minimaxir wrote:
| So there's a bit of an issue with prompt caching
| implementations: for both OpenAI API and Claude's API, you
| need a minimum of 1024 tokens to build the cache for whatever
| reason. For simple problems, that can be hard to hit and may
| require padding the system prompt a bit.
| bloomingkales wrote:
| I suspect any solution like that will be wholesale thrown away
| in a year or two. Unless the damn thing is going to make money
| in the next 2-3 years, we are all mostly going to write
| throwaway code.
|
| Things are such an opportunity cost nowadays. It's like trying
| to capture value out of a transient amorphous cloud, you can't
| hold any of it in your hand but the phenomenon is clearly
| occurring.
| MasterScrat wrote:
| Can you talk about the main non-LLM NLP tools you use? e.g.
| BERT models?
|
| > One prompt? Fair. 10? Still ok. 100? You're pushing it. 10M -
| get help.
|
| Assuming you _could_ do 10M+ LLM calls for this task at trivial
| cost and time, would you do it? i.e. is the only thing keeping
| you away from LLM the fact they're currently too cumbersome to
| use?
| scarface_74 wrote:
| http://www.incompleteideas.net/IncIdeas/BitterLesson.html
| devjab wrote:
| While I agree with both you and the article I also think it'll
| depend on more than just the volume of your data. We have quite
| a lot of documents that we classify. It's around 10-100k a
| month, some rather large others simple invoices. We used to
| have a couple of AI specialists who handled the classification
| with local NLP models, but when they left we had to find
| alternatives. For us this was the AI services in the cloud we
| use and the result has been a document warehouse which is both
| easier for the business to manage and a "pipeline" which is
| much cheaper than having those AI specialists on the payroll.
|
| I imagine this wouldn't be the case if we were to do more
| classification projects, but we aren't. We did try to find
| replacements first, but it was impossible for us to attract any
| talent, which isn't too much of a surprise considering it's
| mainly maintenance. Using external consultants for that
| maintenance proved to be almost more expensive than having two
| full time employees.
| blindriver wrote:
| That's sort of like asking a horse and buggy driver whether
| automobiles are going to put them out of business.
|
| I think for the most part, casual nlp is dead because of LLMs.
| And LLM costs are going to plummet soon, so large scale nlp
| that you're talking about is probably dead within 5 years or
| less. The fact that you can replace programmers with prompts is
| huge in my opinion so no one needs to learn an NLP API anymore,
| just stuff it into a prompt. Once costs to power LLMs decrease
| to meet the cost of programmers it's game over.
| dartos wrote:
| > LLM costs
|
| Inference costs, not training costs.
|
| > The fact that you can replace programmers
|
| You can't... not for any real project. For quick mockups
| they're serviceable
|
| > That's sort of like asking a horse and buggy driver whether
| automobiles
|
| Kind of an insult to OP, no? Horse and buggy drivers were not
| highly educated experts in their field.
|
| Maybe take the word of domain experts rather than AI company
| marketing teams.
| blindriver wrote:
| > Maybe take the word of domain experts rather than AI
| company marketing teams.
|
| Appeal to authority is a well known logical fallacy.
|
| I know how dead NLP is personally because I've never been
| able to get NLP working but once ChatGPT came around, I was
| able to classify texts extremely easily. It's
| transformational.
|
| I was able to get ChatGPT to classify posts based on how
| political it was from a scale of 1 to 10 and which
| political leaning they were and then classify the person's
| likely political affiliations.
|
| All of this without needing to learn any APIs or anything
| about NLPs. Sorry but given my experience, NLPs are dead in
| the water right now, except in terms of cost. And cost will
| go down exponentially as they always do. Right now I'm
| waiting for the RTX 5090 so I can just do it myself with
| open source LLM.
| FridgeSeal wrote:
| "I couldn't be bothered learning something, and now I
| don't have to! Checkmate!"
|
| While LLM's can have their uses, let's not get carried
| away.
| scarface_74 wrote:
| That's true. I did avoid learning traditional NLP
| techniques because for my use case - call centers - LLMs
| do a much better job.
|
| Context for the problem space:
|
| https://dl.acm.org/doi/fullHtml/10.1145/3442381.3449870
| dartos wrote:
| > Appeal to authority is a well known logical fallacy.
|
| I did not make an appeal to authority. I made an appeal
| to expertise.
|
| It's why you'd trust a doctor's medical opinion over a
| child's.
|
| I'm not saying "listen to this guy because they're captain
| of NLP" I'm saying listen because experts have spent
| years of hands on experience with things like getting NLP
| working at all.
|
| > I know how dead NLP is personally because I've never
| been able to get NLP working
|
| So you're not an expert in the field. Barely know
| anything about it, but you're okay hand waving away
| expertise bc you got a toy NLP Demo working...
|
| That's great, dude.
|
| > I was able to get ChatGPT to classify posts based on
| how political it was from a scale of 1 to 10
|
| And I know you didn't compare the results against classic
| NLP to see if there was any improvements because you
| don't know how...
| blindriver wrote:
| > I did not make an appeal to authority. I made an appeal
| to expertise.
|
| Lol
|
| > I'm saying listen because experts have spent years of
| hands on experience with things like getting NLP working
| at all.
|
| "It is difficult to get a man to understand something,
| when his salary depends on his not understanding it."
|
| Upton Sinclair
|
| > Barely know anything about it, but you're okay hand
| waving away expertise bc you got a toy NLP Demo
| working...
|
| Yes that's my point. I don't know anything about
| implementing an NLP but got something that works pretty
| well using an LLM extremely quickly and easily.
|
| > And I know you didn't compare the results against
| classic NLP to see if there was any improvements because
| you don't know NLP...
|
| Do you cross reference all your Google searches to make
| sure they are giving you the best results vs Bing and
| DDG?
|
| Do you cross reference the results from your NLP with
| LLMs to see if there were any improvements?
| dartos wrote:
| > Lol
|
| Great argument
|
| > "It is difficult to get a man to understand something,
| when his salary depends on his not understanding it."
|
| NLP professionals are also LLM professionals. LLMs are
| tools in an NLP toolkit. LLMs don't make the NLP
| professional obsolete the way it makes handwritten spam
| obsolete.
|
| I was going to explain this further but you literally
| wouldn't understand.
|
| > Do you cross reference all your Google searches to make
| sure they are giving you the best results vs Bing and
| DDG?
|
| ...Yes I do...
|
| That's why I cancelled my kagi subscription. It was just
| as good as DDG.
|
| > Do you cross reference the results from your NLP with
| LLMs to see if there were any improvements?
|
| Yes I do... because I want to use the best tool for the
| job. Not just the first one I was able to get working...
| elicksaur wrote:
| I haven't understood these types of uses. How do you
| validate the score that the LLM gives?
| blindriver wrote:
| The same way you validate scores given by NLPs I assume.
| You run various tests and look at the results and see if
| they match what you would expect.
| thaw13579 wrote:
| Performance and cost are trade-offs though. You could
| just as well say that LLMs are dead in the water, except
| in terms of performance.
|
| It does seem likely we'll soon have cheap enough LLM
| inference to displace traditional NLP entirely, although
| not quite yet.
| vunderba wrote:
| > NLPs are dead in the water right now, except in terms
| of cost.
|
| False.
|
| With all due respect, the fact that you're referring to
| natural language parsing as "NLPs" makes me question
| whether you have any experience or modest knowledge
| around this topic, so it's rather bold of you to make
| such sweeping generalizations.
|
| It works for your use case because you're just _one
| person_ running it on your home computer with consumer
| hardware. Some of us have to run NLP related processing
| (POS taggers, keyword extraction, etc) in a professional
| environment at tremendous scale, and reaching for an LLM
| would absolutely kill our performance.
| gf000 wrote:
| My understanding is that inference models can absolutely
| scale down, we are only at the beginning of these getting
| minimized, and they are trivial to parallelize. That's
| not a good combo to be against them, their
| price/performance/efficiency will quickly drop/grow/grow.
| elwebmaster wrote:
| Reply didn't say that the expert is uneducated, just that
| their tool is obsolete. Better look at facts the way they
| are, sugar coating doesn't serve anyone.
| chaos_emergent wrote:
| > Inference costs, not training costs.
|
| Why does training cost matter if you have a general
| intelligence that can do the task for you, that's getting
| cheaper to run the task on?
|
| > for quick mockups they're serviceable
|
| I know multiple startups that use LLMs as their core bread-
| and-butter intelligence platform instead of tuned but
| traditional NLP models
|
| > take the word of domain experts
|
| I guess? I wouldn't call myself an expert by any means but
| I've been working on NLP problems for about 5 years. Most
| people I know in NLP-adjacent fields have converged around
| LLMs being good for most (but obviously not all) problems.
|
| > kind of an insult
|
| Depends on whether you think OP intended to offend, ig
| dartos wrote:
| > Why does training cost matter if you have a general
| intelligence that can do the task for you, that's getting
| cheaper to run the task on?
|
| Assuming we didn't need to train it ever again, it
| wouldn't. But we don't have that, so...
|
| > I know multiple startups that use LLMs as their core
| bread-and-butter intelligence platform instead of tuned
| but traditional NLP models
|
| Okay? Did that system write itself entirely? Did it
| replace the programmers that actually made it?
|
| If so, they should pivot into a Devin competitor.
|
| > Most people I know in NLP-adjacent fields have
| converged around LLMs being good for most (but obviously
| not all) problems.
|
| Yeah LLMs are quite good at common NLP tasks, but AFAIK
| are not SOTA at any specific task.
|
| Either way, LLMs obviously don't kill the need for the
| NLP field.
| otabdeveloper4 wrote:
| > The fact that you can replace programmers with prompts
|
| No, you can't. The only thing LLM's replace is internet
| commentators.
| blindriver wrote:
| As I explained below, I avoided having to learn anything
| about ML, PyTorch or any other APIs when trying to classify
| posts based on how political they were and which
| affiliation they were. That was holding me back and it was
| easily replaced by an llm and a prompt. Literally took me
| minutes what would have taken days or weeks and the results
| are more than good enough.
| datadrivenangel wrote:
| GPT 3.5 is more accurate at classifying tweets as liberal
| than it is at identifying posts that are conservative.
|
| If you're going for rough approximation, LLMs are great,
| and good enough. More care and conventional ML methods
| are appropriate as the stakes increase though.
| alexwebb2 wrote:
| GPT 3.5 has been very, very obsolete in terms of price-
| per-performance for over a year. Bit of a straw man.
| otabdeveloper4 wrote:
| > what would have taken days or weeks
|
| Nah, searching Stackoverflow and Github doesn't take
| "weeks".
|
| That said, due to how utterly broken internet search is
| nowadays, using an LLM as a search engine proxy is
| viable.
| portaouflop wrote:
| No you can't; LLMs are dog shit at internet banter, too
| neutered
| arandomhuman wrote:
| >The fact that you can replace programmers with prompts
|
| this is how you end up with 1000s of lines of slop that you
| have no idea how it functions.
| alexwebb2 wrote:
| I think your intuition on this might be lagging a fair bit
| behind the current state of LLMs.
|
| System message: answer with just "service" or "product"
|
| User message (variable): 20 bottles of ferric chloride
|
| Response: product
|
| Model: OpenAI GPT-4o-mini
|
| $0.075/1Mt batch input * 27 input tokens * 10M jobs = $20.25
|
| $0.300/1Mt batch output * 1 output token * 10M jobs = $3.00
|
| It's a sub-$25 job.
|
| You'd need to be doing 20 times that volume every single day to
| even start to justify hiring an NLP engineer instead.
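|
| The arithmetic above can be reproduced with a small helper. This is
| only a sketch: the function name is made up here, and the prices are
| the batch rates quoted in the comment, which may have changed since:

```python
def batch_cost_usd(jobs, input_tokens, output_tokens,
                   input_price_per_m, output_price_per_m):
    """Total cost in USD for `jobs` requests, given per-request token
    counts and prices quoted per one million tokens."""
    input_cost = jobs * input_tokens * input_price_per_m / 1_000_000
    output_cost = jobs * output_tokens * output_price_per_m / 1_000_000
    return input_cost + output_cost

# Numbers from the comment: 27 input tokens, 1 output token, 10M jobs.
total = batch_cost_usd(
    jobs=10_000_000,
    input_tokens=27,
    output_tokens=1,
    input_price_per_m=0.075,
    output_price_per_m=0.300,
)
print(f"${total:.2f}")
```

| That is $20.25 for input plus $3.00 for output, i.e. the sub-$25
| figure quoted above.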
| LeafItAlone wrote:
| >You'd need to be doing 20 times that volume every single day
| to even start to justify hiring an NLP engineer instead.
|
| How much for the "prompt engineer"? Who is going to be doing
| the work and validating the output?
| alexwebb2 wrote:
| All software engineers are (or can be) prompt engineers, at
| least to the level of trivial jobs like this. It's just an
| API call and a one-liner instruction. Odds are very good at
| most companies that they have someone on staff who can
| knock this out in short order. No specialized hiring
| required.
| otabdeveloper4 wrote:
| > ..and validating the output?
|
| You glossed over the meat of the question.
| alexwebb2 wrote:
| Your validation approach doesn't really change based on
| the classification method (LLM vs NLP).
|
| At that volume you're going to use automated tests with
| known correct answers + random sampling for human
| validation.
| IanCal wrote:
| Prompt engineering is less and less of an issue the simpler
| the job is and the more powerful the model is. You also
| don't need someone with deep nlp knowledge to measure and
| understand the output.
| LeafItAlone wrote:
| >less and less of an issue the simpler the job
|
| Correct, everything is easy and simple if you make it
| simple and easy...
| IanCal wrote:
| Plenty of simple jobs required people with deeper
| knowledge of AI in the past, now for many tasks in
| businesses you can skip over a lot of that and use a llm.
|
| Simple things were not always easy. Many of them are,
| now.
| blindriver wrote:
| You do not need a prompt engineer to create: "answer with
| just "service" or "product""
|
| Most classification prompts can be extremely easy and
| intuitive. The idea you have to hire a completely different
| prompt engineer is kind of funny. In fact you might be able
| to get the llm itself to help revise the prompt.
| elicksaur wrote:
| How do you validate these classifications?
| jeswin wrote:
| Isn't it easier and cheaper to validate than to classify
| (requires expensive engineers)? I mean the skill is not as
| expensive - many companies do this at scale.
| scarface_74 wrote:
| You need a domain expert either way. I mentioned in another
| reply that one of my niches is implementing call centers
| with Amazon Connect and Amazon Lex (the NLP engine).
|
| https://news.ycombinator.com/item?id=42748189
|
| I don't know the domain beforehand they are working in, I
| do validation testing with them.
| segmondy wrote:
| The same way you validate it if you didn't use an LLM.
| bugglebeetle wrote:
| The same way you check performance for any problem like
| this: by creating one or more manually-labeled test
| datasets, randomly sampled from the target data and looking
| at the resulting precision, recall, f-scores etc. LLMs
| change pretty much nothing about evaluation for most NLP
| tasks.
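|
| A minimal sketch of that evaluation loop, assuming single-label
| classification. The helper names and the tiny label lists are
| hypothetical; the same code scores an LLM or a classic classifier:

```python
import random

def draw_eval_sample(records, k, seed=0):
    """Randomly pick k records from the target data to hand-label."""
    return random.Random(seed).sample(records, k)

def precision_recall_f1(gold, predicted, positive):
    """Precision, recall and F1 for one class, from parallel lists of
    hand-assigned gold labels and model predictions."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Made-up gold labels and model outputs for five sampled records.
gold      = ["product", "service", "product", "service", "product"]
predicted = ["product", "product", "product", "service", "service"]
print(precision_recall_f1(gold, predicted, positive="product"))
```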
| axegon_ wrote:
| Yeah... Let's talk time needed for 10M prompts and how that
| fits into a daily pipeline. Enlighten us, please.
| FloorEgg wrote:
| Run them all in parallel with a cloud function in less than
| a minute?
| hnfong wrote:
| Obviously all the LLM API providers have a rate limit.
| Not a fan of GP's sarcastic tone, but I suppose many of
| us would like to know roughly what that limit would be
| for a small business using such APIs.
| simonw wrote:
| Surprisingly, DeepSeek doesn't have a rate limit:
| https://api-docs.deepseek.com/quick_start/rate_limit
|
| I've heard from people running 100+ prompts in parallel
| against it.
| jdietrich wrote:
| The rate limits for Gemini 1.5 Flash are 2000 requests
| per minute and 4 million tokens per minute. Higher limits
| are available on request.
|
| https://ai.google.dev/pricing#1_5flash
|
| 4o-mini's rate limits scale based on your account
| history, from 500RPM/200,000TPM to
| 30,000RPM/150,000,000TPM.
|
| https://platform.openai.com/docs/guides/rate-limits
| rlt wrote:
| Also can't you just combine multiple classification
| requests into a single prompt?
| axegon_ wrote:
| Yes, how did I not think of throwing more money at cloud
| providers on top of feeding open ai, when I could have
| just coded a simple binary classifier and run everything
| on something as insignificant as an 8th gen, quad core
| i5....
| simonw wrote:
| You might be able to use an even cheaper model. Google Gemini
| 1.5 Flash 8B is Input: $0.04 / Output: $0.15 per 1M tokens.
|
| 17 input tokens and 2 output tokens * 10 million jobs =
| 170,000,000 input tokens, 20,000,000 output tokens... which
| costs a total of about $9.80 at those rates ($6.80 input +
| $3.00 output): https://tools.simonwillison.net/llm-prices
|
| As for rate limits, https://ai.google.dev/pricing#1_5flash-8B
| says 4,000 requests per minute and 4 million tokens per
| minute - so you could run those 10 million jobs in about 2500
| minutes or 42 hours. I imagine you could pull a trick like
| sending 10 items in a single prompt to help speed that up,
| but you'd have to test carefully to check the accuracy
| effects of doing that.
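|
| A back-of-the-envelope sketch of the rate-limit math above. It
| assumes the quoted limits (4,000 requests and 4 million tokens per
| minute) and, as a simplification, that batching keeps the token
| count per item at 19; the function name is invented for this example:

```python
import math

def minutes_to_finish(items, requests_per_min, tokens_per_min,
                      tokens_per_item, items_per_request=1):
    """Lower bound on wall-clock minutes, limited by whichever of the
    request-rate cap or the token-rate cap is hit first."""
    requests = math.ceil(items / items_per_request)
    total_tokens = items * tokens_per_item
    by_requests = requests / requests_per_min
    by_tokens = total_tokens / tokens_per_min
    return max(by_requests, by_tokens)

one_per_prompt = minutes_to_finish(10_000_000, 4_000, 4_000_000, 19)
ten_per_prompt = minutes_to_finish(10_000_000, 4_000, 4_000_000, 19,
                                   items_per_request=10)
print(one_per_prompt, one_per_prompt / 60)  # minutes, hours
print(ten_per_prompt)
```

| With one item per request the request cap dominates (2,500 minutes);
| packing 10 items per request drops the bound to 250 minutes, before
| any accuracy effects are taken into account.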
| w10-1 wrote:
| The question is not average cost but marginal cost of quality
| - same as voice recognition, which had relatively low uptake
| even at ~2-4% error rates due to context switching costs for
| error correction.
|
| So you'd have to account for the work of catching the residue
| of 2-8%+ error from LLMs. I believe the premise is for NLP,
| that's just incremental work, but for LLM's that could be
| impossible to correct (i.e., cost per next-percentage-
| correction explodes), for lack of easily controllable (or
| even understandable) models.
|
| But it's most rational in business to focus on the easy
| majority with lower costs, and ignore hard parts that don't
| lead to dramatically larger TAM.
| gf000 wrote:
| I am absolutely not an expert in NLP, but I wouldn't be
| surprised if for many kinds of problems LLMs would have far
| _less_ error rate, than any NLP software.
|
| Like, lemmatization is pretty damn dumb in NLP, while a better
| LLM model will be orders of magnitude more correct.
| WhitneyLand wrote:
| For context, 10M would cost ~$27.
|
| Say Gemini Flash 8B, allowing ~28 tokens for prompt input at
| $0.075/1M tokens, plus 2 output tokens at $0.30/1M. Works out
| to $0.0000027 per classification. Or in other words, for 1
| penny you could do this classification about 3,700 times.
| segmondy wrote:
| You are not pushing it at 100. I can classify "is '20 bottles of
| ferric chloride' a service or product" in probably 2 seconds
| with a 4090. Something that most people don't realize is you
| can run multiple inference. So with something like a 4090, some
| solid few shots, and instead of having it classify one example
| at a time, you can do 5. We can probably run 100 parallel
| inference at 5 at a time. For about a rate of 250 a second on a
| 4090. So in 11 hours I'll be done. I'm going with a 7-8B model
| too. Some of the 1.5-3B models are great and will even run
| faster. Take a competent developer who knows python and how to
| use an OpenAI compatible API, they can put this together in
| 10-15 minutes, with no data science/scikit learn or other NLP
| toolchain experience.
|
| So for personal, medium or even large workloads, I think it has
| killed it. It needs to be extremely large. If you are
| classifying or segmenting comments on a social media platform
| where you need to deal with billions a day, then LLM would be a
| very inefficient approach, but for 90+% of use cases. I think
| it wins.
|
| I'm assuming you are going to run it locally because everyone
| is paranoid about their data. It's even cheaper if you use a
| cloud API.
| mikeocool wrote:
| If you have to classify user input as they're inputting it to
| provide a response -- so it can't be batched - 2 seconds
| could potentially be really slow.
|
| Though LLMs sure have made creating training data to train
| old school models for those cases a lot easier.
| WildGreenLeave wrote:
| Correct me if I'm wrong, but, if you run multiple inferences
| at the same time on the same GPU you will need to load multiple
| models in the vram and the models will fight for resources
| right? So running 10 parallel inferences will slow everything
| down 5 times right? Or am I missing something?
| aeternum wrote:
| No, the key is to use the full context window so you
| structure the prompt as something like: For each line
| below, repeat the line, add a comma then output whether it
| most closely represents a product or service:
|
| 20 bottles of ferric chloride
|
| salesforce
|
| ...
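|
| A sketch of how that batched prompt could be built and its reply
| parsed. No API call is made: the reply string is a made-up example
| of what a model might return, and the helper names are hypothetical:

```python
INSTRUCTION = (
    "For each line below, repeat the line, add a comma, then output "
    "whether it most closely represents a product or service:"
)

def build_batch_prompt(items):
    """Join the instruction and one item per line into a single prompt."""
    return INSTRUCTION + "\n\n" + "\n".join(items)

def parse_batch_reply(reply, items):
    """Map each item to its label, using the echoed line as the key.

    Items with no matching or malformed line in the reply map to None
    so they can be retried or sent for manual review.
    """
    labels = {item: None for item in items}
    for line in reply.splitlines():
        # Split on the last comma, since an item may contain commas.
        text, sep, label = line.rpartition(",")
        text, label = text.strip(), label.strip().lower()
        if sep and text in labels and label in ("product", "service"):
            labels[text] = label
    return labels

items = ["20 bottles of ferric chloride", "salesforce"]
prompt = build_batch_prompt(items)
# Hypothetical model reply; a real one would come from an API call.
reply = "20 bottles of ferric chloride, product\nsalesforce, service"
print(parse_batch_reply(reply, items))
```

| Having the model repeat each line makes it possible to detect
| skipped or reordered items instead of trusting positions.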
| e12e wrote:
| Appreciate the concrete advice in this response. Thank
| you.
| Palmik wrote:
| Inference for single example is memory bound. By doing
| batch inference, you can interleave computation with memory
| loads, without losing much speed (up until you cross the
| compute bound threshold).
| bavell wrote:
| You will most likely be using the same model so just 1 to
| load into vram.
| axegon_ wrote:
| FFS... "Lots of writers, few readers". Read again and do the
| math: 2 seconds, multiply that by 10 million records which
| contain this, as well as "alarm installation in two
| locations" and a whole bunch of other crap with little to no
| repetition (<2%) and where does that get you? 2 * 10,000,000
| = 20,000,000 SECONDS!!!! A day has 86,400 seconds (24 * 3600
| = 86,400). The data pipeline needs to finish in <24 hours.
| Everyone needs to get this into their heads somehow: LLM's
| are not a silver bullet. They will not cure cancer anytime
| soon, nor will they be effective or cheap enough to run at
| massive scale. And I don't mean cheap as in "oh, just get
| openai subscription hurr durr". Throwing money mindlessly
| into something is never an effective way to solve a problem.
| gbnwl wrote:
| Why are you using 2 seconds? The commenter you are
| responding to hypothesized being able to do 250/s based on
| "100 parallel inference at 5 at a time". Not speaking to
| the validity of that, but find it strange that you ran with
| the 2 seconds number after seemingly having stopped reading
| after that line, while yourself lamenting people don't read
| and telling them to "read again".
| jazzyjackson wrote:
| OP said 2 seconds as if that wasn't an eternity...
| gbnwl wrote:
| But then they said 250/second when running multiple
| inference? Again I don't know if their assertions about
| running multiple inference are correct but why focus on
| the wrong number instead of addressing the actual claim?
| axegon_ wrote:
| Ok, let me dumb it down for you: you have a cockroach in
| your bathroom and you want to kill it. You have an RPG
| and you have a slipper. Are you gonna use the RPG or are
| you going to use the slipper? Even if your bathroom is
| fine after getting shot with an RPG somehow, isn't this
| an overkill? If you can code and train a binary classifier
| in 2 hours that uses nearly 0 resources and
| gives you good enough results (in my case way above what
| my targets were) without having to use a ton of
| resources, libraries, rags, hardware and hell, even
| electricity? I mean how hard is this to comprehend
| really?
|
| https://deviq.com/antipatterns/shiny-toy
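|
| For a sense of scale, a simple binary text classifier of the kind
| described can fit in a few dozen lines of standard-library Python.
| This is a generic multinomial naive Bayes sketch, not the
| commenter's actual model, and the training data is made up:

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial naive Bayes over lowercase whitespace tokens."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus log likelihood with add-one smoothing.
            score = math.log(doc_count / total_docs)
            for word in text.lower().split():
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Tiny made-up training set; a real one would need far more examples.
texts = [
    "20 bottles of ferric chloride", "box of nitrile gloves",
    "5 litres of acetone", "alarm installation in two locations",
    "annual boiler maintenance", "office cleaning contract",
]
labels = ["product"] * 3 + ["service"] * 3
model = NaiveBayes().fit(texts, labels)
print(model.predict("10 bottles of acetone"))
print(model.predict("alarm maintenance contract"))
```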
| why_only_15 wrote:
| Assuming the 10M records is ~2000M input tokens + 200M
| output tokens, this would cost $300 to classify using
| llama-3.3-70b[1]. If using llama lets you do this in say
| one day instead of two days for a traditional NLP pipeline,
| it's worthwhile.
|
| [1]: https://openrouter.ai/meta-llama/llama-3.3-70b-instruct
| simonw wrote:
| What NLP approaches are you using to solve the "is '20 bottles
| of ferric chloride' a service or a product?" problem?
| crystal_revenge wrote:
| > LLMs are not feasible when you have a dataset of 10 million
| items that you need to classify relatively fast and at a
| reasonable cost.
|
| What? That's simply not true.
|
| Current embedding models are incredibly fast and cheap and
| will, in the vast majority of NLP tasks, get you far better
| results than any local set of features you can develop
| yourself.
|
| I've also done this at work numerous times, and have been
| working on various NLP tasks for over a decade now. For all
| future traditional NLP tasks the first pass is going to be to
| fetch LLM embeddings and stick on a fairly simple
| classification model.
|
| > One prompt? Fair. 10? Still ok. 100? You're pushing it. 10M -
| get help.
|
| "Prompting" is _not_ how you use LLMs for classification tasks.
| Sure you can build 0-shot classifiers for some tricky tasks,
| but if you're doing classification for documents today and
| you're _not_ starting with an embedding model you're missing
| some easy gains.
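|
| A sketch of the "embeddings plus a simple classifier" pattern, using
| nearest-centroid as the simple model. `toy_embed` is a hypothetical
| stand-in so the example runs offline; in practice it would be
| replaced by a call to a real embedding model, and the classifier
| could be logistic regression or similar:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def fit_centroids(vectors, labels):
    """Average the embedding vectors of each class."""
    sums, counts = {}, {}
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {lab: [x / counts[lab] for x in acc] for lab, acc in sums.items()}

def classify(vector, centroids):
    """Return the label whose centroid is most similar to `vector`."""
    return max(centroids, key=lambda lab: cosine(vector, centroids[lab]))

def toy_embed(text):
    """Hypothetical stand-in for an embedding model: counts digits and
    '-tion'/'-ance' word endings, plus a constant bias term."""
    words = text.lower().split()
    digits = sum(ch.isdigit() for ch in text)
    endings = sum(w.endswith(("tion", "ance")) for w in words)
    return [float(digits), float(endings), 1.0]

train = [
    ("20 bottles of ferric chloride", "product"),
    ("5 litres of acetone", "product"),
    ("alarm installation in two locations", "service"),
    ("annual boiler maintenance", "service"),
]
centroids = fit_centroids([toy_embed(t) for t, _ in train],
                          [label for _, label in train])
print(classify(toy_embed("12 boxes of gloves"), centroids))
print(classify(toy_embed("equipment calibration"), centroids))
```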
| gf000 wrote:
| Why not just run a local LLM for practically free? You can even
| trivially parallelize it with multiple instances.
|
| I would believe that many NLP problems can be easily solved
| even by smaller LLM models.
| sireat wrote:
| So what would you use to classify whether a document is a
| critique or something else in 1M documents in a non-English
| language?
|
| This is a real problem I am dealing with at a library project.
|
| Each document is between 100 to 10k tokens.
|
| Most top (read most expensive) LLMs available in OpenRouter
| work great, it is the cost (and speed) that is the issue.
|
| If I could come up with something locally runnable that would
| be fantastic.
|
| Presumably BERT based classifiers would work if I had one
| properly trained for the language.
| rahimnathwani wrote:
| I guess you've already seen
| https://huggingface.co/collections/answerdotai/modernbert-67... ?
| scarface_74 wrote:
| For my use case, definitely.
|
| I have worked on AWS Connect (online call center) and Amazon Lex
| (the backing NLP engine) projects.
|
| Before LLMs, it was a tedious process of trying to figure out all
| of the different "utterances" that people could say and the
| various languages you had to support. With LLMs, it's just
| prompting.
|
| https://chatgpt.com/share/678bab08-f3a0-8010-82e0-32cff9c0b4...
|
| I used something like this using Amazon Bedrock and a Lambda hook
| for Amazon Lex. Of course it wasn't booking a flight. It was
| another system
|
| The above is a simplified version. In the real world, I gave it
| a list of intents (book flights, reserve a room, rent a car) and
| properties - "slots" - I needed for each intent.
| gtirloni wrote:
| How about the costs?
| scarface_74 wrote:
| We measure savings in terms of call deflections. Clients we
| work with say that each time a customer talks to an agent it
| costs $2-$5. That's not even taking into account call
| abandonments
| IanCal wrote:
| My base thing while advising people is that if anyone you
| pay needs to read the output, or you are directly replacing
| any kind of work then even frontier llm model inference
| costs are irrelevant. Of course you need to work out if
| that's truly the case but people worry about the cost in
| places where it's just irrelevant. If it's $2 when you get
| to an agent, each case that's avoided there could pay for
| around a million words read/generated. That's expensive
| compared to most API calls but irrelevant when counting
| human costs.
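|
| A rough worked version of that arithmetic. The per-token price
| and the tokens-per-word ratio below are placeholder assumptions,
| not quoted prices; plug in your own numbers.

```python
def words_covered(human_cost_usd, price_per_million_tokens_usd, tokens_per_word=1.3):
    """How many words of LLM input/output one avoided human contact pays for.

    price_per_million_tokens_usd and tokens_per_word are assumptions
    supplied by the caller, not real vendor figures.
    """
    tokens = human_cost_usd / price_per_million_tokens_usd * 1_000_000
    return tokens / tokens_per_word


if __name__ == "__main__":
    # Hypothetical: a $2 agent contact versus a model billed at $2 per million tokens.
    print(round(words_covered(2.0, 2.0)))
```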
| elicksaur wrote:
| Thank you for sharing an actual prompt thread. So much of the
| LLM debate is washed in biases, and it is very helpful to share
| concrete examples of outputs.
| scarface_74 wrote:
| The "cordele GA" example surprised me. I was expecting to get
| a value of "null" for the airport code since I knew that city
| had a population of 12K and no airport within its
| metropolitan statistical area. It returned an airport that
| was close.
|
| Having world knowledge is a godsend. I also just tried a
| prompt with "Alpharetta, GA" a city north of Atlanta and it
| returned ATL. An NLP could never do that without a lot more
| work.
| LeafItAlone wrote:
| That's a great example and I understand it was intentionally
| simple but highlighted how LLMs need care with use. Not that
| this example is very related to NLP:
|
| My prompt: `<<I want a flight from portland to cuba after
| easter>>`
|
| The response: ``` { "origin": ["PDX"], "destination": ["HAV"],
| "date": "2025-04-01", "departure_time": null, "preferences":
| null } ```
|
| Of course I meant Portland, Maine (PWM), there are more airport
| options in Cuba than just HAV, and it got the date wrong,
| since Easter is April 20 this year.
| scarface_74 wrote:
| If the business stakeholders came out with that scenario, I
| would modify the prompt like this. You would know the user's
| address if they had an account.
|
| https://chatgpt.com/share/678c1708-639c-8010-a6be-9ce1055703...
| LeafItAlone wrote:
| OK, but that only fixed one of the three issues.
| scarface_74 wrote:
| While the first one is easy. I mean you _could_ give it a
| list of holidays and dates. But the rest you would just
| ask the user to confirm the information and say "is this
| correct"? If they say "No" ask them which isn't correct
| and let them correct it.
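|
| As a sketch of the "give it a list of holidays and dates" idea:
| a movable holiday like Easter can be computed deterministically
| and passed to the prompt, instead of trusting the model's date
| arithmetic. This uses the well-known anonymous Gregorian
| algorithm for Western Easter; the function names are
| hypothetical.

```python
from datetime import date, timedelta


def easter_date(year):
    """Western (Gregorian) Easter Sunday via the anonymous Gregorian algorithm."""
    a = year % 19
    b, c = divmod(year, 100)
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    l = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * l) // 451
    month, day = divmod(h + l - 7 * m + 114, 31)
    return date(year, month, day + 1)


def first_day_after_easter(year):
    """Earliest travel date that satisfies a request for 'after Easter'."""
    return easter_date(year) + timedelta(days=1)


if __name__ == "__main__":
    print(easter_date(2025).isoformat())
    print(first_day_after_easter(2025).isoformat())
```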
|
| I would definitely assume someone wanted to leave from an
| airport close by if they didn't say anything.
|
| You don't want the prompt to grow too much. But you do
| have analytics that you can use to improve your prompt.
|
| In the case of Connect, you define your logic using a GUI
| flowchart builder called a contact flow.
|
| BTW: with my new prompt, it did assume the correct
| airport "<<I want to go to Cuba after Easter>>"
| LeafItAlone wrote:
| Sure, all the problems are "easy" once you identify them.
| As with most products. But the majority of Show HN posts
| here relying on LLMs that I see don't account for simple
| things like my example. Flight finders in particular
| have been pretty bad.
|
| >BTW: with my new prompt, it did assume the correct
| airport "<<I want to go to Cuba after Easter>>"
|
| Not really. It chose the airport you put basically in the
| prompt. But I don't live in MA, I live closer to PDX. And
| it didn't suggest the multiple other Cuba airports. So
| you'll end up with a lot of guiding rules.
| scarface_74 wrote:
| If you said "Portland", a human would first assume you
| meant PDX, unless they looked up your address, and then
| they would assume Maine.
|
| Just like if I said I wanted to fly to Albany, they would
| think I meant New York and not my parents' city in south
| GA (ABY) which only has three commercial flights a day.
|
| Even with a human agent, you ask for confirmation.
|
| Also, I ask to speak to people on the ground - in this
| case it would be CSRs - to break it.
|
| That's another reason I think "side projects" are useless
| and they don't have any merit on resumes. I want them to
| talk about real world implementations.
| vpribish wrote:
| link is a 404, sadly. what did it say before?
| scarface_74 wrote:
| The link works for me even in incognito mode.
|
| The prompt:
|
| _you are a chatbot that helps users book flights. Please
| extract the origin city, destination city, travel date, and
| any additional preferences (e.g., time of day, class of
| service). If any of the details are missing, make the value
| "null". If the date is relative (e.g., "tomorrow", "next
| week"), convert it to a specific date.
|
| User Input: "<User's Query>"
|
| Output (JSON format): { "origin": list of airport codes
| "destination": list of airport codes, "date": "<Extracted
| Date>", "departure_time": "<Extracted Departure Time (if
| applicable)>", "preferences": "<Any additional preferences
| like class of service (optional)>" }
|
| The users request will be surrounded by <<>>
|
| Always return JSON with any missing properties having a value
| of null. Always return English. Return a list of airport
| codes for the city. For instance New York has two airports
| give both
|
| Always return responses in English_
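|
| Since the prompt asks for a fixed JSON shape, the reply can be
| checked in code before anything acts on it. A minimal sketch,
| assuming the model returns exactly one JSON object with the
| keys named in the prompt; the function names and the rule that
| origin, destination, and date must be confirmed are
| illustrative assumptions.

```python
import json

EXPECTED_KEYS = ("origin", "destination", "date", "departure_time", "preferences")


def parse_booking_response(raw):
    """Parse the model's JSON reply and normalize it to the expected keys."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    # Missing keys become None, matching the prompt's "null" convention.
    result = {key: data.get(key) for key in EXPECTED_KEYS}
    for key in ("origin", "destination"):
        codes = result[key]
        if codes is None:
            continue
        if not isinstance(codes, list) or not all(
            isinstance(c, str) and len(c) == 3 and c.isalpha() and c.isupper()
            for c in codes
        ):
            raise ValueError(f"{key} must be a list of 3-letter airport codes")
    return result


def fields_to_confirm(result):
    """List required fields that are still empty and must be asked for."""
    return [key for key in ("origin", "destination", "date") if not result[key]]


if __name__ == "__main__":
    reply = '{"origin": null, "destination": ["HAV"], "date": null}'
    print(fields_to_confirm(parse_booking_response(reply)))
```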
| DebtDeflation wrote:
| The question seems malformed to me.
|
| Text classification, clustering, named entity recognition, etc.
| are NLP tasks. LLMs can perform these tasks. ML models that are
| not LLMs (or even not deep learning models) can also perform
| these tasks. Is the author perhaps asking if the concept of a
| "completion" has replaced all of these tasks?
|
| When I hear "traditional NLP" I think not of the above types of
| tasks but rather the methodology employed for performing them.
| For example, building a pipeline to do stemming/lemmatization,
| part of speech tagging, coreference resolution, etc. before the
| text gets fed to a classifier model. This was SOTA 10 years ago
| but I don't think many people are still doing it today.
| retinaros wrote:
| can't read the article. do they consider BERT as an LLM? there are
| tasks still in NLP where BERT is better than a GPT-like model
| selimthegrim wrote:
| Like?
| Olfurm wrote:
| Like named entity recognition or relations recognition. Check
| https://github.com/urchade/GLiNER
| darepublic wrote:
| I remember using the open NLP library from Stanford around 2016.
| It would do parts of speech tagging of words in a sentence
| (labelling the words with their grammatical function). It was
| pretty good but reliably failed on certain words where context
| determined the tag. When GPT-3 came out the first thing I
| tested it out on was parts of speech tagging. In particular those
| sentences open NLP had trouble with. And it aced everything. I was
| impressed.
| leobg wrote:
| There are AI bros that will call an LLM to do what you could do
| with a regex. I've seen people do the chunking for RAG using an
| LLM...
| tossandthrow wrote:
| If you think about chunking as "take x characters" then using
| LLMs is a poor idea.
|
| But syntactic chunking also works really poorly for any serious
| application as you lose basically all context.
|
| _Semantic_ chunking, however, is a task you absolutely would
| use LLMs for.
| freefaler wrote:
| If archive links aren't working this works:
|
| https://freedium.cfd/https://medium.com/altitudehq/is-tradit...
| vedant wrote:
| The title of this article feels like "has electricity killed oil
| lamps"?
| itissid wrote:
| LLM Design/Use has only about as much to do with engineering as
| building a plane has to do with actually flying it.
|
| Every business is kind of a unicorn in its problems, and NLP is a
| small part of it. Even if it did perform cheaply enough to do
| NLP, how would you replace parts like:
|
| 1. Evaluation system that uses calibration (human labels)
| 2. Ground truth collection (human + sometimes semi-automated)
| 3. QA testing by end users
|
| Even if LLMs made it easier to do NLP, there are correlations with
| the above, which means your NLP process is influenced so much
| that you still need an engineer. If you have an engineer who
| only does NLP and nothing else, you are quite hyper-specialized,
| to the extent that you are only building planes: 0.01% of the
| engineering work out there.
| thangalin wrote:
| I created an NLP library to help curl straight quotes into curly
| quotes. Last I checked, LLMs struggled to curl the following
| straight quotation marks: ''E's got a 'ittle
| box 'n a big 'un,' she said, 'wit' th' 'ittle 'un 'bout 2'x6".
| An' no, y'ain't cryin' on th' "soap box" to me no mo, y'hear.
| 'Cause it 'tweren't ever a spec o' fun!' I says to my frien'.
|
| The library is integrated into my Markdown editor, KeenWrite
| (https://keenwrite.com/), to correctly curl quotation marks into
| entities before passing them over to ConTeXt for typesetting.
| While there are other ways to indicate opening and closing
| quotation marks, none are as natural to type in plain text as
| straight quotes. I would not trust an LLM to curl quotation marks
| accurately.
|
| For the curious, you can try it at:
|
| https://whitemagicsoftware.com/keenquotes/
|
| If you find any edge cases that don't work, do let me know. The
| library correctly curls my entire novel. There are a few edge
| cases that are completely ambiguous, however, that require
| semantic knowledge (part-of-speech tagging), which I haven't
| added. PoS tagging would be a heavy operation that could prevent
| real-time quote curling for little practical gain.
|
| The lexer, parser, and test cases are all open source.
|
| https://gitlab.com/DaveJarvis/KeenQuotes/-/tree/main/src/mai...
| jcheng wrote:
| Great example. I just tried it with a few LLMs and got horrible
| results. GPT-4o got a ton of them wrong, o1 got them all
| correct AFAICT but took 1m50s to do so, and Claude 3.5 Sonnet
| said "Here's the text with straight quotes converted to curly
| quotes" but then returned the text with all the straight quotes
| intact.
|
| I'm very surprised all three models didn't nail it immediately.
| gf000 wrote:
| I would be interested in how well even a smaller LLM would
| work after fine-tuning. Besides the overhead of an LLM, I would
| assume they would do a much better job at it in the edge cases
| (where contextual understanding is required).
| derbaum wrote:
| One of the things I'm still struggling with when using LLMs over
| NLP is classification against a large corpus of data. If I get a
| new text and I want to find the most similar text out of a
| million others, semantically speaking, how would I do this with
| an LLM? Apart from choosing certain pre-defined categories (such
| as "friendly", "political", ...) and then letting the LLM rate
| each text on each category, I can't see a simple solution yet
| except using embeddings (which I think could just be done using
| BERT and does not count as LLM usage?).
| thaumasiotes wrote:
| Take two documents.
|
| Feed one through an LLM, one word at a time, and keep track of
| words that experience greatly inflated probabilities of
| occurrence, compared to baseline English. "For" is probably
| going to maintain a level of likelihood close to baseline.
| "Engine" is not.
|
| Do the same thing for the other one.
|
| See how much overlap you get.
| derbaum wrote:
| Wouldn't a simple comparison of the word frequency in my text
| against a list of usual word frequencies do the trick here
| without an LLM? Sort of a BM25?
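|
| A minimal sketch of that LLM-free variant: flag words whose
| in-document frequency is far above a baseline frequency table,
| then measure how much two documents' flagged sets overlap. This
| is a plain frequency-ratio comparison, not BM25 itself. The
| baseline table, the `ratio` threshold, and the `floor` used for
| unseen words are all illustrative assumptions.

```python
import re
from collections import Counter


def distinctive_words(text, baseline_freq, ratio=50.0, floor=1e-6):
    """Words whose relative frequency in `text` is >= ratio times the baseline."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return set()
    counts = Counter(words)
    total = len(words)
    return {
        word
        for word, count in counts.items()
        # Words missing from the baseline are treated as very rare (floor).
        if (count / total) / baseline_freq.get(word, floor) >= ratio
    }


def overlap(a, b):
    """Jaccard similarity between two sets of distinctive words."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    # Hypothetical baseline frequencies for general English.
    baseline = {"the": 0.05, "for": 0.01, "a": 0.02, "engine": 0.0001}
    first = distinctive_words("The engine for the car", baseline)
    second = distinctive_words("A new engine for the boat", baseline)
    print(sorted(first), sorted(second), overlap(first, second))
```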
| macNchz wrote:
| I've used embeddings to define clusters, then passed sampled
| documents from each cluster to an LLM to create labels for each
| grouping. I had pretty impressive results from this approach
| when creating category/subcategory labels for a collection of
| texts I worked on recently.
| derbaum wrote:
| That's interesting, it sounds a bit like those cluster graph
| visualisation techniques. Unfortunately, my texts seem to
| fall into clusters that really don't match the ones that I
| had hoped to get out of these methods. I guess it's just a
| matter of fine-tuning now.
| vletal wrote:
| The idea that we can solve "language" by breaking down and
| understanding sentences is naive and funny with the benefit of
| hindsight, is it not?
|
| An equivalently funny attitude seems to be the "natural language
| will replace programming languages". Let's see how that one will
| work out when the hype is over.
___________________________________________________________________
(page generated 2025-01-18 23:01 UTC)