[HN Gopher] What we've learned from a year of building with LLMs
___________________________________________________________________
What we've learned from a year of building with LLMs
Author : ViktorasJucikas
Score : 383 points
Date : 2024-05-31 12:33 UTC (2 days ago)
(HTM) web link (eugeneyan.com)
(TXT) w3m dump (eugeneyan.com)
| solidasparagus wrote:
| No offense, but I'd love to see what they've successfully built
| using LLMs before taking their advice too seriously. The idea
| that fine-tuning isn't even a consideration (perhaps even
| something they think is absolutely incorrect, if the section
| titles of the unfinished section are anything to go by) is very
| strange to me and suggests a pretty narrow perspective IMO
| gandalfgeek wrote:
| This was kind of conventional wisdom ("fine tune only when
| absolutely necessary for your domain", "fine-tuning hurts
| factuality"), but some recent research (some of which they
| cite) has actually quantitatively shown that RAG is much
| preferable to FT for adding domain-specific knowledge to an
| LLM:
|
| - "Does Fine-Tuning LLMs on New Knowledge Encourage
| Hallucinations?" https://arxiv.org/abs/2405.05904
|
| - "Fine-Tuning or Retrieval? Comparing Knowledge Injection in
| LLMs" https://arxiv.org/abs/2312.05934
| solidasparagus wrote:
| Thanks, I'll read those more fully.
|
| But "knowledge injection" is still pretty narrow to me.
| Here's an example of a very simple but extremely valuable
| use case - taking a model that was trained on language+code
| and finetuning it on a text-to-DSL task, where the DSL is a
| custom one you created (and thus isn't in the training data).
| I would consider that close to infeasible if your only tool
| is a RAG hammer, but it's a very powerful way to leverage
| LLMs.
| gandalfgeek wrote:
| Agree that your use-case is different. The papers above are
| dealing mostly with adding a domain-specific _textual_
| corpus, still answering questions in prose.
|
| "Teaching" the LLM an entirely new language (like a DSL)
| might actually need fine-tuning, but you can probably build
| a pretty decent first-cut of your system with n-shot
| prompts, then fine-tune to get the accuracy higher.
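|
| A minimal sketch of that n-shot first cut (the DSL, examples,
| and model name here are made up for illustration):
|
|     from openai import OpenAI
|
|     client = OpenAI()
|
|     # Few-shot examples teach the model the custom DSL's syntax.
|     FEW_SHOT = """Convert each description to MyDSL.
|
|     Description: a user node connected to an auth service
|     MyDSL: node user; node auth; user -> auth
|
|     Description: a load balancer in front of two web servers
|     MyDSL: node lb; node web1; node web2; lb -> web1; lb -> web2
|     """
|
|     def to_dsl(description: str) -> str:
|         resp = client.chat.completions.create(
|             model="gpt-4o-2024-05-13",
|             messages=[{
|                 "role": "user",
|                 "content": f"{FEW_SHOT}\nDescription: {description}\nMyDSL:",
|             }],
|         )
|         return resp.choices[0].message.content.strip()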
| yoelhacks wrote:
| This is exactly (one of) our use cases at Eraser - taking
| code or natural language and producing diagram-as-code DSL.
|
| As with other situations that want a custom DSL, our syntax
| has its own quirks and details, but is similar enough to
| e.g. Mermaid that we are able to produce valid syntax
| pretty easily.
|
| What we've found harder is controlling for edge cases about
| how to build proper diagrams.
|
| For more context: https://www.eraser.io/decision-node/on-
| building-with-ai
| CuriouslyC wrote:
| Fine tuning has been on the way out for a while. It's hard to
| do right and costly. LoRAs are better for influencing output
| style as they don't dumb down the model, and they're easier to
| create. This is on top of RAG just being better for new facts
| like the other reply mentioned.
| solidasparagus wrote:
| How much of that is just the flood of traditional engineers
| into the space and the fact that collecting data and then
| fine-tuning models is orders of magnitude more complex than
| just throwing in RAG? I suspect a huge amount of RAG's
| popularity is just that any engineer can do a version of it +
| ChatGPT API calls in a day.
|
| As for lora - in the context of my comment, that's just
| splitting hairs IMO. It falls in the category of finetuning
| for me, although I understand why you might disagree. But
| it's not like the article mentions lora either, nor am I
| aware of people doing lora without GPUs which the article is
| against (No GPUs before PMF)
| altdataseller wrote:
| I disagree. No amount of fine tuning will ever give the LLM
| the relevant context with which to answer my question.
| Maybe if your context is a static Wikipedia or something
| that will never change, you can fine tune it. But if your
| data and docs keep changing, how is fine tuning going to be
| better than RAG?
| solidasparagus wrote:
| Continuous retraining and deployment maybe? But I'm
| actually not anti-RAG (although I think it is overrated
| because the retrieval problem is still handled extremely
| naively), I just think that fine-tuning should _also_ be
| in your toolkit.
| altdataseller wrote:
| Why is the retrieval part overrated? There isn't even a
| single way to retrieve. It could be a simple keyword
| search, a vector search, a combo, or just simply
| retrieving a single doc and stuffing it in the context
| solidasparagus wrote:
| People will disagree, but my problem with retrieval is
| that every technique that is popular uses one-hop
| thinking - you retrieve information that is directly
| related to the prompt using old-school techniques (even
| though the embeddings are new, text similarity is old).
| LLMs are most powerful, IMO, at horizontal thinking.
| Building a prompt using one-hop narrow AI techniques and
| then feeding it into a powerful generally capable model
| is like building a drone but only letting it fly over
| streets that already exist - not worthless, but only
| using a fraction of the technology's power.
|
| A concrete example is something like a tool for querying
| an internal company wiki and the query "tell me about the
| Backend Team's approach to sprint planning". Normal
| retrieval approaches will pull information directly
| related to that query. But what if there is no
| information about the backend team's practices? As a
| human, you would do multi-hop/horizontal information
| extraction - you would retrieve information about who
| makes up the backend team, you would then retrieve
| information about them and their backgrounds/practices.
| You might have a hypothesis that people carry over
| their practices from previous experiences, so you look at
| the previous teams and their practices. Then you would
| have the context necessary to give a good answer. I don't
| know of many people implementing RAG like that. And what
| I described is 100% possible for AI to do today.
|
| Techniques that would get around this like iterative
| retrieval or retrieval-as-a-tool don't seem popular.
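|
| A rough sketch of that kind of iterative retrieval (search()
| and call_llm() are hypothetical stand-ins for your retriever
| and LLM API):
|
|     def multi_hop_answer(question: str, max_hops: int = 3) -> str:
|         context: list[str] = []
|         query = question
|         for _ in range(max_hops):
|             context += search(query)  # keyword/vector retrieval
|             # Let the LLM decide: answer now, or hop again with a
|             # new query (e.g. "who is on the backend team?").
|             step = call_llm(
|                 f"Question: {question}\nContext: {context}\n"
|                 "If the context is sufficient, reply ANSWER: <answer>. "
|                 "Otherwise reply SEARCH: <next query>."
|             )
|             if step.startswith("ANSWER:"):
|                 return step.removeprefix("ANSWER:").strip()
|             query = step.removeprefix("SEARCH:").strip()
|         return call_llm(f"Answer as best you can: {question}\n"
|                         f"Context: {context}")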
| altdataseller wrote:
| People can't do that because of cost. If every single
| query involved taking everything even remotely related to
| the query, and passing it to OpenAI, it would get
| expensive very very fast.
|
| It's not a technical issue, it's a practicality issue imo.
| idf00 wrote:
| Luckily it's not one or the other. You can fine tune and
| use RAG.
|
| Sometimes RAG is enough. Sometimes fine tuning on top of
| RAG is better. It depends on the use case. I can't think
| of any examples where you would want to fine tune and not
| use RAG as well.
|
| Sometimes you fine tune a small model so it performs
| close to a larger variant on that specific narrow task
| and you improve inference performance by using a smaller
| model.
| phillipcarter wrote:
| I don't see why this is seen as an either-or by people? Fine-
| tuning doesn't eliminate the need for RAG, and RAG doesn't
| obviate the need for fine-tuning either.
|
| Note that their guidance here is quite practical:
|
| > If prompting gets you 90% of the way there, then fine-
| tuning may not be worth the investment.
| OutOfHere wrote:
| Fine-tuning is absolutely necessary for true AI, but even though
| it's desirable, it's unfeasible to do for now for any large
| model considering how expensive GPUs are. If I had infinite
| money, I'd throw it at continuous fine-tuning and would throw
| away the RAG. Fine-tuning also requires appropriate measures to
| prevent forgetting of older concepts.
| solidasparagus wrote:
| It is not unfeasible. It is absolutely realistic to do
| distributed finetuning of an 8B text model on previous
| generation hardware. You can add finetuning to your set of
| options for about the cost of one FTE - up to you whether
| that tradeoff is worth it, but in many places it is. The
| expertise to pull it off is expensive, but to get a mid-level
| AI SME capable of helping a company adopt finetuning, you are
| only going to pay about the equivalent of 1-3 senior
| engineers.
|
| Expensive? Sure, all of AI is crazy expensive. Unfeasible? No
| OutOfHere wrote:
| I don't consider a small 8B model to be worth fine-tuning.
| Fine-tuning is worthwhile when you have a larger model with
| capacity to add data, perhaps one that can even grow its
| layers with the data. In contrast, fine-tuning a small
| saturated model will easily cause it to forget older
| information.
|
| All things considered, in relative terms, as much as I
| think fine-tuning would be nice, it will remain
| significantly more expensive than just making RAG or search
| calls. I say this while being a fan of fine-tuning.
| solidasparagus wrote:
| > I don't consider a small 8B model to be worth fine-
| tuning.
|
| Going to have to disagree with you on that one. A modern
| 8B model that has been trained on enough tokens is
| ridiculously powerful.
| OutOfHere wrote:
| A well-trained 8B model will already be over-saturated
| with information from the start. It will therefore easily
| forget much old information when fine-tuning it with new
| materials. It just doesn't have the capacity to take in
| too much information.
|
| Don't get me wrong. I think a 70B or larger model would
| be worth fine-tuning, especially if it can be grown
| further with more layers.
| solidasparagus wrote:
| > A well-trained 8B model will already be over-saturated
| with information from the start
|
| Any evidence of that that I can look at? This doesn't
| match what I've seen nor have I heard this from the
| world-class researchers I have worked with. Would be
| interested to learn more.
| OutOfHere wrote:
| Upon further thought, if fine-tuning involves adding
| layers, then the initial saturation should not matter.
| Let's say if an 8B model adds 0.8*2 = 1.6B of new layers
| for fine-tuning, then with some assumptions, a ballpark
| is that this could be good for 16 million articles for
| fine-tuning.
| robrenaud wrote:
| The reason to fine tune is to get a model that performs
| well on a specific task. It could lose 90 percent of its
| knowledge and beat the untuned model at the narrow task
| at hand. That's the point, no?
| OutOfHere wrote:
| It is not really possible to lose 90% of one's brain and
| do well on certain narrow tasks. If the tasks truly were
| so narrow, you would be better off training a small model
| just for them from scratch.
| lmeyerov wrote:
| We work in some pretty serious domains and try to stay away
| from fine tuning:
|
| - Most of our accuracy ROI is from agentic loops over top
| models, and dynamic RAG example injection goes far enough here
| that the relative lift of adding fine-tuning isn't worth the many
| costs
|
| - A lot of fine-tuning is for OSS models that do worse than
| agentic loops over the proprietary GPT4/Opus3
|
| - For distribution, it's a lot easier to deploy for pluggable
| top APIs without requiring fine-tuning, e.g., "connect to your
| gpt4/opus3 + for dumber-but-bigger tasks, groq"
|
| - The resources we could put into fine-tuning are better spent
| on RAG, agentic loops, prompts/evals, etc
|
| We do use tuned smaller dumber models, such as part of a coarse
| relevancy filter in a firehose pipeline... but these are
| outliers. Likewise, we expect to be using them more... but
| again, for rarer cases and only after we've exhausted other
| stuff. I'm guessing as we do more fine-tuning, it'll be more on
| embeddings than LLMs, at least until OSS models get a lot
| better.
| solidasparagus wrote:
| See, if the article said this, I would have agreed - fine-
| tuning is a tool and it should be used thoughtfully. Although
| I personally believe that in this funding climate it makes
| sense to make data collection and model training a core
| capability of any AI product. However that will only be
| available and wise for some founders.
| lmeyerov wrote:
| Agreed, model training and data collection are great!
|
| The subtle bit is it just doesn't have to be for LLMs, as
| these are typically part of a system-of-models. E.g., we <3
| RAG, and GNNs for improving your KG is fascinating.
| Likewise, dspy's explorations in optimizing prompts, vs
| LLMs, are very cool.
| solidasparagus wrote:
| > we <3 RAG, and GNNs for improving your KG is
| fascinating
|
| Oh man I am so torn between this being a fantastic idea
| and this being "building a better slide-rule in the age
| of the computer".
|
| dspy is definitely a project I want to dig into more
| lmeyerov wrote:
| Yeah I would recommend sticking to RAG on naively chunked
| data for weekend projects by 1 person. Likewise, a
| consumer tool like perplexity's search engine where you
| minimize spend per user task or go bankrupt, same thing,
| do the cheap thing and move on, good enough
|
| Once RAG projects become important and good answers
| matter - we work with governments, manufacturers, banks,
| cyber teams, etc - working through data quality, data
| representation, & retrieval quality helps
|
| Note that we didn't start here: We began with naive RAG,
| then relevancy filtering, then agentic & neurosymbolic
| querying, then dynamic example prompt injection, and now
| are getting into cleaning up the database/kg itself
|
| For folks doing investigative/analytics projects in this
| space, happy to chat about what we are doing w Louie.AI.
| These are more implementation details we don't normally
| write about.
| tarasglek wrote:
| Can you give a concrete example of GNNs helping?
| lmeyerov wrote:
| Entity resolution - RAG often mixes vector & symbolic
| queries, and ER improves reverse indexing, which is a
| starting point for a lot of the symbolic ones
|
| Identifying misinfo - Ranking & summarization based on
| internet data should be a lot more careful, and sometimes
| the controversy is the interesting part
|
| For both, GNNs are generally SOTA
| qeternity wrote:
| Have you actually used DSPy? I still can't figure out
| what it's useful for beyond optimizing basic few shot
| prompts.
| jph00 wrote:
| > _The idea that fine-tuning isn't even a consideration
| (perhaps even something they think is absolutely incorrect if
| the section titles of the unfinished section is anything to go
| by) is very strange to me and suggests a pretty narrow
| perspective IMO_
|
| The article has a section called "When to finetune", along with
| links to separate pages describing how to do so. They
| absolutely don't say that "fine-tuning isn't even a
| consideration". Instead, they describe the situations in which
| fine-tuning is likely to be helpful.
| solidasparagus wrote:
| Huh. Well that's embarrassing. I guess I missed it when I
| lost interest in the caching section and jumped straight to
| Evaluation and Monitoring.
| bbischof wrote:
| Hello, it's Bryan, an author on this piece.
|
| If you're interested in using one of the LLM applications I
| have in prod, check out https://hex.tech/product/magic-ai/. It
| has a free limit every month so you can give it a try and see
| how you like it. If you have feedback after using it, we're
| always very interested to hear from users.
|
| As far as fine-tuning in particular, our consensus is that
| there are easier options to try first. I personally have fine-
| tuned GPT models since 2022; here's a silly post I wrote about
| it on GPT-2: https://wandb.ai/wandb/fc-bot/reports/Accelerating-ML-
| Conten...
| solidasparagus wrote:
| I took a look at Magic earlier today and it didn't work at
| all for me, sorry to say. After the example prompt, I tried
| to learn about a table and it generated bad SQL (correct
| query to pull a row, but with limit 0). I asked it to show me
| the DDL and it generated invalid SQL. Then I tried to ask it
| to do some population statistics on the customer table and
| ended up confused about why there appears to be two windows
| in the cell, with the previously generated SQL on the left
| and the newly generated SQL on the right. The new SQL
| wouldn't run when I hit run cell, the error showed the
| originally generated SQL. I gave up and bounced.
|
| I went back while writing this comment and realized it might
| be showing me a diff (better use of color would have helped,
| I have been trained by github). But I was at a loss for what
| to do with that. I just now figured out the Keep button
| exists and it accepted the diff and now it sort of makes
| sense, but the SQL still doesn't return any results.
|
| My honest feedback is that there is way too much stuff I
| don't understand on the screen and it makes me confused and a
| little stressed. Ease me into it please, I'm dumb. There
| seem to be cells that are linked together and cells that
| aren't(? separated by purplish background) and I don't
| understand it. I am a jupyter user and I feel like this
| should be intuitive to me, but it isn't. I am not a designer,
| but I suspect the structural markings like cell boundaries
| are too faint compared to the content of the cells and/or the
| exterior of a cell having the same color as the interior is
| making it hard for me. I feel lost in a sea of white.
|
| But the core issue is that, excluding the prompt I copy-
| pasted word for word which worked like a charm, I am 0 out of
| 4 on actually leveraging AI to solve the problems I asked of
| Magic. I like the concept of natural language BI (I worked on
| it in the early days when Alexa came out) so I probably gave it
| more chances than I would have for a different product.
|
| For me, it doesn't fit my criteria for good problems to solve
| with AI in 2024 - the conversational interface and binary
| right/wrong nature of querying/presenting data accurately
| make the cost of failure too high, which is a death sentence
| for AI products IMO (compare to proactive, non-blocking
| products like copilot or shades-of-wrong problems like image
| generation or conversations with imaginary characters). But
| text-to-SQL and data presentation make sense as AI
| capabilities in 2024 so I can see why that could be a good
| product to pursue. If it worked, I would definitely use it.
| OutOfHere wrote:
| Almost all of this should flow from common sense. I would use
| what makes sense for your application, and not worry about the
| rest. It's a toolbox, not a rulebook. The one point that comes
| more from experience than from common sense is to always pin your
| model versions (see the sketch after the list below). As a final
| tip, if despite trying everything, you still don't like the LLM's
| output, just run it again!
|
| Here is a summary of all points:
|
| 1. Focus on Prompting Techniques:
|    1.1. Start with n-shot prompts to provide examples demonstrating tasks.
|    1.2. Use Chain-of-Thought (CoT) prompting for complex tasks, making
|         instructions specific.
|    1.3. Incorporate relevant resources via Retrieval Augmented Generation
|         (RAG).
|
| 2. Structure Inputs and Outputs:
|    2.1. Format inputs using serialization methods like XML, JSON, or
|         Markdown.
|    2.2. Ensure outputs are structured to integrate seamlessly with
|         downstream systems.
|
| 3. Simplify Prompts:
|    3.1. Break down complex prompts into smaller, focused ones.
|    3.2. Iterate and evaluate each prompt individually for better
|         performance.
|
| 4. Optimize Context Tokens:
|    4.1. Minimize redundant or irrelevant context in prompts.
|    4.2. Structure the context clearly to emphasize relationships between
|         parts.
|
| 5. Leverage Information Retrieval/RAG:
|    5.1. Use RAG to provide the LLM with knowledge to improve output.
|    5.2. Ensure retrieved documents are relevant, dense, and detailed.
|    5.3. Utilize hybrid search methods combining keyword and
|         embedding-based retrieval.
|
| 6. Workflow Optimization:
|    6.1. Decompose tasks into multi-step workflows for better accuracy.
|    6.2. Prioritize deterministic execution for reliability and
|         predictability.
|    6.3. Use caching to save costs and reduce latency.
|
| 7. Evaluation and Monitoring:
|    7.1. Create assertion-based unit tests using real input/output samples.
|    7.2. Use LLM-as-Judge for pairwise comparisons to evaluate outputs.
|    7.3. Regularly review LLM inputs and outputs for new patterns or
|         issues.
|
| 8. Address Hallucinations and Guardrails:
|    8.1. Combine prompt engineering with factual inconsistency guardrails.
|    8.2. Use content moderation APIs and PII detection packages to filter
|         outputs.
|
| 9. Operational Practices:
|    9.1. Regularly check for development-prod data skew.
|    9.2. Ensure data logging and review input/output samples daily.
|    9.3. Pin specific model versions to maintain consistency and avoid
|         unexpected changes.
|
| 10. Team and Roles:
|     10.1. Educate and empower all team members to use AI technology.
|     10.2. Include designers early in the process to improve user
|           experience and reframe user needs.
|     10.3. Ensure the right progression of roles and hire based on the
|           specific phase of the project.
|
| 11. Risk Management:
|     11.1. Calibrate risk tolerance based on the use case and audience.
|     11.2. Focus on internal applications first to manage risk and gain
|           confidence before expanding to customer-facing use cases.
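|
| A minimal sketch of two of these points together (7.1,
| assertion-based tests, and 9.3, pinned model versions);
| summarize() is just an illustrative task:
|
|     from openai import OpenAI
|
|     client = OpenAI()
|     MODEL = "gpt-4o-2024-05-13"  # 9.3: pin an exact model version
|
|     def summarize(text: str) -> str:
|         resp = client.chat.completions.create(
|             model=MODEL,
|             messages=[{"role": "user",
|                        "content": f"Summarize in one sentence:\n{text}"}],
|         )
|         return resp.choices[0].message.content
|
|     # 7.1: assertion-based unit test on a real input sample
|     def test_summary_mentions_product():
|         out = summarize("Acme launched its new Foo widget today...")
|         assert "Foo" in out
|         assert len(out) < 300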
| felixbraun wrote:
| related discussion (3 days ago):
| https://news.ycombinator.com/item?id=40508390
| DylanSp wrote:
| Looks like the same content that was posted on oreilly.com a
| couple days ago, just on a separate site. That has some existing
| discussion: https://news.ycombinator.com/item?id=40508390.
| Multicomp wrote:
| Anyone have a convenience solution for doing multi-step
| workflows? For example, I'm filling out the basics of an NPC
| character sheet on my game prep. I'm using a certain rule system,
| give the enemy certain tactics, certain stats, certain types of
| weapons, right now I have a 'god prompt' trying to walk the LLM
| through creating the basic character sheet, but the responses get
| squeezed down into what one or two prompt responses can be.
|
| If I can do node-red or a function chain for prompts and outputs,
| that would be sweet.
| CuriouslyC wrote:
| You can do multi-shot workflows pretty easily. I like to have the
| model produce markdown, then add code blocks (```json/yaml```)
| to extract the interim results. You can lay out multiple
| "phases" in your prompt and have it perform each one in turn,
| and have each one reference prior phases. Then at the end you
| just pull out the code blocks for each phase and you have your
| structured result.
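|
| The extraction step can be as simple as this (a sketch, assuming
| each phase's block is tagged with a language like json):
|
|     import json
|     import re
|
|     def extract_blocks(markdown: str, lang: str = "json") -> list:
|         # Pull the contents of every ```lang ... ``` fence, in order.
|         pattern = rf"```{lang}\s*\n(.*?)```"
|         return [json.loads(m)
|                 for m in re.findall(pattern, markdown, re.DOTALL)]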
| mentos wrote:
| I still haven't played with using one LLM to oversee another.
|
| "You are in charge of game prep and must work with an LLM over
| many prompts to..."
| hugocbp wrote:
| For me, a very simple "breakdown tasks into a queue and store
| in a DB" solution has help tremendously with most requests.
|
| Instead of trying to do everything in a single chat or chain,
| add steps to ask the LLM to break down the next tasks, with
| context, and store that into SQLite or something. Then start
| new chats/chains on each of those tasks.
|
| Then just loop them back into LLM.
|
| I find that long chats or chains just confuse most models and
| we start seeing gibberish.
|
| Right now I'm favoring something like:
|
| "We're going to do task {task}. The current situation and
| context is {context}.
|
| Break down what individual steps we need to perform to achieve
| {goal} and output these steps with their necessary context as
| {standard_task_json}. If the output is already enough to
| satisfy {goal}, just output the result as text."
|
| I find that leaving everything to LLM in a sequence is not as
| effective as using LLM to break things down and having a DB and
| code logic to support the development of more complex outcomes.
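|
| A bare-bones version of that queue (call_llm() is a hypothetical
| wrapper around whatever model API you use):
|
|     import sqlite3
|
|     db = sqlite3.connect("tasks.db")
|     db.execute("""CREATE TABLE IF NOT EXISTS tasks
|                   (id INTEGER PRIMARY KEY, task TEXT, context TEXT,
|                    status TEXT DEFAULT 'pending')""")
|
|     def enqueue(task: str, context: str) -> None:
|         db.execute("INSERT INTO tasks (task, context) VALUES (?, ?)",
|                    (task, context))
|         db.commit()
|
|     def run_next():
|         row = db.execute("SELECT id, task, context FROM tasks "
|                          "WHERE status = 'pending' LIMIT 1").fetchone()
|         if row is None:
|             return None
|         task_id, task, context = row
|         # Fresh chat per task, so earlier turns can't confuse the model.
|         result = call_llm(f"Task: {task}\nContext: {context}")
|         db.execute("UPDATE tasks SET status = 'done' WHERE id = ?",
|                    (task_id,))
|         db.commit()
|         return result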
| datameta wrote:
| Indeed! If I'm met with several misunderstandings in a row,
| asking it to explain what I'm trying to do is a pretty
| surefire way to move forward.
|
| Also mentioning what to "forget" or not focus on anymore
| seems to remove some noise from the responses if they are
| large.
| gpsx wrote:
| One option for doing this is to incrementally build up the
| "document" using isolated prompts for each section. I say
| document because I am not exactly sure what the character sheet
| looks like, but I am assuming it can be constructed one section
| at a time. You create a prompt to create the first section.
| Then, you create a second prompt that gives the agent your
| existing document and prompts it to create the next section.
| You continue until all the sections are finished. In some cases
| this works better than doing a single conversation.
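|
| Sketched out (call_llm() again stands in for the model API, and
| the section names are invented):
|
|     SECTIONS = ["stats", "weapons", "tactics"]
|
|     def build_sheet(concept: str) -> str:
|         doc = ""
|         for section in SECTIONS:
|             # Each prompt sees the document so far, but is otherwise
|             # isolated from earlier conversation turns.
|             doc += call_llm(
|                 f"NPC concept: {concept}\n"
|                 f"Character sheet so far:\n{doc}\n"
|                 f"Write only the '{section}' section."
|             ) + "\n"
|         return doc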
| e1g wrote:
| Perplexity recently released something like this
| https://www.perplexity.ai/hub/blog/perplexity-pages
| proc0 wrote:
| Sounds like you need an agent system, some libs are mentioned
| here: https://lilianweng.github.io/posts/2023-06-23-agent/
| 127 wrote:
| Did you force it into a parser? You can define a simple
| language in llama.cpp for the LLM to obey.
| punkspider wrote:
| Perhaps this would be of use?
| https://github.com/langgenius/dify/ I use it for quick
| workflows and it's pretty intuitive.
| dbs wrote:
| Show me the use cases you have supported in production. Then I
| might read all 30 pages praising the dozens (soon to be
| hundreds?) of "best practices" to build LLMs.
| joe_the_user wrote:
| I have a friend who uses ChatGPT for writing quick policy
| statements for her clients (mostly schools). I have a friend who
| uses it to create images and descriptions for DnD adventures.
| LLMs have uses.
|
| The problem I see is, how can an "application" be anything but
| a little window onto the base abilities of ChatGPT, and so
| effectively offer nothing _more_ to an end-user? The final
| result still has to be checked and regular end-users have to
| write their own prompts.
|
| Edit: Also, I should also say that anyone who's designing LLM
| apps that, rather than being end-user tools, are effectively
| gatekeepers to getting action or "a human" from a company
| deserves a big "f* you" 'cause that approach is evil.
| harrisoned wrote:
| It certainly has use cases, just not as many as the hype led
| people to believe. For me:
|
| -Regex expressions: ChatGPT is the best multi-million regex
| parser to date.
|
| -Grammar and semantic check: It's a very good revision tool,
| helped me a lot of times, especially when writing in non-native
| languages.
|
| -Artwork inspiration: Not only for visual inspiration, in the
| case of image generators, but descriptive as well. The
| verbosity of some LLMs can help describe things in more detail
| than a person would.
|
| -General coding: While your mileage may vary on that one, it
| has helped me a lot at work building stuff in languages I'm not
| very familiar with. Just snippets, nothing big.
| int_19h wrote:
| GPT-4 has amazing translation capabilities, too. Actually
| usable for long conversations.
| thallium205 wrote:
| We have a company mail, fax, and phone room that receives
| thousands of pages a day; LLMs now sort, categorize, and
| extract useful information from them all in a completely
| automated way. Several FTEs have been reassigned
| elsewhere as a result.
| robbiemitchell wrote:
| Processing high volumes of unstructured data (text)... we're
| using a STAG architecture.
|
| - Generate targeted LLM micro summaries of every record
| (ticket, call, etc.) continually
|
| - Use layers of regex, semantic embeddings, and scoring
| enrichments to identify report rows (pivots on aggregates)
| worth attention, running on a schedule
|
| - Proactively explain each report row by identifying what's
| unusual about it and LLM summarizing a subset of the
| microsummaries.
|
| - Push the result to webhook
|
| Lack of JSON schema restriction is a significant barrier to
| entry on hooking LLMs up to a multi step process.
|
| Another is preventing LLMs from adding intro or conclusion
| text.
| BoorishBears wrote:
| > Lack of JSON schema restriction is a significant barrier to
| entry on hooking LLMs up to a multi step process.
|
| How are you struggling with this, let alone as a significant
| barrier? JSON adherence with a well-thought-out schema hasn't
| been a worry in a while, between improved model performance and
| various grammar-based constraint systems.
|
| > Another is preventing LLMs from adding intro or conclusion
| text.
|
| Also trivial to work around by pre-filling and stop tokens,
| or just extremely basic text parsing.
|
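| For example, with Anthropic's messages API you can pre-fill the
| start of the assistant turn so the reply begins mid-JSON,
| skipping any "Sure, here's your JSON:" preamble (a sketch; the
| task is made up):
|
|     import anthropic
|
|     client = anthropic.Anthropic()
|     msg = client.messages.create(
|         model="claude-3-opus-20240229",
|         max_tokens=512,
|         stop_sequences=["\n\n"],  # cut off any trailing commentary
|         messages=[
|             {"role": "user", "content": "Summarize this ticket as JSON "
|                                         "with keys 'topic' and 'urgency': ..."},
|             # Pre-fill: the model must continue from the open brace.
|             {"role": "assistant", "content": "{"},
|         ],
|     )
|     result = "{" + msg.content[0].text  # re-attach the prefill
|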
| Also would recommend writing out Stream-Triggered Augmented
| Generation, since the term is so rarely used it might as well
| be made up, from the POV of someone trying to understand the
| comment.
| robbiemitchell wrote:
| Asking even a top-notch LLM to output well formed JSON
| simply fails sometimes. And when you're running LLMs at
| high volume in the background, you can't use the best
| available until the last mile.
|
| You work around it with post-processing and retries. But
| it's still a bit brittle given how much stuff happens
| downstream without supervision.
| fancy_pantser wrote:
| Constrained output with GBNF or JSON is much more
| efficient and less error-prone. I hope nobody outside of
| hobby projects is still using error/retry loops.
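|
| A small example with llama-cpp-python (grammar and model path
| are illustrative):
|
|     from llama_cpp import Llama, LlamaGrammar
|
|     # GBNF: output must look like {"answer": "<letters/digits/spaces>"}
|     grammar = LlamaGrammar.from_string(r'''
|     root   ::= "{\"answer\": \"" string "\"}"
|     string ::= [a-zA-Z0-9 ]*
|     ''')
|
|     llm = Llama(model_path="model.gguf")
|     out = llm("Q: What is the capital of France? A:",
|               grammar=grammar, max_tokens=64)
|     print(out["choices"][0]["text"])  # matches the grammar by construction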
| joatmon-snoo wrote:
| Constraining output means you don't get to use ChatGPT or
| Claude though, and now you have to run your own stuff.
| Maybe for some folks that's OK, but really annoying for
| others.
| fancy_pantser wrote:
| You're totally right, I'm in my own HPC bubble. The
| organizations I work with create their own models and
| it's easy for me to forget that's the exception more than
| the rule. I apologize for making too many assumptions in
| my previous comment.
| joatmon-snoo wrote:
| Not at all!
|
| Out of curiosity- do those orgs not find the loss of
| generality that comes from custom models to be an issue?
| e.g. vs using Llama or Mistral or some other open model?
| int_19h wrote:
| I do wonder why, though. Constraining output based on
| logits is a fairly simple and easy-to-implement idea, so
| why is this not part of e.g. the OpenAI API yet? They
| don't even have to expose it at the lowest level, just
| use it to force valid JSON in the output on their end.
| jncfhnb wrote:
| ... why would you have the LLM spit out JSON rather
| than define the JSON yourself and have the LLM supply
| values?
| janpieterz wrote:
| How would I do this reliably? Eg give me 10 different
| values, all in one prompt for performance reasons?
|
| Might not need JSON but whatever format it outputs, it
| needs to be reliable.
| jncfhnb wrote:
| Don't do it all in one prompt.
| janpieterz wrote:
| Right, but now I'm basically taking a huge performance
| hit, and need to parallelize my queries etc.
|
| I was parsing a document recently, 10-ish questions for 1
| document, would make things expensive.
|
| Might be what's needed but not ideal.
| esafak wrote:
| If the LLM doesn't output data that conforms to a schema,
| you can't reliably parse it, so you're back to square
| one.
| jncfhnb wrote:
| It's significantly easier to output an integer than a
| JSON with a key value structure where the value is an
| integer and everything else is exactly as desired
| esafak wrote:
| That's because you've dumbed down the problem. If it was
| just about outputting one integer, there would be nothing
| to discuss. Now add a bunch more fields, add some nesting
| and other constraints into it...
| jncfhnb wrote:
| The more complexity you add the less likely the LLM is to
| give you a valid response in one shot. It's still going
| to be easier to get the LLM to supply values to a fixed
| schema than to get the LLM to give the answers and the
| schema
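|
| Concretely: keep the JSON skeleton in code and only ask the model
| for the leaf values (a sketch; call_llm() is a hypothetical
| wrapper):
|
|     def extract_age(document: str) -> dict:
|         raw = call_llm(
|             f"{document}\n\nHow old is the customer? "
|             "Reply with only an integer."
|         )
|         # The schema lives in code; the LLM never emits braces or keys.
|         return {"customer": {"age": int(raw.strip())}}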
| neverokay wrote:
| Is there a general model that got fine tuned on these
| json schema/output pairs?
|
| Seems like it would be universally useful.
| yeahwhatever10 wrote:
| The phrase you want to search is "constrained decoding".
| BoorishBears wrote:
| The best available actually have the _fewest_ knobs for
| JSON schema enforcement (i.e. OpenAI's JSON mode, which
| technically can still produce incorrect JSON)
|
| If you're using anything less you should have a grammar
| that enforces exactly what tokens are allowed to be
| output. Fine Tuning can help too in case you're worried
| about the effects of constraining the generation, but in
| my experience it's not really a thing
| benreesman wrote:
| I only became aware of it recently and therefore haven't done
| more than play with it in a fairly cursory way, but
| unstructured.io seems to have a lot of traction and certainly
| in my little toy tests their open-source stuff seems pretty
| clearly better than the status quo.
|
| Might be worth checking out.
| adamsbriscoe wrote:
| > Lack of JSON schema restriction is a significant barrier to
| entry on hooking LLMs up to a multi step process.
|
| (Plug) I shipped a dedicated OpenAI-compatible API for this,
| jsonmode.com, a couple of weeks ago and just integrated Groq
| (they were nice enough to bump up the rate limits) so it's
| crazy fast. It's a WIP but so far very comparable to JSON
| output from frontier models, with some bonus features (web
| crawling etc).
| tarasglek wrote:
| The metallica-esque lightning logo is cool
| joatmon-snoo wrote:
| We actually built an error-tolerant JSON parser to handle
| this. Our customers were reporting exactly the same issue-
| trying a bunch of different techniques to get more usefully
| structured data out.
|
| You can check it out over at
| https://github.com/BoundaryML/baml. Would love to talk if
| this is something that seems interesting!
| lastdong wrote:
| "Use layers of regex, semantic embeddings, and scoring
| enrichments to identify report rows (pivots on aggregates)
| worth attention, running on a schedule"
|
| This is really interesting, is there any architecture
| documentation/articles that you can recommend?
| fnordpiglet wrote:
| We use LLMs in dozens of different production applications for
| critical business flows. They allow for a lot of dynamism in
| our flows that aren't amenable to direct quantitative reasoning
| or structured workflows. Double-digit percentages of our growth in
| the last year are entirely due to them. The biggest challenge
| is tool chain, limits on inference capacity, and developer
| understanding of the abilities, limits, and techniques for
| using LLMs effectively.
|
| I often see these messages from the community doubting the
| reality, but LLMs are a powerful tool in the tool chest. But I
| think most companies are not staffed with skilled enough
| engineers with a creative enough bent to really take advantage
| of them yet, or be willing to fund basic research and from-first-
| principles toolchain creation. That's ok. But it's foolish to
| assume this is all hype like crypto was. The parallels are
| obvious but the foundations are different.
| mvdtnz wrote:
| Yet another post claiming "dozens" of production use cases
| without listing a single one.
| fnordpiglet wrote:
| I've listed plenty in my comment history. I don't generally
| feel compelled to trot them all out all the time - I don't
| need to "prove" anything and if you think I'm lying that's
| your choice. Finally, many of our uses are trade secrets
| and a significant competitive advantage so I don't feel the
| need to disclose them to the world if our competitors don't
| believe in the tech. We can keep eating their lunch.
| threeseed wrote:
| No one is saying that all of AI is hype. It clearly isn't.
|
| But the facts are that today LLMs are not suitable for use
| cases that need accurate results. And there is no evidence or
| research that suggests this is changing anytime soon. Maybe
| forever.
|
| There are very strong parallels to crypto in that (a) people
| are starting with the technology and trying to find problems
| and (b) there is a cult like atmosphere where non-believers
| are seen as being anti-progress and anti-technology.
| fnordpiglet wrote:
| Yeah I think a key is LLMs in business are not generally
| useful alone. They require classical computing techniques
| to really be powerful. Accurate computation is a generally
| well established field and you don't need an LLM to do
| optimization or math or even deductive logical reasoning.
| That's a waste of their power which is typically abstract
| semantic abductive "reasoning" and natural language
| processing. Overlaying this with constraints, structure,
| and augmenting with optimizers, solvers, etc, you get a
| form of computing that was impossible more than 5 years
| prior and is only practical in the last 9 months.
|
| On the crypto stuff yeah I get it - especially if you're
| not in the weeds of its use. A lot of people formed
| opinions from GPT3.5, Gemini, copilot, and other crappy
| experiences and haven't kept up with the state of the art.
| The rate of change in AI is breathtaking and I think hard
| to comprehend for most people. Also the recent mess of
| crypto and the fact grifters grift etc also hurts. But
| people who doubt -are- stuck in the past. That's not
| necessarily their fault and it might not even apply to
| their career or lives in the present and the flaws are
| enormous as you point out. But it's such a remarkably
| powerful new mode of compute that it in combination with
| all the other powerful modes of compute is changing
| everything and will continue to, especially if next-
| generation models keep improving, as they seem likely
| to.
| jeffreygoesto wrote:
| That text applies to basically every new technology.
| Point is that you can't predict its usefulness in 20
| years from that.
|
| To me it still looks like a hammer made completely from
| rubber. You can practice to get some good hits, but it is
| pretty hard to get something reliable. And a beginner
| will basically just bounce it around. But it is sold as
| rescue for beginners.
| idf00 wrote:
| I didn't see anything in the article that indicated the
| authors believed that those who don't see use cases for
| LLMs are anti-progress or anti-technology. Is that comment
| related to the authors of this article, or just a general
| grievance you have unrelated to this article?
| TeMPOraL wrote:
| > _We use LLMs in dozens of different production applications
| for critical business flows. They allow for a lot of dynamism
| in our flows that aren't amenable to direct quantitative
| reasoning or structured workflows. Double digit percents of
| our growth in the last year are entirely due to them. The
| biggest challenge is tool chain, limits on inference
| capacity, and developer understanding of the abilities,
| limits, and techniques for using LLMs effectively._
|
| That sounds like corporate buzzword salad. It doesn't tell
| much as it stands, not without at least one specific example
| to ground all those relative statements.
| mloncode wrote:
| Hi, Hamel here. I'm one of the co-authors. I'm an
| independent consultant and not all clients allow me to talk
| about their work.
|
| However, I have two that do, which I've discussed in the
| article. These are two production use cases that I have
| supported (which again, are explicitly mentioned in the
| article):
|
| 1. https://www.honeycomb.io/blog/introducing-query-
| assistant
|
| 2. https://www.youtube.com/watch?v=B_DMMlDuJB0
|
| Other co-authors have worked on significant bodies of work:
|
| Bryan Bischoff led the creation of Magic in Hex:
| https://www.latent.space/p/bryan-bischof
|
| Jason Liu created the most popular OSS library for
| structured data, called Instructor
| https://github.com/jxnl/instructor, and works with some of
| the leading companies in the space like Limitless and
| Raycast (https://jxnl.co/services/#current-and-past-
| clients)
|
| Eugene Yan works with LLMs extensively at Amazon and uses
| that to inform his writing: https://eugeneyan.com/writing/
| (However he isn't allowed to share specifics about Amazon)
|
| I believe you might find these worth looking at.
| mattmanser wrote:
| You've linked to a query generator for a custom
| programming language and a 1 hour video about LLM tools.
| The cynic in me feels like the former could probably be
| done by chatgpt off the shelf.
|
| But those do not seem to be real world business cases.
|
| Can you expand a bit more why you think they are? We
| don't have hours to spend reading, and you say you've
| been allowed to talk about them.
|
| So can you summarise the business benefits for us, which
| is what people are asking for, instead of linking to huge
| articles?
| 80hd wrote:
| Sounds like something you could do with an LLM
| idf00 wrote:
| They think they are real business use cases, because real
| businesses use them to solve their use cases. They know
| that chatgpt can't solve this off the shelf, because they
| tried that first and were forced to do more in order to
| solve their problem.
|
| There's a summary for ya! More details in the stuff that
| they linked if you want to learn. Technical skills do
| require a significant time investment to learn, and LLM
| usage is no different.
| mloncode wrote:
| > do not seem to be real world business cases
|
| The first one is a real-world product that lives in
| production and is user-facing, in a paid product.
|
| The second video goes in depth about how a AI assistant
| was built for a real estate CRM company, also a paid
| product.
|
| I don't understand the assertion that it's not "real
| world" or not "business"
|
| Here are additional articles about these
|
| https://help.rechat.com/guides/lucy
|
| https://www.prnewswire.com/news-releases/honeycomb-
| launches-...
| phillipcarter wrote:
| > The cynic in me feels like the former could probably be
| done by chatgpt off the shelf.
|
| Hello! I'm the owner of the feature in question who
| experimented with chatgpt last year in the course of
| building the feature (and working with Hamel to improve
| it via fine-tuning later).
|
| Even today, it could not work with ChatGPT. To generate
| valid queries, you need to know which subset of a user's
| dataset schema is relevant to their query, which makes it
| equally a retrieval problem as it does a generation
| problem.
|
| Beyond that, though, the details of "what makes a good
| query" are quite tricky and subtle. Honeycomb as a
| querying tool is unique in the market because it lets you
| arbitrarily group and filter by any column/value in your
| schema without pre-indexing and without any cost w.r.t.
| cardinality. And so there are many cases where you can
| quite literally answer someone's question, but there are
| multitudes of ways you can be _even more helpful_, often
| by introducing a grouping that they didn't directly ask
| for. For example, "count my errors" is just a COUNT where
| the error column exists, but if you group by something
| like the HTTP route, the name of the operation, etc. --
| or the name of a child operation and its calling HTTP
| route for requests -- you end up actually showing people
| where and how these errors come from. In my experience,
| the large majority of power users already do this
| themselves (it's how you use HNY effectively), and the
| large majority of new users who know little about the
| tool simply have no idea it's this flexible. Query
| Assistant helps them with that and they have a pretty
| good activation rate when they use it.
|
| Unfortunately, ChatGPT and even just good old fashioned
| RAG is often not up to the task. That's why fine-tuning
| is so important for this use case.
| fnordpiglet wrote:
| Thanks for the reply. Huge fan of honeycomb and the
| feature. Spent many years in observability and built
| some of the largest in-use log platforms. Tracing is the
| way of the future and I hope to see you guys eat that
| market. I did some executive tech strategy stuff at some
| megacorp on observability and it's really hard to unwedge
| metrics and logs but I've done my best when it was my
| focus. Good luck and thanks for all you're doing over
| there.
| obiefernandez wrote:
| The book I'm writing is almost finished and is based almost
| entirely on production use cases: https://leanpub.com/patterns-
| of-application-development-usin...
| cqqxo4zV46cp wrote:
| Or maybe they could choose to focus their attention on people
| that aren't needlessly aggressive and adversarial.
| mloncode wrote:
| Hi, Hamel here. I'm one of the co-authors. I'm an independent
| consultant and not all clients allow me to talk about their
| work.
|
| However, I have two that do, which I've discussed in the
| article. These are two production use cases that I have
| supported (which again, are explicitly mentioned in the
| article):
|
| 1. https://www.honeycomb.io/blog/introducing-query-assistant
|
| 2. https://www.youtube.com/watch?v=B_DMMlDuJB0
|
| Other co-authors have worked on significant bodies of work:
|
| Bryan Bischoff led the creation of Magic in Hex:
| https://www.latent.space/p/bryan-bischof
|
| Jason Liu created the most popular OSS library for structured
| data, called Instructor https://github.com/jxnl/instructor, and
| works with some of the leading companies in the space like
| Limitless and Raycast (https://jxnl.co/services/#current-and-
| past-clients)
|
| Eugene Yan works with LLMs extensively at Amazon and uses that
| to inform his writing: https://eugeneyan.com/writing/ (However
| he isn't allowed to share specifics about Amazon)
|
| I believe you might find these worth looking at.
| anon373839 wrote:
| I know it's a snarky comment you responded to, but I'm glad
| you did. Those are great resources, as is your excellent
| article. Thanks for posting!
| hubraumhugo wrote:
| I think it comes down to relatively unexciting use cases that
| have a high business impact (process automation, RPA, data
| analysis), not fancy chatbots or generative art.
|
| For example, we focused on the boring and hard task of web data
| extraction.
|
| Traditional web scraping is labor-intensive, error-prone, and
| requires constant updates to handle website changes. It's
| repetitive and tedious, but couldn't be automated due to the
| high data diversity and many edge cases. This required a
| combination of rule-based tools, developers, and constant
| maintenance.
|
| We're now using LLMs to generate web scrapers and data
| transformation steps on the fly that adapt to website changes,
| automating the full process end-to-end.
| bbischof wrote:
| Hello, it's Bryan, an author on this piece.
|
| If you're interested in using one of the LLM applications I
| have in prod, check out https://hex.tech/product/magic-ai/. It
| has a free limit every month so you can give it a try and see
| how you like it. If you have feedback after using it, we're
| always very interested to hear from users.
| threeseed wrote:
| RAG does not prevent hallucinations, nor does it guarantee that
| the quality of your output is contingent solely on the quality of
| your input. Using LLMs for legal use cases, for example, has
| shown them to be poor for anything other than initial research,
| as they are at best 65% accurate:
|
| https://dho.stanford.edu/wp-content/uploads/Legal_RAG_Halluc...
|
| So I would strongly disagree that _LLMs have become "good enough"
| for real-world applications_ based on what was promised.
| mattyyeung wrote:
| You may be interested in "Deterministic Quoting" [1]. This
| doesn't completely "solve" hallucinations, but I would argue
| that we do get "good enough" in several applications.
|
| Disclosure: author on [1]
|
| [1] https://mattyyeung.github.io/deterministic-quoting
| threeseed wrote:
| Have seen this approach before.
|
| It's the "yes, we hallucinate, but don't worry, because we
| provide the sources for users to check" approach.
|
| Even though everyone knows that users will never check unless
| the hallucination is egregious.
|
| It's such a disingenuous way of handling this.
| phillipcarter wrote:
| > So would strongly disagree that LLMs have become "good
| enough" for real-world applications" based on what was
| promised.
|
| I can't speak for "what was promised" by anyone, but LLMs have
| been good enough to live in production as a core feature in my
| product since early last year, and have only gotten better.
| sheepscreek wrote:
| I'm sure this has some decent insights but it's from almost 1
| year ago! A lot has changed in this space since then.
| bgrainger wrote:
| Are you sure? The article says "cite this as Yan et al. (May
| 2024)" and published-time in the metadata is 2024-05-12.
|
| Weird: I just refreshed the page and it now redirects to a
| different domain (than the originally-submitted URL) and has a
| date of June 8, 2023. It still cites articles and blog posts
| from 2024, though.
| jph00 wrote:
| Looks like they made a mistake in the article metadata - they
| definitely just released this article.
| jph00 wrote:
| OK I let them know, and they've fixed it now.
| sheepscreek wrote:
| Awesome - thanks. Makes much more sense now. Can't update
| my original comment but hopefully people will read this.
| mloncode wrote:
| This is Hamel, one of the authors of the article. We published
| the article with O'Reilly here:
|
| Part 1: https://www.oreilly.com/radar/what-we-learned-from-a-
| year-of... Part 2: https://www.oreilly.com/radar/what-we-learned-
| from-a-year-of...
|
| We were working on this webpage to collect the entire three part
| article in one place (the third part isn't published yet). We
| didn't expect anyone to notice the site! Either way, part 3
| should be out in a week or so.
| seventytwo wrote:
| Was wondering about the June 8th date on there :)
| blumomo wrote:
| > PUBLISHED
|
| > June 8, 2024
|
| Is this an article from the future?
| defrost wrote:
| Good catch.
|
| Best guess is that's the anticipated publishing date of the
| _full_ three parts on the official O'Reilly site.
|
| See: https://news.ycombinator.com/item?id=40551413
| mercurialsolo wrote:
| As we go about moving LLM-enabled products into production, we
| definitely see a bunch of what is being spoken about resonate. We
| also see the below as areas which need to be expanded upon for
| developers building in the space to take products to production.
|
| I would love to see this article also expand to touch upon things
| like:
|
| - data management: tooling, frameworks, open vs closed data
| management, labelling & annotations
|
| - inference as a pipeline: frameworks for breaking down model
| inference into smaller tasks & combining outputs (do DAGs have a
| role to play here?)
|
| - prompts: areas like caching, management, versioning, evaluations
|
| - model observability: tokens, costs, latency, drift?
|
| - evals for multimodality: how do we tackle evals here, which in
| turn can go into loops, e.g. quality of audio, speech or visual
| outputs
| JKCalhoun wrote:
| > Note that in recent times, some doubt has been cast on if this
| technique is as powerful as believed. Additionally, there's
| significant debate as to exactly what is going on during
| inference when Chain-of-Thought is being used...
|
| I love this new era of computing we're in where rumors, second-
| guessing and something akin to voodoo have entered into working
| with LLMs.
| ezst wrote:
| That's the thing, it's a novel form of computing that's
| increasingly moving away from computer science. It deserves to
| be treated as a discipline of its own, with lots of words of
| caution and danger stickers slapped over it.
| skydhash wrote:
| It's text (word) manipulation based on probabilistic rules
| derived from analyzing human-produced text. And everyone
| knows language is imperfect. That's why we have introduced
| logic and formalism so that we can reliably transmit
| knowledge.
|
| That's why LLMs are good at translating and spellchecking.
| We've been describing the same world and almost all texts
| respect grammar. That's the first things that surface. But
| you can extract the same rules in other ways and create a
| program that does it without the waste of computing power.
|
| If we describe computing as solving problems, then it's not
| computing because if your solution was not part of the
| training data, you won't solve anything. If we describe
| computing as symbol manipulation, then it's not doing a good
| job because the rules change with every model and they are
| probabilistic. No way to get a reliable answer. It's
| divination without the divine (no hint from an omniscient
| entity).
| amelius wrote:
| Yeah like psychology being a different field from physics
| even if it is running on atoms ultimately.
|
| Imagine if physics literature was filled with stuff about
| psychology and how that would drive physicists nuts. That's
| how I feel right now ;)
| pklee wrote:
| This is pure gold!! Thank you so much Eugene and gang for doing
| this. For those points I have encountered, I can 100% agree
| with them. This is fantastic!! So many good insights.
| jakubmazanec wrote:
| I'm not saying the content of the article is wrong, but what apps
| are people/companies writing articles like this actually
| building? I'm seriously unable to imagine any useful app. I only
| use GPT via API (as a better Google for documentation, and its
| output is never usable without heavy editing). This week I tried
| to use "AI" in Notion: I needed to generate 84 check boxes for
| each day starting with a specific date. I got 10 check boxes and
| a line "here should go rest..." (or some variation of such lazy
| output). Completely useless.
| qeternity wrote:
| I think you're going about it backwards. You don't take a tool,
| and then try to figure out what to do with it. You take a
| problem, and then figure out which tool you can use to solve
| it.
| jakubmazanec wrote:
| But it seems to me that's what they're doing: "We have LLMs,
| what to do with them?" But anyway, I'm seriously just looking
| for an example of an app that is built with the stuff described in
| the article.
|
| Me personally, I only used LLM for one "serious" application:
| I used GPT-3.5 Turbo for transforming unstructured text into
| JSON; it was basically just an ad-hoc Node.js script that called
| API (prompt was few examples of input-output pairs), and then
| it did some checks (these checks usually failed only because
| GPT also corrected misspellings). It would take me weeks to
| do it manually, but with the help of GPT it was few hours
| (writing of the script + I made a lot of misspellings so the
| script stopped a lot). But I cannot imagine anything more
| complex.
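|
| The core of that kind of script, roughly (the examples and checks
| here are placeholders):
|
|     import json
|     from openai import OpenAI
|
|     client = OpenAI()
|
|     FEW_SHOT = """Convert the record to JSON.
|
|     Input: John Smith, born 1982, lives in Prague
|     Output: {"name": "John Smith", "born": 1982, "city": "Prague"}
|     """
|
|     def to_json(record: str) -> dict:
|         resp = client.chat.completions.create(
|             model="gpt-3.5-turbo",
|             messages=[{"role": "user",
|                        "content": f"{FEW_SHOT}\nInput: {record}\nOutput:"}],
|         )
|         data = json.loads(resp.choices[0].message.content)
|         # Sanity check: output should come from the input. (This is the
|         # check that trips when GPT "helpfully" fixes a misspelling.)
|         assert data["name"] in record
|         return data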
| exhaze wrote:
| https://github.com/hrishioa/lumentis
|
| Since you seem to have not noticed my comment above, here's
| another example of a project that implements many of these
| techniques. Me and many others have used this to transcribe
| hour long videos into a well organized "docs site" that
| makes the content easy to read.
|
| Example: https://matadoc.vercel.app/
|
| This was completely auto-generated in a few minutes. The
| author of the library reviewed it and said that it's nearly
| 100% correct and people in the company where it was built
| rely on these docs.
|
| Tell me how long it would take you to write these docs. I'm
| really confused where your dismissive mentality is coming
| from in the face of what I think is overwhelming evidence
| to the contrary. I'm happy to provide example after example
| after example. I'm sorry, but you are utterly, completely
| wrong in your conclusions.
| jakubmazanec wrote:
| But that seems to belong to the category "text
| transformation" (e.g. translating, converting unstructed
| notes into structured data, etc.), which I acknowledge
| LLMs are good at; instead of category "I'll magically
| debug your SQL wish!".
| exhaze wrote:
| I believe we were discussing the former not the latter? I
| agree that for lots of problem solving tasks it can be
| hit or miss - in my experience, all the models are quite
| bad at writing decent frontend code when it comes to the
| rendered page looking the way you want it to.
|
| What you're describing is more about reasoning abilities
| - that's not really what the article was about or the
| problems the techniques are for. The techniques in
| article are more for stuff like Q&A, classification,
| summarization, etc.
| root_axis wrote:
| I've tried this type of thing quite a bit (generating
| documentation based on code I've written), and it's
| generally pretty bad. Even just generating a README for a
| single source file project produces bloviated fluff that
| I have to edit rigorously. I'd say it does about 40% of
| the job, which is obviously a technical marvel, but in a
| practical sense it's more novelty than utility.
| exhaze wrote:
| Please just go and try the lumentis library I mentioned -
| that is what was used to generate this. It works. For the
| library docs I showed, I literally just wrote a zsh
| script to concat all the code together into one file,
| each one wrapped with XML open/close tags, and fed that
| in. Just because you weren't able to do it doesn't mean
| it's a novelty.
| exhaze wrote:
| I've built many production applications using a lot of these
| techniques and others - it's made money either by increasing
| sales or decreasing operational costs.
|
| Here's a more dramatic example: https://www.grey-wing.com/
|
| This company provides deeply integrated LLM-powered software
| for operating freight ships.
|
| There are a lot of people who are doing this and achieving very
| good results.
|
| Sorry, if it's not working for you, it doesn't mean that it
| doesn't work.
| robbiep wrote:
| That's really interesting. Surely the crewing roster stuff is
| actually using linear algebra rather than AI though?
| hakanderyal wrote:
| If you didn't follow what has been happening in the LLM space,
| this document gives you everything you need to know about
| state-of-the-art LLM usage & applications.
|
| Thanks a lot for this!
| gengstrand wrote:
| Interesting blog. It seems to be a compendium of advice for all
| kinds of folks ranging from end user to integration partner. For
| a slightly different take on how to use LLMs to build software,
| you might be interested in https://www.infoq.com/articles/llm-
| productivity-experiment/ which documents an experiment where the
| same prompt was given to various prominent LLMs asking to write
| two unit tests for an already existing code base. The results
| were collected, metrics were analyzed, then comparisons were
| made. No advice on how to write better prompts but some insight
| on how to work with and what you can expect from LLMs in order to
| improve developer productivity.
___________________________________________________________________
(page generated 2024-06-02 23:02 UTC)