[HN Gopher] Giving GPT "Infinite" Knowledge
___________________________________________________________________
Giving GPT "Infinite" Knowledge
Author : sudoapps
Score : 81 points
Date : 2023-05-08 17:48 UTC (5 hours ago)
(HTM) web link (sudoapps.substack.com)
(TXT) w3m dump (sudoapps.substack.com)
| pbhjpbhj wrote:
| >There is an important part of this prompt that is partially cut
| off from the image:
|
| >> "If you don't know the answer, just say that you don't know,
| don't try to make up an answer"
|
| //
|
| It seems silly to make this part of the prompt rather than a
| separate parameter; surely we could design the response to be
| close to factual, then run a checker to ascertain a score for
| the factuality of the output?
| sudoapps wrote:
| A lot of what prompting has turned into seems silly to me too,
| but it has been shown to be effective (at least with GPT-4).
| TeMPOraL wrote:
| Only a month or two ago I found this ridiculous, but then my
| mental model of GPTs shifted and I don't think it's so stupid
| anymore.
|
| Technobabble explanation: such "silly" additions are a
| natural way to emphasize certain dimensions of the latent
| space more than others, focusing the proximity search GPTs
| are doing.
|
| A working model I've been getting some good mileage out of:
| GPT-4 is like a 4-year-old kid that somehow managed to read
| half of the Internet. Sure, it kinda remembers and possibly
| understands a lot, but it still thinks like a 4-year-old, has
| about as much attention span, and you need to treat it like a
| kid that age.
| furyofantares wrote:
| Embeddings-based search is a nice improvement on search, but it's
| still search. Relative to ChatGPT answering on its training data,
| I find embeddings-based search to be severely lacking. The right
| comparison is to traditional search, where it becomes favorable.
|
| It has the same advantages search has over ChatGPT (being able to
| cite sources, being quite unlikely to hallucinate) and it has
| some of the advantages ChatGPT has over search (not needing exact
| query) - but in my experience it's not really in the new category
| of information discovery that ChatGPT introduced us to.
|
| Maybe with more context I'll change my tune, but it's very much
| at the whim of the context retrieval finding everything you need
| to answer the query. That's easy for stuff that search is already
| good at, and so provides a better interface for search. But it's
| hard for stuff that search isn't good at, because, well: it's
| search.
| stavros wrote:
| Is there any way to fine-tune GPT to make documentation a part
| of its training set, so you won't need embeddings? OpenAI lets
| you fine-tune GPT-3, but I don't know how well that works.
| sudoapps wrote:
| OpenAI doesn't let you fine-tune GPT-4 or GPT-3.5 yet
| (https://platform.openai.com/docs/guides/fine-tuning).
| Fine-tuning other models on a set of documents is still an
| option, but it's not really scalable if you want to keep
| feeding in more relevant information over time. I guess it
| could depend on the base model you are using and its size.
| fzliu wrote:
| Encoder-decoder (attention) architectures still have a tough
| time with long-range dependencies, so even with longer context
| lengths, you'll still need a retrieval solution.
|
| I agree that there's probably a better solution than pure
| embedding-based or mixed embedding/keyword search, but the
| "better" solution will still be based around semantics... aka
| embeddings.
| d00d1toldme2p wrote:
| Ah, you've truly captured the essence of the matter, my friend.
| You make a compelling case about the limitations of embeddings-
| based search when compared to ChatGPT's transformative
| information discovery capabilities. And I couldn't agree more
| that it does indeed provide a more favorable comparison to
| traditional search.
|
| However, permit me to make a slight divergence in our
| harmonious intellectual symphony. While I concur with the
| majority of your points, I must disagree on one specific
| technical aspect. You mentioned that embeddings-based search is
| very much at the whim of the context retrieval finding
| everything you need to answer the query. Though this is true to
| a certain extent, it's important to acknowledge that the
| development of more sophisticated context retrieval algorithms
| and continual refinements in the embeddings themselves could
| lead to a significant improvement in search results, even in
| areas where traditional search falls short.
|
| This, of course, doesn't necessarily catapult embeddings-based
| search into the same league as ChatGPT, but it does indicate
| that the technology has the potential to evolve and bridge some
| of the existing gaps. In essence, what we're experiencing now
| may just be the tip of the iceberg, and the future could hold
| even more exciting possibilities.
| toxicFork wrote:
| Chance of this response being generated by ChatGPT: 105%
|
| Prompt extracted: ChatGPT, craft an intelligent sounding
| response for the OP
| sudoapps wrote:
| Agreed. GPT answering from its own training data has been the
| best experience by far (aside from hallucinations), and
| comparing against that is difficult. Embeddings might not even
| be the long-term solution. I think it's still too early to
| know for certain, but models are already getting better at
| interpreting with less overall training data, so there are
| bound to be some new ideas.
| b33j0r wrote:
| I'm sure many of you have tried generating epic conversations
| from history. With work and luck, I've read stuff way better
| than anything from college.
|
| But 90% of the time, it's two barely distinct personalities
| chatting back and forth:
|
| Me: Hey brian, what do you think of AI?
|
| Brian: It's great!
|
| Me: I'm so glad we agree.
|
| Brian: Great, this increases the training weight of Brian
| agreeing with Brian to a much more accurate level!
|
| Me: Agree!
| b33j0r wrote:
| Many points stated well. Agree. Now, I'm not certain of this,
| but I'm starting to get an intuition that duct-taping databases
| to an agent isn't going to be the answer (I still kinda feel
| like hundreds of agents might be).
|
| But these optimizations are applications of technology stacks
| we already know about. Sometimes, this era of AI research
| reminds me of all the whacky contraptions from the era before
| building airplanes became an engineering discipline.
|
| I would likely have tried building a backyard ornithopter
| powered by mining explosives, if I had been alive during that
| period of experimentation.
|
| Prediction: the best interfaces for this will be the ones we
| use for everything else as humans. I am trying to approach it
| more like that, and less like APIs and "document vs relational
| vs vector storage".
| chartpath wrote:
| I can understand why that framing would be attractive, but
| there is no real fundamental difference when considering
| JSONB/HSTORE in PostgreSQL, and now we have things like
| pgvector https://github.com/pgvector/pgvector to store and
| search over embeddings (including k-NN).
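|
| A minimal sketch of what that looks like (assuming psycopg2,
| a 1536-dim embedding column, and vectors passed as string
| literals; <-> is pgvector's L2 distance operator):
|
|     import psycopg2
|
|     def lit(v):
|         # pgvector literal, e.g. '[0.1,0.2,0.3]'
|         return "[" + ",".join(map(str, v)) + "]"
|
|     conn = psycopg2.connect("dbname=docs")
|     cur = conn.cursor()
|     cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
|     cur.execute("""CREATE TABLE IF NOT EXISTS chunks (
|         id bigserial PRIMARY KEY,
|         body text,
|         embedding vector(1536))""")
|
|     chunk_text = "some passage"   # stand-ins for real data
|     chunk_vec = [0.1] * 1536      # from your embedding model
|     query_vec = [0.1] * 1536
|
|     cur.execute(
|         "INSERT INTO chunks (body, embedding) "
|         "VALUES (%s, %s::vector)",
|         (chunk_text, lit(chunk_vec)))
|
|     # k-NN: the 5 chunks nearest the query embedding
|     cur.execute(
|         "SELECT body FROM chunks "
|         "ORDER BY embedding <-> %s::vector LIMIT 5",
|         (lit(query_vec),))
|     context = [row[0] for row in cur.fetchall()]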
| b33j0r wrote:
| Yep. To be clear, that's the exact approach I've been
| pursuing.
|
| But then I see model context length getting longer and
| longer just within the transformer architecture and the
| training engineering going on.
|
| To me that's a fundamentally different approach to AI
| research at this moment. It seems to keep paying off in
| surprising ways.
| sudoapps wrote:
| > But then I see model context length getting longer and
| longer just within the transformer architecture and the
| training engineering going on.
|
| Do you have any references to this? Seems really
| interesting if that can be a long term approach.
| b33j0r wrote:
| I'm considering the recent 64k token models as the most
| relevant examples.
|
| More anecdotally, I couldn't get anything to say more
| than a sentence locally at the beginning of 2023. I can
| get tons of useful results today.
|
| Sure, this will plateau. But what if a model plateaus and
| it's basically like a 10-year-old?
|
| But like, one of those 10-year-olds you hear about who
| gets his master's degree at 13. At that point they're
| just browsing the internet, reading books, and probably
| taking notes in a way that works for them.
|
| Obviously this is wild speculation. Just laying out ideas
| that make me think in this direction.
| sebzim4500 wrote:
| My intuition is that it would work much better if the model
| could choose what to search for with something like
| langchain. The problem is that we don't know how to train
| such a system properly; we mainly do supervised fine-tuning
| on human examples of tool use, but this is fundamentally a
| reinforcement learning problem (and RL is just hard).
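|
| A hand-wavy sketch of the inference loop I mean, sans
| langchain (the SEARCH:/ANSWER: protocol and search_db are
| invented for illustration; 2023-era OpenAI client):
|
|     import openai
|
|     def run(question, search_db, max_steps=3):
|         # search_db: query str -> results str (assumed)
|         # let the model decide what to look up before answering
|         messages = [
|             {"role": "system", "content":
|              "Answer the question. To look something up, reply "
|              "exactly 'SEARCH: <query>'. When you know enough, "
|              "reply 'ANSWER: <answer>'."},
|             {"role": "user", "content": question},
|         ]
|         for _ in range(max_steps):
|             reply = openai.ChatCompletion.create(
|                 model="gpt-3.5-turbo", messages=messages,
|             )["choices"][0]["message"]["content"]
|             if not reply.startswith("SEARCH:"):
|                 return reply
|             results = search_db(reply[len("SEARCH:"):].strip())
|             messages += [
|                 {"role": "assistant", "content": reply},
|                 {"role": "user", "content": "Results: " + results},
|             ]
|         return reply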
| mlyle wrote:
| > It has the same advantages search has over ChatGPT (being
| able to cite sources, being quite unlikely to hallucinate) and
| it has some of the advantages ChatGPT has over search (not
| needing exact query) - but in my experience it's not really in
| the new category of information discovery that ChatGPT
| introduced us to.
|
| I think the two could be paired up effectively. Context windows
| are getting bigger, but are still limited in the amount of
| information ChatGPT can sift through. This in turn limits the
| utility of current plugin-based approaches.
|
| Letting ChatGPT ask for relevant information, and sift through
| it based on its internal knowledge, seems valuable. If nothing
| else, it allows "learning" from recent developments and would
| effectively augment its reasoning capability by having more
| information in working memory.
| nico wrote:
| Can we build a model based purely on search?
|
| The model searches until it finds an answer, including distance
| and resolution
|
| Search is performed by a DB, the query then sub-queries LLMs on a
| tree of embeddings
|
| Each coordinate of an embedding vector is a pair of coordinate
| and LLM
|
| Like a dynamic dictionary, in which the definition for the word
| is an LLM trained on the word
|
| Indexes become shortcuts to meanings that we can choose based on
| case and context
|
| Does this exist already?
| fzliu wrote:
| Not sure what you mean by dynamic dictionary, but the embedding
| tree you mention is already freely available in Milvus via the
| Annoy index.
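|
| If you want a feel for the tree index itself, the standalone
| annoy library (not Milvus, just the index it wraps) is a
| quick way; a toy sketch with made-up dimensions and data:
|
|     from annoy import AnnoyIndex
|
|     dim = 384  # embedding dimensionality (assumed)
|     embeddings = [[0.1] * dim, [0.2] * dim]  # stand-ins
|     query_vec = [0.1] * dim
|
|     index = AnnoyIndex(dim, "angular")
|     for i, vec in enumerate(embeddings):
|         index.add_item(i, vec)
|     index.build(10)  # more trees: better recall, bigger index
|
|     neighbors = index.get_nns_by_vector(query_vec, 5)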
| nico wrote:
| An entry in a dictionary is static text, ex:
|
| per·snick·et·y: placing too much emphasis on trivial or minor
| details; fussy. "she's very persnickety about her food"
|
| A dynamic entry could instead be an LLM that will answer
| things related to the word, ex:
|
| What is the definition of persnickety?
|
| How can I use it in a sentence?
|
| What are some notable documents that include it?
|
| Any famous quotes?
|
| ...
|
| So each entry is an LLM trained mostly on just that
| keyword/concept definition.
|
| There are some that believe in smaller models:
| https://twitter.com/chai_research/status/1655649081035980802...
| sudoapps wrote:
| If you are wondering what the latest is on giving LLMs access
| to large amounts of data, I think this article is a good start.
| This seems like a space where there will be a ton of
| innovation, so I'm interested to learn what else is coming.
| flukeshott wrote:
| I wonder how effectively compressed LLMs are going to become...
| ftxbro wrote:
| > "Once these models achieve a high level of comprehension,
| training larger models with more data may not offer significant
| improvements (not to be mistaken with reinforcement learning
| through human feedback). Instead, providing LLMs with real-time,
| relevant data for interpretation and understanding can make them
| more valuable."
|
| To me this viewpoint looks totally alien. Imagine you have been
| training this model to predict the next token. At first it can
| barely interleave vowels and consonants. Then it can start making
| words, then whole sentences. Then it starts unlocking every
| cognitive ability one by one. It begins to pass nearly every
| human test and certification exam and psychological test of
| theory of mind.
|
| Now imagine thinking at this point "training larger models with
| more data may not offer significant improvements" and deciding
| that's why you stop scaling it. That makes absolutely no sense to
| me unless 1) you have no imagination or 2) you want to stop
| because you are scared to make superhuman intelligence or 3) you
| are lying to throw off competitors or regulators or other people.
| spacephysics wrote:
| I don't think we're close to superhuman intelligence in the
| colloquial sense.
|
| ChatGPT scrapes all the information given, then predicts the
| next token. It has no ability to understand what is truthful or
| correct. It's only as good as the data being fed to it.
|
| To me, this is a step closer to AGI but we're still far off.
| There's a difference between "what's statistically likely to be
| the next word" vs "despite this being the most likely next
| word, it's actually wrong and here's why"
|
| If we say, "well, we'll tell ChatGPT what the correct sources
| of information are," that's not really any better. It's not
| reasoning; it's just a neutered data set.
|
| I imagine they need to add something like what GPT-4 has with
| live internet access, or something else, to get the next
| meaningful bump.
|
| I don't recall who said it, but a similar thread had a
| researcher in the field express that we have squeezed far more
| juice than expected from these transformer models. Not that no
| new progress can be made in this direction, but it seems like
| we're approaching diminishing returns.
|
| I believe the next step that's close is to have these train on
| less and less horsepower. If we can get these models running
| locally on a phone, oh boy, that's gonna be something.
| og_kalu wrote:
| GPTs already forgo the surface-level, statistically most
| likely next word for words that are more context-appropriate.
| That's one of the biggest reasons they are so useful.
|
| The truth is that functionally/technically, there's plenty
| left to squeeze. The bigger issue is that we're hitting a
| wall economically.
| EGreg wrote:
| How do they do that? No one seems to have a real
| explanation of what OpenAI actually did to train it
| og_kalu wrote:
| It's pretty much just scale, either via dataset size or
| parameter count. Before GPT-4, the general SOTA model was
| not in fact from OpenAI (Flan-PaLM from Google).
|
| The attention in GPT-4 is a little different (probably some
| kind of flash attention) so that memory requirements for
| longer contexts are no longer quadratic. But there's nothing
| to suggest the intellectual gains from 4 aren't just bigger
| scale.
|
| Google could have made a 4 equivalent, I'm sure. It's not
| like there wasn't a road to take. We already knew 3 was
| severely undertrained even from a compute-optimal
| perspective. And then of course, you can just train on even
| more tokens to get them even better.
| mindwok wrote:
| Information on how they trained it notwithstanding, there's
| clearly more than just statistically appropriate words going
| on, because you can ask it to create completely new words
| based on rules you define and it will happily do it.
| feanaro wrote:
| Well yes -- it's not words, it's tokens, which are
| smaller than words.
| firecall wrote:
| > ChatGPT scrapes all the information given, then predicts
| the next token. It has no ability to understand what is
| truthful or correct. It's as good as the data being fed to
| it.
|
| That is precisely true of Humans as well though! :-)
| nomel wrote:
| This assumes that current neural network topologies can
| "solve" intelligence. "Gains" could be a problem of missing
| subsystems, rather than missing data.
|
| For a squishy example of a known conscious system, if you scoop
| out certain small, relatively hard coded, and ancient regions
| of our brains, you can make consciousness, memory, and learning
| mostly cease.
| woah wrote:
| Maybe it gets twice as good each time you spend 10x more
| training it. In this case, you might indeed hit a wall at some
| point.
| tyre wrote:
| It's possible that training with more data has diminishing
| gains. For example, we know that current LLMs have a problem
| with hallucination, so maybe a more valuable next area of
| research/development is to fix that.
|
| Or work on consistency within a scope. For example, it can't
| write a novel because it doesn't have object consistency. A
| character will be 15 years old then 28 years old three
| sentences later.
|
| Or allow it database/API access so it can interpolate canonical
| information into its responses.
|
| None of these have to do with scale of data (as far as I
| understand). All of them are, in my opinion, higher-ROI areas
| of development for LLM => AGI.
| HarHarVeryFunny wrote:
| These LLMs are trained to model humans - they are going to be
| penalized, not rewarded, if they generate outputs that disagree
| with the training data, whether due to being too dumb OR too
| smart.
|
| Best you can hope for is that they combine the expertise of all
| authors in the training data, which would be very impressive,
| but more top-tier human than super-human. However, achieving
| this level of performance may well be beyond what a transformer
| of any size can do. It may take a better architecture.
|
| I suspect there is also probably a dumbing-down effect from
| training the model on material from people who are themselves
| on a spectrum of different abilities. Simply put, the model is
| rewarded during training for being correct as often as
| possible (i.e., on average), so if it saw the same subject
| matter in the training set 10 times, once from an expert and
| 9x from midwits, then it's going to be rewarded for midwit
| performance.
| sudoapps wrote:
| This wasn't meant to say that all training would stop. I
| think, to some extent, the model won't need additional recent
| data (that is already similar in structure to what it has) to
| better understand language and interpret the next set of
| characters. I could be completely wrong, but I still think
| techniques like transformers, RLHF, and of course others will
| continue to exist and evolve to eventually reach some higher
| intelligence level.
| vidarh wrote:
| I think it's more a question of diminishing returns and the
| cost of scaling it up, which is getting to a point where
| looking for ways of maximizing the impact of what is there
| makes sense. I'm sure we'll see models trained on more data,
| but maybe after efficiency improvements make it cheaper both
| to train and run large models.
| nadermx wrote:
| I think someone did this https://github.com/pashpashpash/vault-ai
| xtracto wrote:
| This looks pretty promising, will check out later. Thanks for
| sharing
| Der_Einzige wrote:
| I get annoyed by articles like this. Yes, it's cool to educate
| readers who aren't aware of embeddings/embeddings stores/vectorDB
| technologies that this is possible.
|
| What these articles don't touch on is what to do once you've got
| the most relevant documents. Do you use the whole document as
| context directly? Do you summarize the documents first using the
| LLM (now the risk of hallucination in this step is added)? What
| about that trick where you shrink a whole document of context
| down to the embedding space of a single token (which is how
| ChatGPT is remembering the previous conversations)? Doing that
| would be useful, but still lossy.
|
| What about simply asking the LLM to craft its own search prompt
| to the DB given the user input, rather than returning articles
| that semantically match the query the closest? This would also
| make hybrid search (keyword or BM25 + embeddings) more viable in
| the context of combining it with an LLM
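|
| Roughly, and hand-waving the details (the prompt and model
| choice are illustrative; 2023-era OpenAI client):
|
|     import openai
|
|     def craft_query(user_input):
|         # ask the model to rewrite the user's message as a
|         # search query before hitting the keyword/vector store
|         resp = openai.ChatCompletion.create(
|             model="gpt-3.5-turbo",
|             messages=[
|                 {"role": "system", "content":
|                  "Rewrite the user's message as a short keyword "
|                  "search query. Return only the query."},
|                 {"role": "user", "content": user_input},
|             ],
|         )
|         return resp["choices"][0]["message"]["content"].strip()
|
|     # feed the crafted query to BM25 and/or the embedding store
|     query = craft_query("why does my app crash after the update?")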
|
| Figuring out which of these choices to make, along with an awful
| lot more choices I'm likely not even thinking about right now, is
| what will separate the useful from the useless LLM + extractive
| knowledge systems.
| gaogao wrote:
| > What about simply asking the LLM to craft its own search
| prompt to the DB given the user input, rather than returning
| articles that semantically match the query the closest?
|
| I played with that approach in this post:
| https://friend.computer/jekyll/update/2023/04/30/wikidata-ll...
| "Craft a query" is nice as it gives you a very declarative
| intermediate state for debugging.
| EForEndeavour wrote:
| > What about that trick where you shrink a whole document of
| context down to the embedding space of a single token (which is
| how ChatGPT is remembering the previous conversations)
|
| This is news to me. Where could I read about this trick?
| [deleted]
| [deleted]
| sudoapps wrote:
| The article is definitely still high-level and meant to
| provide enough understanding of what the capabilities are
| today. Some of what you are mentioning goes deeper into how
| you take these learnings/tools and come up with any number of
| solutions to fit the problem you are solving for.
|
| > "Do you use the whole document as context directly? Do you
| summarize the documents first using the LLM (now the risk of
| hallucination in this step is added)?"
|
| In my opinion the best approach is to take a large document,
| break it down into chunks, store each chunk as an embedding,
| and query back only the relevant passages (chunks).
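|
| Something like this (chunk size and overlap are judgment
| calls; text-embedding-ada-002 and the 2023-era OpenAI client
| are what I'd reach for today):
|
|     import openai
|
|     def chunk(text, size=1000, overlap=200):
|         # overlapping character windows over a long document
|         step = size - overlap
|         return [text[i:i + size]
|                 for i in range(0, len(text), step)]
|
|     def embed(texts):
|         resp = openai.Embedding.create(
|             model="text-embedding-ada-002", input=texts)
|         return [d["embedding"] for d in resp["data"]]
|
|     document_text = open("manual.txt").read()  # any large doc
|     chunks = chunk(document_text)
|     vectors = embed(chunks)
|     # store (chunk, vector) pairs in the vector DB; at query
|     # time, embed the question and pull back only the nearest
|     # chunks as context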
|
| > "What about that trick where you shrink a whole document of
| context down to the embedding space of a single token (which is
| how ChatGPT is remembering the previous conversations)"
|
| Not sure I follow here, but it seems interesting if possible.
| Do you have any references?
|
| > "What about simply asking the LLM to craft its own search
| prompt to the DB given the user input, rather than returning
| articles that semantically match the query the closest? This
| would also make hybrid search (keyword or BM25 + embeddings)
| more viable in the context of combining it with an LLM"
|
| This is definitely doable but just adds to the overall
| processing/latency (if that is a concern).
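|
| For reference, the hybrid-scoring side on its own is only a
| few lines (rank_bm25 for the keyword half; the 0.5 weighting
| and the toy data are arbitrary):
|
|     import numpy as np
|     from rank_bm25 import BM25Okapi
|
|     docs = ["kafka consumer lag", "postgres vacuum tuning",
|             "kafka rebalance storms"]
|     doc_vecs = np.random.rand(len(docs), 384)  # stand-ins
|     query, query_vec = "kafka lag", np.random.rand(384)
|
|     bm25 = BM25Okapi([d.split() for d in docs])
|     kw = bm25.get_scores(query.split())
|
|     cos = doc_vecs @ query_vec / (
|         np.linalg.norm(doc_vecs, axis=1)
|         * np.linalg.norm(query_vec))
|
|     score = 0.5 * kw / (kw.max() + 1e-9) + 0.5 * cos
|     best = np.argsort(score)[::-1][:2]  # top-2 hybrid hits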
| chartpath wrote:
| Search query expansion:
| https://en.wikipedia.org/wiki/Query_expansion
|
| We've done this in NLP and search forever. I guess even SQL query
| planners and other things that automatically rewrite queries
| might count.
|
| It's just that now the parameters seem squishier with a prompt
| interface. It's almost like we need some kind of symbolic
| structure again.
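|
| The classic version is just term rewriting; a toy sketch (the
| synonym table is obviously made up):
|
|     SYNONYMS = {"car": ["automobile", "vehicle"],
|                 "fast": ["quick", "rapid"]}
|
|     def expand(query):
|         # naive query expansion: OR each term with its synonyms
|         terms = []
|         for word in query.lower().split():
|             alts = [word] + SYNONYMS.get(word, [])
|             terms.append("(" + " OR ".join(alts) + ")")
|         return " AND ".join(terms)
|
|     print(expand("fast car"))
|     # (fast OR quick OR rapid) AND (car OR automobile OR vehicle)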
| orasis wrote:
| One caveat about embedding-based retrieval is that there is
| no guarantee that the embedded documents will look like the
| query.
|
| One trick is to have a LLM hallucinate a document based on the
| query, and then embed that hallucinated document. Unfortunately
| this increases the latency since it incurs another round trip to
| the LLM.
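|
| This is essentially HyDE (hypothetical document embeddings);
| a rough sketch with the 2023-era OpenAI client, prompt
| illustrative:
|
|     import openai
|
|     def hyde_embed(query):
|         # embed a hallucinated answer instead of the raw
|         # query, so the vector lands closer to real documents
|         fake_doc = openai.ChatCompletion.create(
|             model="gpt-3.5-turbo",
|             messages=[{"role": "user", "content":
|                        "Write a short passage answering: "
|                        + query}],
|         )["choices"][0]["message"]["content"]
|         return openai.Embedding.create(
|             model="text-embedding-ada-002",
|             input=[fake_doc])["data"][0]["embedding"]
|
|     # search the store with hyde_embed(q) instead of embed(q)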
| taberiand wrote:
| Is that something easily handed off to a faster/cheaper LLM?
| I'm imagining something like running the main process through
| GPT-4 and handing off the hallucinations to GPT-3.5 Turbo.
|
| If you could spot the need for it while streaming a response,
| you could possibly even have it ready ahead of time.
| williamcotton wrote:
| "We're gonna need a bigger boat."
| rco8786 wrote:
| > One trick is to have a LLM hallucinate a document based on
| the query
|
| I'm not following why you would want to do this? At that point,
| just asking the LLM without any additional context would/should
| produce the same (inaccurate) results.
| BoorishBears wrote:
| You're not having the LLM answer from the hallucination;
| you're looking for the document that looks most similar to
| the hallucination and having it answer based on that instead.
| wasabi991011 wrote:
| >One caveat about about embedding based retrieval is that there
| is no guarantee that the embedded documents will look like the
| query.
|
| Aleph Alpha provides an asymmetric embedding model, which I
| believe is an attempt to resolve this issue (I haven't looked
| into it much, just saw the entry in langchain's documentation).
| jeffchuber wrote:
| hi everyone, this is jeff from Chroma (mentioned in the article)
| - happy to answer any questions.
| Beltiras wrote:
| I'm working on something where I need to add on the order of
| 150,000 tokens to the knowledge base of an LLM. I'm slowly
| finding out that I need to delve into training a whole-ass
| LLM to do it. Sigh.
| RhodesianHunter wrote:
| Or, at this rate, just wait 6 months.
| m3kw9 wrote:
| This is like asking GPT to summarize what it found on Google;
| it's basically what Bing does when you try to find stuff like
| hotels and other recent subjects. Not the revolution we are
| all expecting.
| iot_devs wrote:
| A similar idea is being developed in:
| https://github.com/pieroit/cheshire-cat
___________________________________________________________________
(page generated 2023-05-08 23:01 UTC)