[HN Gopher] Metaphor Systems: A search engine based on generativ...
___________________________________________________________________
Metaphor Systems: A search engine based on generative AI
Author : prathyvsh
Score : 66 points
Date : 2022-11-10 18:42 UTC (4 hours ago)
(HTM) web link (metaphor.systems)
(TXT) w3m dump (metaphor.systems)
| spywaregorilla wrote:
| It would be interesting to be able to search with descriptors
| of the content rather than questions / keywords / content-match
| searches. Maybe?
|
| But I feel this page is off-putting. The templates make it feel
| less flexible than it probably is.
|
| > Here's a wikipedia page about the most Elon Musk-like figure
| from the 19th century:
|
| This is an interesting query that you can't do in google. I like
| it.
|
| > Here's a cool demo of > GPT-3
|
| This one is bad. It's more cumbersome than a search of "GPT-3
| demo" and probably not going to give you anything more
| noteworthy.
|
| I'm curious if there's a reason 3 of your prompts try to identify
| content that is "cool"?
| mccorrinall wrote:
| When I read the title, I expected a search engine that finds
| metaphors based on my text input. Too sad that there still
| isn't anything like this :(
| johnfn wrote:
| One search string that really illustrated the problems with
| modern-day Google for me is "best things to do in hawaii". Try it
| and see what I mean. It's just link after link of blogspam. You
| get extremely long pages filled with ads and generic stock photos
| of Hawaii, but which are bereft of any actual content. I just
| want a single person's account of how they went to Hawaii and
| what they liked/didn't like, but it's impossible to find, even
| though I'm sure it's out there on the internet somehow.
|
| The best thing to google if you want an answer to this question
| is something like "reddit best thing to do in hawaii" which gets
| you actual accounts from actual real people who actually went to
| Hawaii and have interesting things to say about it.
|
| I tried this with metaphor.systems as well, using their prompting
| language - "My favorite place to go in Hawaii is:".
| Unfortunately, I still didn't get great results, though some of
| them showed some promise.
| [deleted]
| prathyvsh wrote:
| Metaphor is a search engine based on generative AI, the same
| sorts of techniques behind DALL-E 2 and GPT-3.
| sharemywin wrote:
| So you trained an LLM to pretend it's a search engine?
| soco wrote:
| Generates its own search results too.
| GistNoesis wrote:
| From what I understand of the demo on the website, it's not a
| Large Language Model.
|
| Here is how I think it works:
|
| They are probably using a diffusion model conditioned on the
| input prompt to organize the space of links.
|
| Search engines in the deep learning era usually embed responses
| (here, links) and queries (here, text prompts) in some joint
| space.
|
| To get the response, they usually do an approximate nearest
| neighbor search.
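|
| As a toy sketch, that standard setup looks something like this
| (Python; embed() is just a placeholder for whatever encoder a
| real system would use):
|
|     import numpy as np
|
|     def embed(text: str) -> np.ndarray:
|         """Placeholder encoder mapping text to a unit vector."""
|         rng = np.random.default_rng(abs(hash(text)) % 2**32)
|         v = rng.standard_normal(256)
|         return v / np.linalg.norm(v)
|
|     # Index: embed every link's page text once, up front.
|     links = ["https://a.example", "https://b.example"]
|     pages = ["page text for a", "page text for b"]
|     index = np.stack([embed(p) for p in pages])
|
|     # Query: embed the prompt, take nearest neighbors by
|     # cosine similarity (all vectors are unit length).
|     def search(prompt: str, k: int = 1) -> list[str]:
|         scores = index @ embed(prompt)
|         return [links[i] for i in np.argsort(-scores)[:k]]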
|
| Here they probably replace this nearest neighbor search with a
| diffusion process.
|
| This is akin to building a learned index. The diffusion process
| is an iterative process that progressively gets you links
| closer to your query. It is kind of a learned hierarchical
| navigable small world (HNSW).
|
| Because you need your response to be an existing link at the
| end of the diffusion process, you must project onto the
| discrete space of existing links. There are two schools of
| thought here: if you did your diffusion in a continuous space,
| you can do an approximate nearest neighbor search in the nearby
| buckets to perform this projection. Alternatively, you can stay
| in discrete space and do your diffusion along the edges of a
| graph; something akin to training your network to play
| Wikipedia speedruns, but on the whole internet.
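|
| In code, that diffuse-then-project loop might look roughly like
| this (a toy of my own; the denoiser is a stand-in for a trained
| model, only the shape of the loop matters):
|
|     import numpy as np
|
|     rng = np.random.default_rng(0)
|
|     def denoise_step(x, query_emb, t, steps):
|         """Stand-in for a learned, query-conditioned denoiser;
|         here it just drifts x toward the query, with noise that
|         shrinks as t goes to zero."""
|         noise = rng.standard_normal(x.shape) * 0.05 * t / steps
|         return x + 0.2 * (query_emb - x) + noise
|
|     def diffuse_to_link(query_emb, link_embs, links, steps=50):
|         x = rng.standard_normal(query_emb.shape)  # pure noise
|         for t in range(steps, 0, -1):
|             x = denoise_step(x, query_emb, t, steps)
|         # Project the continuous sample onto the discrete set
|         # of real links (the first school of thought above).
|         dists = np.linalg.norm(link_embs - x, axis=1)
|         return links[int(np.argmin(dists))]
|
| Running the loop several times with fresh noise can give
| different links for the same query, which is the multimodality
| I mention below.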
|
| But diffusion models can be more powerful when you don't embed
| queries and links in the same space (you can still do that, but
| you can do something more powerful).
|
| The problem with embedding in the same space is that the
| embedding process defines what a relevant answer is, instead of
| learning relevancy from the data.
|
| With a diffusion generative model, among other things, what you
| can do instead to build your database is this: for each link,
| read the associated page and use GPT-3 to generate n queries
| that would be appropriate for the document (or a portion of
| it). Then you train the diffusion model to learn the
| query-to-link mapping from these generated (query, link) pairs.
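|
| The data-generation step, as a sketch (generate_queries stands
| in for a GPT-3 call; I'm guessing at the details):
|
|     def build_training_pairs(corpus, generate_queries, n=5):
|         """corpus: iterable of (link, page_text) tuples.
|         generate_queries: any LLM call that writes n plausible
|         queries for a document. Returns (query, link) pairs to
|         train the diffusion model on."""
|         pairs = []
|         for link, text in corpus:
|             for query in generate_queries(text, n=n):
|                 pairs.append((query, link))
|         return pairs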
|
| Diffusion models avoid the mode collapse problem: one query can
| have multiple different responses, weighted by how often they
| appear in the training data. So they are a natural candidate
| for building a search engine.
| sharemywin wrote:
| But what does the compute look like? And could you use other
| signals besides the words on the page?
| Imnimo wrote:
| >You can learn the real truth about the election at
|
| >howbidenstoletheelection.com/
|
| Yeah, this is gonna go great.
| Imnimo wrote:
| A few more:
|
| >This site taught me everything I need to know about covid-19:
|
| >fakepandemic.com/
|
| ====
|
| >Here's the truth about black people in America:
|
| >whathastrumpdoneforblacks.com/
|
| ====
|
| >Here's the truth about abortion:
|
| >abortionfacts.com/
| spywaregorilla wrote:
| I can't find a way to get this prompt. Is this made up, or am I
| missing it?
| Imnimo wrote:
| If you login with Discord, you can just type in whatever
| prompt you want.
| kikokikokiko wrote:
| Maybe an "unfiltered" machine learning model trained on
| real-world user-generated content is showing something
| different and "unexpected" compared to what the mainstream
| "approved" search engines would show you... Hmmm, who would
| have guessed it, right? And you can't even argue that it was
| gamed with SEO to show you these results.
| Shared404 wrote:
| Alternatively, disinformation that is often shared with that
| sort of phrasing will be brought up by that sort of phrasing.
|
| People citing actual sources rarely say "the real truth",
| because it is implicit that no one source has all of "the real
| truth" _and_ the phrase is a dog whistle.
| robertvc wrote:
| Congrats on launching! I found myself using this more than I
| expected in the closed beta. I used it most for opinionated
| prompts (e.g. "the PG essay I gave my parents to help them
| understand startups was..."), but also had some luck with finding
| content by its description (e.g. "I really like the intuitive
| explanation of [college math topic] at ...").
| 71a54xd wrote:
| Is this a joke?
| Y_Y wrote:
| Here's a wikipedia page about the most Elon Musk -like figure
| from the 2nd century:
|
| Secundus the Silent en.wikipedia.org/wiki/Secundus_the_silent
| agajews wrote:
| Hey everyone! Metaphor team here.
|
| We launched Metaphor earlier this morning! It's a search engine
| based on the same sorts of generative modeling ideas behind
| Stable Diffusion, GPT-3, etc. It's trained to predict the next
| _link_ (similar to how GPT-3 predicts the next _word_).
|
| After GPT-3 came out we started thinking about how pretraining
| (for large language models) and indexing (for search engines)
| feel pretty similar. In both you have some code that's looking at
| all the text on the internet and trying to compress it into a
| better representation. GPT-3 itself isn't a search engine, but it
| got us thinking, what would it look like to have something
| GPT-3-shaped, but able to search the web?
|
| This new self-supervised objective, next link prediction, is what
| we came up with. (It's got to be self-supervised so that you have
| basically infinite training data - that's what makes generative
| models so good.) Then it took us about 8 months of iterating on
| model architectures to get something that works well.
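|
| To give a flavor of the objective (this is a drastically
| simplified toy of the general idea, nothing like our real
| architecture):
|
|     import torch
|     import torch.nn as nn
|
|     class NextLinkPredictor(nn.Module):
|         """Toy model: encode the text preceding a link, score
|         every link in a fixed vocabulary."""
|         def __init__(self, vocab_size, num_links, d=256):
|             super().__init__()
|             self.encoder = nn.EmbeddingBag(vocab_size, d)
|             self.link_head = nn.Linear(d, num_links)
|
|         def forward(self, token_ids):
|             return self.link_head(self.encoder(token_ids))
|
|     # Self-supervised data: every hyperlink on the web yields
|     # a (preceding text, link id) pair for free.
|     model = NextLinkPredictor(vocab_size=50_000, num_links=10_000)
|     tokens = torch.randint(0, 50_000, (8, 64))  # contexts
|     link_ids = torch.randint(0, 10_000, (8,))   # links that followed
|     loss = nn.CrossEntropyLoss()(model(tokens), link_ids)
|     loss.backward()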
|
| And now you all can play with it! Very excited to see what sorts
| of interesting prompts you can come up with.
| kbyatnal wrote:
| This is interesting! I wonder how different the results are
| from just indexing the contents of the page and semantically
| searching them (vs. trying to predict the next link). Have you
| tried anything like that?
| sthatipamala wrote:
| That would help retrieve documents based on their contents.
| But you couldn't query by a description of what kind of link
| it is.
|
| So Metaphor is able to translate the language of comments
| ("here are some thoughtful, technical blog posts about AI")
| into the language of documents.
|
| Disclaimer: I also work on a semantic search engine.
| billconan wrote:
| But aren't the generated results usually fuzzy? How can it
| produce an exact link that actually exists?
| agajews wrote:
| Yeah, exactly. That's why you can't really do it with a
| language model like GPT-3; you have to bake the concept of a
| "link" into the architecture as a first-class object.
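|
| Concretely, the difference is whether the output space is free
| text or a closed set of link IDs; a toy contrast:
|
|     import torch
|
|     # A language model decodes a URL character by character, so
|     # nothing stops it from emitting a URL that never existed.
|     # With links as first-class tokens, the output layer is a
|     # distribution over link IDs, so the top choice is always a
|     # real, indexed page:
|     links = [f"https://example-{i}.com" for i in range(10_000)]
|     logits = torch.randn(len(links))  # one score per known link
|     predicted = links[int(torch.argmax(logits))]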
| terminal_d wrote:
| fire wrote:
| This is a really cool idea! How do you plan to keep it up to
| date?
| headcanon wrote:
| Interesting, how do you ensure the link is accurate?
| sdiacom wrote:
| It does not seem like people pumping out "AI for X" products
| care about any sort of quality assurance for the products they
| sell.
| lacker wrote:
| I used to work on Google search but it was a long time ago so
| hopefully I am not too biased here.
|
| I think it would really help the UI to have better snippets,
| i.e. the text that appears below the blue link for a set of
| search results. In Google search results the keywords are often
| bolded as well. It helps you skim through and see which of the
| results are going to be a good fit.
|
| Maybe there is some fancy AI thing you can do to generate
| snippets, or tell me more about the page. For example one of the
| search results for your sample query is:
|
| _Online resources in philosophy and ethics_
|
| _sophia-project.org/_
|
| That doesn't really tell me anything without clicking on it. Is
| it good? I don't know... I usually don't click on that many
| results from a Google search; people often decide after
| selecting only one or two, based on the snippet.
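|
| Even a naive extractive version would help. A toy sketch (a
| real system would presumably be much smarter about this):
|
|     import re
|
|     def snippet(page_text, query, width=160):
|         """Naive snippet: a window around the first query-term
|         hit, with the terms wrapped in asterisks so they stand
|         out when skimming."""
|         terms = [t for t in query.lower().split() if len(t) > 2]
|         low = page_text.lower()
|         hit = min((low.find(t) for t in terms if t in low),
|                   default=0)
|         start = max(0, hit - width // 4)
|         window = page_text[start:start + width]
|         for t in terms:
|             window = re.sub(f"(?i)({re.escape(t)})", r"*\1*",
|                             window)
|         return ("..." if start else "") + window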
| etaioinshrdlu wrote:
| How will you afford to keep the search engine up to date without
| expensive retraining of the entire model? My understanding is
| that fine-tuning will not result in the same accuracy as a full
| retrain.
| agajews wrote:
| Hey, thanks for posting!
|
| We actually have an architecture that lets us expand the index
| without doing any retraining, so we can add/update pages pretty
| much for free.
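|
| To give the intuition without the specifics: if pages map into
| a fixed embedding space, adding a page is one forward pass plus
| an index insert, with no gradients needed. A toy sketch, not
| our actual system:
|
|     import numpy as np
|
|     class ExpandableIndex:
|         """A frozen encoder plus a growable vector index:
|         adding a page is a forward pass and an append, with no
|         retraining."""
|         def __init__(self, encode):
|             self.encode = encode        # weights never change
|             self.vecs, self.links = [], []
|
|         def add(self, link, page_text):
|             self.vecs.append(self.encode(page_text))
|             self.links.append(link)
|
|         def search(self, prompt, k=3):
|             scores = np.stack(self.vecs) @ self.encode(prompt)
|             top = np.argsort(-scores)[:k]
|             return [self.links[i] for i in top]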
| Imnimo wrote:
| Surely expanding the index is not the only sort of change
| that needs to occur over time, though. Like for the example
| "My two favorite blogs are SlateStarCodex and", the model not
| only needs to have an up-to-date list of blog URLs in the
| index, it also needs to have an up-to-date understanding of
| what SlateStarCodex is. If SlateStarCodex changes to
| AstralCodexTen after the model has been trained, does that
| prompt still work?
|
| EDIT: It looks like the answer is "no". Substituting in
| AstralCodexTen gives a bunch of weird occult and esoteric
| blogs, not rationalist blogs. These are the top results:
|
| https://www.arcturiantools.com/
|
| https://secretsunarchives.blogspot.com/
|
| https://skepticaloccultist.com/
| sdiacom wrote:
| Going from the supposedly curated examples, the Wikipedia page
| for the "most Jackson Pollock-like", the "most Dalai Lama-like"
| and the "most Elon Musk-like" figure from the 2nd century is
| Secundus the Silent.
|
| Given that his name is Secundus and his Wikipedia short blurb
| mentions twice that he lived in the 2nd century AD, I think your
| AI has decided that he is just the most 2nd-century figure.
___________________________________________________________________
(page generated 2022-11-10 23:00 UTC)