hngopher.com

       [HN Gopher] Find anything fast with Google's vector search techn...
       ___________________________________________________________________
        
       Find anything fast with Google's vector search technology
        
       Author : sshroot
       Score  : 198 points
       Date   : 2021-12-14 18:07 UTC (4 hours ago)
        
 (HTM) web link (cloud.google.com)
 (TXT) w3m dump (cloud.google.com)
        
       | ahurmazda wrote:
       | For similar ANN/vector search capabilities, https://vespa.ai/
       | is a great open-source solution. Elasticsearch may offer some
       | form of ANN too, but I need to double-check.
        
         | sanxiyn wrote:
         | I don't think Elasticsearch has one yet, but OpenSearch does:
         | https://opensearch.org/docs/latest/search-plugins/knn/index/
        
           | m_ke wrote:
           | Lucene 9.0 just shipped with HNSW support; it should make it
           | into ES at some point
           | (https://twitter.com/msokolov/status/1468395332531003393)
           | 
           | EDIT: ES integration PR:
           | https://github.com/elastic/elasticsearch/issues/78473
        
           | ahurmazda wrote:
           | Ah! Great to know. ANN searches are becoming table stakes at
           | this point. Hopefully, we will see more and more platforms
           | adding it to their repertoire.
        
       | ShamelessC wrote:
       | This GitHub repo makes it pretty easy to build something similar:
       | first embed any images you have using OpenAI's released "CLIP"
       | model, then create a Faiss index over those embeddings for quick
       | retrieval/decode. You can then do text->image and image->image
       | semantic search.
       | 
       | https://github.com/rom1504/clip-retrieval
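For readers who want to see what the Faiss step does conceptually, here is a minimal pure-Python sketch of an exact inner-product index over L2-normalized embeddings, so that inner product equals cosine similarity. The three-dimensional vectors are made up and stand in for real CLIP image and text embeddings; the class name `FlatIPIndex` is hypothetical, and a real setup would use Faiss itself for speed.

```python
import math


def normalize(v):
    """Scale a vector to unit length so inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


class FlatIPIndex:
    """Exact (brute-force) inner-product search over unit-normalized vectors."""

    def __init__(self):
        self.ids = []
        self.vectors = []

    def add(self, item_id, vector):
        self.ids.append(item_id)
        self.vectors.append(normalize(vector))

    def search(self, query, k):
        q = normalize(query)
        scored = [
            (sum(a * b for a, b in zip(q, v)), item_id)
            for item_id, v in zip(self.ids, self.vectors)
        ]
        scored.sort(reverse=True)  # highest cosine similarity first
        return [(item_id, round(score, 3)) for score, item_id in scored[:k]]


# Hypothetical 3-d "image embeddings" standing in for real CLIP outputs.
index = FlatIPIndex()
index.add("cat.jpg", [0.9, 0.1, 0.0])
index.add("dog.jpg", [0.7, 0.6, 0.1])
index.add("car.jpg", [0.0, 0.2, 0.95])

# text->image: a (hypothetical) text embedding living in the same space as the images.
print(index.search([1.0, 0.2, 0.0], k=2))
# image->image: query with an image's own embedding.
print(index.search([0.9, 0.1, 0.0], k=1))
```

Text->image and image->image search are the same operation here; only the model that produced the query vector differs.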
        
       | thirdtrigger wrote:
       | Interesting - we are working on an open source vector search
       | engine called Weaviate and did the same for the complete
       | Wikipedia and Wikidata.
       | 
       | [1] Docs:
       | https://www.semi.technology/developers/weaviate/current/
       | 
       | [2] Github: https://github.com/semi-technologies/weaviate
       | 
       | [3] Wikipedia demo dataset: https://github.com/semi-technologies/semantic-search-through...
       | 
       | [4] Wikidata dataset: https://github.com/semi-technologies/biggraph-wikidata-searc...
       | 
       | Last week there was also a feature on Techcrunch about vector
       | search and Weaviate: https://techcrunch.com/2021/12/11/2246180/
        
         | mravl wrote:
         | Real eye-opener. This will change the search industry completely.
        
           | detaro wrote:
           | why?
        
             | thirdtrigger wrote:
             | That's a fair question, but I'm going to assume the
             | open-source nature is what's meant here.
        
         | CShorten wrote:
         | I've made some videos on Weaviate as well (Henry AI Labs) if
         | interested:
         | 
         | [1] Wikipedia Vector Search Demo with Weaviate:
         | https://www.youtube.com/watch?v=IGB8vjCuay0
         | 
         | [2] Vector Search through Wikidata with Weaviate:
         | https://www.youtube.com/watch?v=T4zlvknSbGc
         | 
         | [3] Demonstrations of Deep Learning:
         | https://www.youtube.com/watch?v=5jbneytoKi0
         | 
         | [4] Weaviate's GraphQL API for Neurosymbolic Search:
         | https://www.youtube.com/watch?v=K_2X48Tln9U
         | 
         | [5] Introducing the Weaviate Vector Search Engine:
         | https://www.youtube.com/watch?v=AS_2U_INpKk
        
           | thirdtrigger wrote:
           | These are all great!
           | 
           | There is also this video about modern search engines and
           | Weaviate on the AI Coffee Break YT channel:
           | https://www.youtube.com/watch?v=YkK5IKgxp-c
        
       | gk1 wrote:
       | It's great to see more and more talk of vector search and vector
       | databases. We've been promoting this technology for over a year
       | now and have several intro articles for anyone looking to learn
       | more[1], and a generous free tier on our vector search service[2]
       | for anyone looking to give vector search a shot.
       | 
       | [1] https://www.pinecone.io/learn/
       | 
       | [2] https://app.pinecone.io/
       | 
       | We are also actively researching the space, and just recently
       | published a paper on improving Google's ScaNN:
       | https://arxiv.org/abs/2112.02179
        
         | wswope wrote:
         | That reference/learning page is a great resource!
         | 
         | As for Pinecone itself, what are the main selling points, as you
         | see them, for a simple application (e.g. comparing
         | trigram-vectorized sets of strings) compared to a home-rolled
         | solution using Postgres with array types? Better performance,
         | ease of indexing, etc.?
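The trigram approach mentioned in the question can be sketched without any database: each string becomes a set of padded character trigrams (conceptually a sparse binary vector), and similarity is the Jaccard ratio of shared to total distinct trigrams, similar in spirit to Postgres's pg_trgm `similarity()`. The padding follows the single-word example in the pg_trgm documentation, but this is a simplified sketch, not pg_trgm's exact algorithm (it does not split multi-word strings, for example); the function names are hypothetical.

```python
def trigrams(s):
    """Set of character trigrams of a lowercased string, padded at the edges.

    Two leading spaces and one trailing space, as in pg_trgm's documented
    example for a single word; multi-word handling is deliberately omitted.
    """
    padded = "  " + s.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def trigram_similarity(a, b):
    """Jaccard ratio: shared trigrams divided by all distinct trigrams."""
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


candidates = ["pinecone", "postgresql", "weaviate"]
query = "postgres"
ranked = sorted(candidates, key=lambda c: trigram_similarity(query, c), reverse=True)
print(ranked)
```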
        
           | gk1 wrote:
           | I pinged someone more technical from our team to chime in.
           | 
           | In the meantime I can say moving to the dense vector + ANN
           | search combo turns regular searches into semantic searches,
           | which means more relevant results.
           | 
           | If that's the case for you, then you can use Pinecone to go
           | further and make those results _fast_ (<100ms), _fresh_
           | (CRUD + live index updates), and _filtered_ (single-stage
           | metadata filtering). All on a fully managed system that you
           | can scale up/down with one API call.
        
         | indeed30 wrote:
         | Does Pinecone have any position on the status of document
         | embeddings and whether they would be considered PII? One of the
         | challenges of using a fully managed service is the headache of
         | adding yet another data subprocessor and all of the legal and
         | compliance questions that raises.
        
         | dvaun wrote:
         | I've been toying with making a deckbuilder for Magic: The
         | Gathering and could see this being potentially useful for
         | finding fun card combinations. Thanks!
        
           | kruptos wrote:
           | I love this idea. I would pay for that service!
        
           | gk1 wrote:
           | That would be a fun use case for us to promote. Let me know
           | when it's ready! The free plan supports as many as 1 million
           | items, more than enough for all the MTG cards in existence.
           | Plus you can add and filter by metadata, like card type and
           | properties.
        
             | dvaun wrote:
             | > Plus you can add and filter by metadata, like card type
             | and properties.
             | 
             | I read through your docs and figure that will be part of
             | the approach.
             | 
             | An idea I had was to find similar, or "next best", cards
             | for replacement in popular decks or to achieve similar
             | effects in order to bring down the cost of EDH, Modern,
             | etc. formats. I'm just getting back into the hobby again,
             | so having a tool like this would make my wife and wallet
             | happy :)
        
               | 16mb wrote:
               | I've resorted to playing Modern with high-quality fakes.
               | Otherwise I wouldn't have the budget. Check out bootlegmtg
               | on Reddit.
        
           | thirdtrigger wrote:
           | We are actually discussing this on the Weaviate Slack :-)
           | https://weaviate.slack.com/archives/D02JM9D3HND/p16347312830...
        
       | amelius wrote:
       | Does anyone know of a good benchmark suite for search technology?
       | 
       | (And how well does the technique of the article work wrt it?)
        
       | CoolGuySteve wrote:
       | Is this more or less a k-d tree as a service? Where any distance
       | function can be used to index the data?
       | 
       | Or is it something different?
        
         | hamilyon2 wrote:
         | I thought k-d trees were useless in the high-dimensional case,
         | so it must be something else.
        
           | contravariant wrote:
           | I'd say they're about as useful as Euclidean distance is.
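A quick numerical illustration of the point about Euclidean distance: for uniformly random points, the gap between the nearest and farthest neighbour tends to shrink relative to the nearest distance as dimensionality grows, which is one intuition for why space-partitioning structures like k-d trees prune poorly in high dimensions. The data here is synthetic (uniform in the unit cube) and real embeddings usually have more structure, so treat this as an illustration only; `relative_contrast` is a hypothetical helper.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """(farthest - nearest) / nearest distance from a random query to random points."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)


for dim in (2, 10, 100, 1000):
    # The ratio tends to shrink as dim grows: "nearest" and "farthest" blur together.
    print(dim, round(relative_contrast(dim), 3))
```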
        
         | ahurmazda wrote:
         | More or less, but as always the devil is in the details. Here is
         | a paper[1] that summarizes issues with naive approaches.
         | Incidentally, the proposed solution (Hierarchical NSW) in this
         | paper performed fairly well in industry benchmarks.
         | 
         | [1] https://arxiv.org/ftp/arxiv/papers/1603/1603.09320.pdf
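The core search loop of the Hierarchical NSW paper linked above operates on a proximity graph: starting from an entry point, it repeatedly expands the closest unexplored candidate while keeping the `ef` best results seen so far. The sketch below implements that loop for a single layer only. To stay short it builds the graph by brute-force k-nearest-neighbour linking rather than the paper's incremental insertion, and it has no hierarchy, so it is a simplified illustration rather than HNSW itself; the function names are hypothetical.

```python
import heapq
import math
import random


def build_knn_graph(points, m):
    """Link each point to its m nearest neighbours, then make edges bidirectional.

    Brute force for brevity; HNSW instead inserts points one at a time.
    """
    graph = {}
    for i, p in enumerate(points):
        nearest = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        graph[i] = [j for _, j in nearest[:m]]
    for i in list(graph):
        for j in list(graph[i]):
            if i not in graph[j]:
                graph[j].append(i)
    return graph


def graph_search(graph, points, query, entry, k, ef):
    """Best-first search keeping the ef closest nodes found (single layer)."""
    d0 = math.dist(points[entry], query)
    visited = {entry}
    candidates = [(d0, entry)]  # min-heap: closest unexpanded node first
    results = [(-d0, entry)]    # max-heap via negation: worst kept result on top
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -results[0][0]:
            break  # nothing left that could improve the result set
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = math.dist(points[nb], query)
            if len(results) < ef or dn < -results[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(results, (-dn, nb))
                if len(results) > ef:
                    heapq.heappop(results)  # drop the current worst
    return [node for _, node in sorted((-nd, n) for nd, n in results)][:k]


rng = random.Random(1)
points = [(rng.random(), rng.random()) for _ in range(300)]
graph = build_knn_graph(points, m=8)
query = (0.5, 0.5)
approx = graph_search(graph, points, query, entry=0, k=5, ef=40)
exact = sorted(range(len(points)), key=lambda i: math.dist(points[i], query))[:5]
print("recall@5:", len(set(approx) & set(exact)) / 5)
```

Raising `ef` makes the search explore more of the graph, trading speed for recall.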
        
         | monkeybutton wrote:
         | A k-d tree gives you exact answers to nearest neighbour
         | queries.
        
           | srean wrote:
           | A k-d tree is a data structure. Whether you use that for
           | exact nearest neighbor query or approximate is up to the
           | algorithm used. K-d trees work well for a handful of
           | dimension beyond that it becomes quite expensive.
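For contrast with graph-based approaches, here is a minimal exact nearest-neighbour search on a k-d tree. The pruning step (skip the far subtree when the splitting plane is farther away than the best match so far) is what makes k-d trees fast in a handful of dimensions; in high dimensions that test rarely succeeds, so the search ends up visiting most nodes. This is a textbook sketch with hypothetical function names, not code from any library mentioned in the thread.

```python
import math
import random


def build_kdtree(points, depth=0):
    """Split on the median of one coordinate per level, cycling through axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }


def nearest(node, query, best=None):
    """Exact nearest neighbour; returns (distance, point)."""
    if node is None:
        return best
    point, axis = node["point"], node["axis"]
    d = math.dist(point, query)
    if best is None or d < best[0]:
        best = (d, point)
    diff = query[axis] - point[axis]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, query, best)
    # Prune: the far side can only help if the splitting plane is closer than the best match.
    if abs(diff) < best[0]:
        best = nearest(far, query, best)
    return best


rng = random.Random(0)
pts = [tuple(rng.random() for _ in range(3)) for _ in range(500)]
tree = build_kdtree(pts)
print(nearest(tree, (0.5, 0.5, 0.5)))
```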
        
       | freediver wrote:
       | I built multiple systems using vector search, one of them demoed
       | in a search engine for non-commercial content at
       | http://teclis.com
       | 
       | Running vector search (also sometimes referred to as semantic
       | search, or a part of semantic search stack) is a trivial matter
       | with open-source libraries like Faiss
       | https://github.com/facebookresearch/faiss
       | 
       | It takes 5 minutes to set up. You can search a billion vectors
       | on common hardware. For low-latency (up to a couple of hundred
       | milliseconds) use cases, it is highly unlikely that any cloud
       | solution like this would be a better choice than something
       | deployed on premise, because of the network overhead.
       | 
       | (worth noting is that there are about two dozen vector search
       | libraries, all benchmarked at http://ann-benchmarks.com/ and most
       | of them open-source)
       | 
       | A much more interesting (and harder) problem is creating good
       | vectors to begin with. This refers to the process of converting a
       | text or an image to a multidimensional vector, usually done by a
       | machine learning model such as BERT (for text) or a CNN trained on
       | ImageNet (for images).
       | 
       | Try entering a query like 'gpt3' or '2019' into the news search
       | demo linked in Google's PR:
       | 
       | https://matchit.magellanic-clouds.com/
       | 
       | The results are nonsensical. Not because the vector search didn't
       | do its job well, but because generated vectors were suboptimal to
       | begin with. Having good vectors is 99% of the semantic search
       | problem.
       | 
       | A nice demo of what semantic search can do is Google's Talk to
       | Books https://books.google.com/talktobooks/
       | 
       | This area of research is fascinating. For those who want to play
       | with this more, an interesting end-to-end (including both vector
       | generation and search) open-source solution is Haystack
       | https://github.com/deepset-ai/haystack
        
         | noud wrote:
         | I just made a couple of searches with teclis. I have to say,
         | it's not bad. It's clearly not complete and I get several empty
         | searches. But the content of the results is of higher quality
         | than what I get with Google or DDG. Nice work!
        
           | freediver wrote:
           | Thanks. The index is tiny and it is just a proof of concept
           | of what a single person can do with technologies available
           | nowadays. I felt it was better for it to return zero results
           | than bad results.
        
             | gk1 wrote:
             | > The index is tiny
             | 
             | What was the largest index you've had on Faiss? That seems
             | to affect whether people think of it as more than adequate
             | or terribly inadequate.
        
               | freediver wrote:
               | This demo is only about a million vectors. The largest I
               | had in Faiss was embeddings of the entire Wikipedia
               | (scale in the neighborhood of ~30 million vectors). I
               | know people running a few billion vectors in Faiss.
        
               | leobg wrote:
               | So one vector per article? Doesn't this skew results? A
               | short article with 0.9 relevance score would rank higher
               | than a long article containing one paragraph with 1.0
               | relevance. Am I mistaken?
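The concern about one vector per article can be made concrete. With made-up two-dimensional paragraph embeddings, averaging a long article's paragraphs into a single vector dilutes its one highly relevant paragraph, while scoring each article by its best-matching paragraph (max over chunks, one common mitigation; the thread does not say what Teclis does) reverses the ranking. The helper names are hypothetical.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mean_vector(vectors):
    """Element-wise mean: one vector for the whole article."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


# Hypothetical paragraph embeddings (2-d for readability).
docs = {
    "short_article": [[0.9, 0.44]],
    "long_article": [[1.0, 0.0], [0.0, 1.0], [0.1, 0.99], [0.05, 1.0]],
}
query = [1.0, 0.0]

# One vector per article: the relevant paragraph is averaged away.
doc_level = {name: cosine(query, mean_vector(p)) for name, p in docs.items()}
# One vector per paragraph, each article scored by its best paragraph.
chunk_level = {name: max(cosine(query, v) for v in p) for name, p in docs.items()}

print(max(doc_level, key=doc_level.get))      # short_article
print(max(chunk_level, key=chunk_level.get))  # long_article
```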
        
               | leobg wrote:
               | Also, BERT on cheap hardware? I thought that without a
               | GPU, vectorizing millions of snippets or doing sub-second
               | queries was basically out of the question.
        
             | petra wrote:
             | It's a good experience, for sure better than Google.
             | 
             | But I get a 1/5 to 1/10 hit ratio (successful/empty
             | searches). That's not habit-forming or memory-forming for me.
             | 
             | Is there a core use case where I would get a good hit ratio?
        
               | freediver wrote:
               | As the site says this demo is by no means meant as a
               | replacement for Google, but rather to complement it. I
               | would say Teclis is good for content discovery and
               | learning new things outside the typical search engine
               | filter bubble. A few examples of good queries are listed
               | on the site.
               | 
               | A similar concept was shown here recently:
               | https://search.marginalia.nu
        
         | mrg3_2013 wrote:
         | Thanks for the reference to Haystack. I didn't know it existed!
         | I was looking into Hugging Face, which seems to let you build
         | and train your own language model (still learning, but that's
         | what I've learnt so far). I don't know how expensive these get
         | (for example, if you have 100K lines?). Any thoughts on how
         | this compares to Hugging Face, and any anecdotes on how long it
         | would take to custom-train?
        
         | hiddencost wrote:
         | I recommend readers take parent post with a grain of salt.
         | 
         | (1) Google's offering returns in <5ms, in my experience.
         | (2) The demo is for paragraphs, not short text. You're putting
         | mismatched data into the input; of course it's not going to
         | work. Try a paragraph as suggested.
        
           | freediver wrote:
           | Hmm... There is no web service that returns a response in
           | <5ms, unless you are sitting at the very terminal of the
           | hardware producing the output.
           | 
           | The demo featured in this PR takes about 800-1000ms total to
           | produce search results. How much of that is the actual API is
           | not known. Typically an HTTPS request to an API in the cloud
           | will cost you at least 50ms of network latency, more likely
           | 100ms-200ms. If you are running vector search on premise you
           | will obviously not have this overhead.
           | 
           | Text embeddings typically work for short text as well as
           | paragraphs (paragraph embeddings are usually mean/max of word
           | embeddings anyway) simply because most commercial use cases
           | demand handling of short text input (because nobody is
           | inputting a paragraph into a typical search box; what use is
           | a news search if you cannot type a single word like 'Biden'
           | or 'gpt3' into it).
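The parenthetical about paragraph embeddings being a mean or max of word embeddings can be illustrated with a toy pooling function. The word vectors below are invented; the point is that pooling produces a fixed-size vector whether the input is one word or a sentence, which is why the same index can serve short queries. Many contextual sentence encoders pool token vectors in a similar way, though their token vectors depend on context. `pool` is a hypothetical helper.

```python
def pool(tokens, word_vectors, mode="mean"):
    """Combine per-token vectors into one fixed-size text vector."""
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vecs:
        raise ValueError("no known tokens")
    if mode == "mean":
        return [sum(col) / len(vecs) for col in zip(*vecs)]
    if mode == "max":
        return [max(col) for col in zip(*vecs)]
    raise ValueError(f"unknown mode: {mode}")


word_vectors = {  # hypothetical 3-d word embeddings
    "biden": [0.8, 0.1, 0.0],
    "signs": [0.1, 0.6, 0.2],
    "bill": [0.3, 0.4, 0.9],
}
print(pool(["biden"], word_vectors))                               # [0.8, 0.1, 0.0]
print(pool("biden signs bill".split(), word_vectors))              # element-wise mean
print(pool("biden signs bill".split(), word_vectors, mode="max"))  # [0.8, 0.6, 0.9]
```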
        
             | hiddencost wrote:
             | It's a cloud offering, so the machines are located near
             | each other. Using it with other cloud services is a fair
             | comparison to running it in the same box on-prem.
             | 
             | The offering is similarity search, not a search engine.
             | They offer image to image as another comparison point.
        
               | freediver wrote:
               | With the caveat of having to use GCP to host your server
               | too, I can agree with you (although 5ms still sounds
               | incredibly low, how many vectors was that?).
               | 
               | I was obviously talking about a general use case where a
               | user considers using an API like this vs running Faiss,
               | and their server can be anywhere (a use case that is more
               | common to me personally).
        
         | leobg wrote:
         | So how are you creating the embeddings for your search engine?
         | GloVe? Sentence BERT? Are you training your own models? Are you
         | employing any kind of normalization? There are so many
         | variables to optimize on many levels. Which is, of course, what
         | makes this whole area super exciting.
        
         | 1vuio0pswjnm7 wrote:
         | Really like the idea of teclis, i.e., a non-commercial search
         | engine. Is it correct that teclis is HTTP-only (via port 1333)
         | and TLS is not an option? (NB. I am not suggesting there is
         | anything wrong with HTTP. I am simply curious if TLS is
         | available.)
        
           | freediver wrote:
           | There is nothing wrong with using HTTP if the data
           | transferred is not sensitive like in this case for demo
           | purposes (and if anything, it is also faster for the user).
        
         | dontreact wrote:
         | There are two huge things your 5-minute setup is missing, both
         | of which are technically very hard to tackle:
         | 
         | 1. Incrementally updating the search space. Not that easy to
         | do, and for larger datasets it becomes more important not to
         | just do the dumb thing of retraining the entire index on every
         | update.
         | 
         | 2. Combining vector search and some database-like search in an
         | efficient manner. I don't know if this Google post really
         | solves that problem or if they just do the vector lookup
         | followed by a parallelized linear scan, but this is still an
         | open research/unsolved problem.
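The incremental-update point is easiest to see with a store that never rewrites data in place: updates and deletes mark old entries with tombstones so they are skipped at query time, and a separate compaction step occasionally rewrites storage. The sketch below applies this to a brute-force scan, which sidesteps the genuinely hard part the comment refers to (repairing an ANN graph or rebalancing IVF lists after many changes). The class name `MutableVectorIndex` and its methods are hypothetical.

```python
import math


class MutableVectorIndex:
    """Append-only vector store with soft deletes (tombstones) and compaction.

    Search is a brute-force scan that skips tombstoned slots.
    """

    def __init__(self):
        self._ids = []            # slot -> item id
        self._vectors = []        # slot -> vector
        self._live = {}           # item id -> slot holding its current version
        self._tombstones = set()  # slots to ignore until the next compaction

    def upsert(self, item_id, vector):
        old = self._live.get(item_id)
        if old is not None:
            self._tombstones.add(old)  # hide the previous version, don't rewrite it
        self._live[item_id] = len(self._vectors)
        self._ids.append(item_id)
        self._vectors.append(vector)

    def delete(self, item_id):
        slot = self._live.pop(item_id, None)
        if slot is not None:
            self._tombstones.add(slot)

    def search(self, query, k):
        hits = sorted(
            (math.dist(query, v), self._ids[slot])
            for slot, v in enumerate(self._vectors)
            if slot not in self._tombstones
        )
        return [item_id for _, item_id in hits[:k]]

    def compact(self):
        """Rewrite storage without dead slots (the occasional 'rebuild' step)."""
        live = [(item_id, self._vectors[slot]) for item_id, slot in self._live.items()]
        self._ids = [item_id for item_id, _ in live]
        self._vectors = [v for _, v in live]
        self._live = {item_id: slot for slot, (item_id, _) in enumerate(live)}
        self._tombstones = set()


idx = MutableVectorIndex()
idx.upsert("a", [0.0, 0.0])
idx.upsert("b", [1.0, 0.0])
idx.upsert("c", [5.0, 5.0])
before = idx.search([0.1, 0.0], k=2)         # ['a', 'b']
idx.delete("a")
after_delete = idx.search([0.1, 0.0], k=2)   # ['b', 'c']
idx.upsert("b", [10.0, 10.0])                # move b far away
after_update = idx.search([0.1, 0.0], k=1)   # ['c']
idx.compact()
after_compact = idx.search([0.1, 0.0], k=2)  # ['c', 'b']
```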
        
           | freediver wrote:
           | Correct, that would take more than 5 minutes, although it is
           | still possible to do with Faiss (and not that hard, relatively
           | speaking). In the Teclis demo, I did indeed do your second
           | point: combine results with a keyword search engine, and there
           | are many simple solutions you can use out there, like
           | Meilisearch, Sonic, etc. If you were to try using an external
           | API for vector search, you would still need to build keyword
           | based search separately (and then the combining/ranking logic),
           | so you may be better off just building the entire stack anyway.
           | 
           | Anyway, for me, the number one priority was latency and it is
           | hard to beat on-premise search for that.
           | 
           | Even then, a vector search API is just one component you will
           | need in your stack. You need to pick the right model, create
           | vectors (GPU intensive), then possibly combine search results
           | with keyword based search to improve accuracy etc. I am still
           | waiting to see an end-to-end API doing all this.
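One simple way to combine a keyword engine's results with a vector index's results is reciprocal rank fusion (RRF), which scores each document by summing 1/(k + rank) over the lists it appears in. The thread does not say which combining logic Teclis uses; this is just a common baseline. The constant k=60 is the value suggested in the original RRF paper by Cormack et al.; the document ids are made up.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc ids: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


keyword_hits = ["d3", "d1", "d7"]  # e.g. from a keyword engine such as Meilisearch
vector_hits = ["d1", "d9", "d3"]   # e.g. from a vector index
print(reciprocal_rank_fusion([keyword_hits, vector_hits]))  # ['d1', 'd3', 'd9', 'd7']
```

Because RRF uses only ranks, there is no need to calibrate keyword relevance scores against cosine similarities before merging.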
        
             | dontreact wrote:
             | Interesting. Did you also tackle the incremental update
             | problem with FAISS?
        
               | freediver wrote:
               | No, I didn't have a need for it in this demo (but it is
               | certainly possible with Faiss).
        
           | gk1 wrote:
           | Exactly right. Things like data freshness (live index
           | updates), CRUD operations, metadata filtering, and horizontal
           | scaling are all "extras" that don't come with Faiss. Hence
           | the need for solutions like Google Matching Engine and
           | Pinecone.io.
           | 
           | And even if you do just want ANN and nothing else, some
           | people just want to make API calls to a live service and not
           | worry about anything else.
        
           | moab wrote:
           | Can you expand more or provide a concrete example for the
           | second point? What kind of database-like searches are you
           | thinking about for spatial data? Things like range-queries
           | can already be (approximately) done. Or are you thinking
           | about relational style queries on data associated with each
           | point?
        
             | dontreact wrote:
             | Yes exactly, relational style queries with each data point.
             | Maybe you have some metadata about your images and maybe
             | you need to join against another table to properly query
             | them. But at the same time you want to only grab the first
             | k nearest neighbors according to vector similarity.
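The difference between filtering after and during the nearest-neighbour step shows up even on a tiny example. Below, the predicate requires a join from an image to its owner's row. Taking the global top-k first and then filtering can return fewer than k rows; applying the predicate during the scan does not have that problem. Both functions use brute force over hypothetical data; the hard part discussed in the thread is doing the "during" variant efficiently inside an ANN index.

```python
import heapq
import math

images = {  # image id -> (embedding, owner id); hypothetical data
    "img1": ([0.0, 0.1], "u1"),
    "img2": ([0.15, 0.0], "u2"),
    "img3": ([0.2, 0.2], "u2"),
    "img4": ([0.9, 0.9], "u1"),
    "img5": ([1.0, 0.8], "u1"),
}
owners = {"u1": {"plan": "free"}, "u2": {"plan": "pro"}}  # the table to join against


def allowed(image_id):
    """Relational-style predicate: join image -> owner and check the owner's plan."""
    return owners[images[image_id][1]]["plan"] == "free"


def distance(query, image_id):
    return math.dist(query, images[image_id][0])


def post_filter(query, k):
    """Take the global top-k first, then drop rows failing the predicate."""
    top = heapq.nsmallest(k, images, key=lambda i: distance(query, i))
    return [i for i in top if allowed(i)]


def pre_filter(query, k):
    """Apply the predicate during the scan, so k matching rows come back if they exist."""
    matching = (i for i in images if allowed(i))
    return heapq.nsmallest(k, matching, key=lambda i: distance(query, i))


print(post_filter([0.0, 0.0], k=2))  # ['img1']
print(pre_filter([0.0, 0.0], k=2))   # ['img1', 'img4']
```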
        
               | gk1 wrote:
               | Pinecone does this, at least if I'm understanding your
               | use case right: https://www.pinecone.io/docs/metadata-filtering/
               | 
               | And you're right, it wasn't easy to build.
        
           | etiennedi wrote:
           | Spot on! Both of those were motivating factors when building
           | Weaviate (Open Source Vector Search Engine). We really wanted
           | it to feel like a full database or search engine. You should
           | be able to do anything you would do with Elasticsearch, etc.
           | There should be no waiting time between creating an object
           | and searching. Incremental updates and deletes are supported,
           | etc.
           | 
           | On your second point about efficient filtering, check out
           | this article I wrote outlining how filtered vector search
           | works in Weaviate: https://towardsdatascience.com/effects-of-filtered-hnsw-sear...
           | 
           | For even more details on filtering, check the documentation:
           | https://www.semi.technology/developers/weaviate/current/arch...
        
       | monkeybutton wrote:
       | If you are interested in how ScaNN compares to other
       | approximation algorithms, there are some benchmarks here:
       | http://ann-benchmarks.com/
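ann-benchmarks reports results as trade-offs between recall and query throughput. A minimal way to measure both for your own index is sketched below: recall@k is the fraction of the true k nearest neighbours (from an exact search) that the approximate result contains. The `evaluate` helper and the deliberately lossy `half_scan_search` are hypothetical, not part of ann-benchmarks.

```python
import math
import random
import time


def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true k nearest neighbours present in the approximate top-k."""
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k


def evaluate(search_fn, queries, ground_truth, k):
    """Mean recall@k and queries per second for search_fn(query, k) -> list of ids."""
    start = time.perf_counter()
    results = [search_fn(q, k) for q in queries]
    elapsed = time.perf_counter() - start
    recall = sum(recall_at_k(r, gt, k) for r, gt in zip(results, ground_truth)) / len(queries)
    qps = len(queries) / elapsed if elapsed > 0 else float("inf")
    return recall, qps


# Hypothetical data: exact search as ground truth, and a deliberately lossy
# "approximate" search that only scans the first half of the vectors.
rng = random.Random(0)
data = [[rng.random() for _ in range(16)] for _ in range(1000)]
queries = [[rng.random() for _ in range(16)] for _ in range(20)]


def exact_search(q, k, vectors=data):
    return sorted(range(len(vectors)), key=lambda i: math.dist(q, vectors[i]))[:k]


def half_scan_search(q, k):
    return exact_search(q, k, vectors=data[:500])


truth = [exact_search(q, 10) for q in queries]
print(evaluate(half_scan_search, queries, truth, k=10))
```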
        
       | 323 wrote:
       | People say Google search is terrible these days, but I find the
       | opposite.
       | 
       | I can vaguely describe in a sentence the gist of an article I've
       | read, or an image, and the proper result will usually be in the
       | first page.
       | 
       | Of course, it doesn't always work; sometimes there are "hash
       | collisions", so to speak. But I don't think the old algorithm
       | would have been more successful either, since if I knew the
       | exact keywords to use, I wouldn't need to start with a vague
       | description in the first place.
        
         | [deleted]
        
         | tux3 wrote:
         | I'd like to join the chorus disappointed in Google search
         | results.
         | 
         | It seems to do more correction for you, which is great if
         | you're searching for common popular things. But any uncommon or
         | precise query will often be misunderstood as something else.
         | 
         | Plenty of times, no matter how I reword my sentence or what
         | sort of analogies I try to give it, I've had Google fail to
         | give me something that I know exists and that I have to find
         | some other way.
        
           | tehjoker wrote:
           | seems like it would be simple to reintroduce advanced search
           | but these guys are monomaniacal
        
         | Jemaclus wrote:
         | I've literally gone to Google and typed something very similar
         | to "That guy in that thing with the dog" and the correct answer
         | shows up as the first result. It's quite brilliant and magical
         | how they do that.
         | 
         | But sometimes it's a total miss when I want something very
         | specific, and it just shows me other things I didn't ask for.
        
         | authed wrote:
         | I have more trouble finding exactly what I'm looking for using
         | Google; for me it started going downhill when they removed the
         | plus operator (and no, quotes don't work the same).
         | 
         | Also, Yandex is much better for reverse image search (similar
         | images).
        
         | est31 wrote:
         | Yeah, I remember that in the past, maybe 5+ years ago, I used
         | to phrase things in a certain way to please their algorithm.
         | This is not needed any more. Sometimes it doesn't grasp
         | concepts and returns false results. But this has become rarer
         | as well.
        
         | dqv wrote:
         | >I can vaguely describe in a sentence the gist of an article
         | I've read, or an image, and the proper result will usually be
         | in the first page.
         | 
         | For the specific context of "I can find something I've already
         | found", yes, it's useful. I just wish there was a way to change
         | that context to "discovery mode" where it uses a different
         | algorithm that is oriented toward finding new information. I
         | want to find sites in the spirit of those old-fashioned sites
         | that are minimally styled with dense information. And not just
         | Wikipedia or a few "trusted" sources like it used to be in
         | earlier times, but a more well-rounded result set.
        
           | idealmedtech wrote:
           | > sites in the spirit of those old-fashioned sites
           | 
           | I think the problem is that such sites are very difficult to
           | find algorithmically, especially when it comes to their poor
           | SEO. The reason they used to be so prevalent in the early
           | 2000s search results is because that's _mostly_ what the web
           | was back then; a bunch of personal websites, blogs, etc.
           | 
           | To do that nowadays would require heavy (manual) curation,
           | which obviously Google isn't interested in.
        
             | joe_the_user wrote:
             | I don't think it's entirely true that the current Google
             | results are merely a matter of decent sites being hard to
             | find. I'm pretty sure that two years ago, Google found a
             | greater portion of "real content" pages than it finds today,
             | and SEO was already huge then. And Google does a
             | significant amount of human testing right now.
             | 
             | It's especially noticeable to me that just in the last few
             | months, Google has changed their algorithm so some product
             | will be the first item on even the most generic search.
        
           | nullc wrote:
           | > For the specific context of "I can find something I've
           | already found",
           | 
           | I find it to be utterly terrible for that too. Even when I
           | have verbatim strings from the thing I'm looking for, it often
           | simply doesn't show up. ... often because it rewrites the
           | query into something about Kim Kardashian's butt and no
           | amount of quotes or pluses will make it stop.
        
         | joe_the_user wrote:
         | I disagree about the quality of Google search, but I should note
         | this has nothing to do with the utility of Google's vector
         | search library, which is just one low-level part of the process
         | of producing final Google results, and I'd expect the technical
         | quality here to be excellent by default.
         | 
         | Whether one likes or hates current Google search results, their
         | qualities and the changes from early search processes are
         | clearly intentional and don't relate to how well Google does
         | raw indexing.
        
         | porker wrote:
         | > I can vaguely describe in a sentence the gist of an article
         | I've read, or an image, and the proper result will usually be
         | in the first page.
         | 
         | I can never do that. What I remember is so far from the wording
         | used or how Google identifies the image that it never comes up.
         | I end up scrolling back through my history trying to do it from
         | the page title or domain.
        
         | nightpool wrote:
         | Note that this article is careful to never say that this
         | "vector search" technology powers the classic Google Search.
         | This sort of automated classification space is probably _part_
         | of Google's general search algorithm, but it's probably a very
         | small part. Youtube recommendations (based on description +
         | thumbnail + potentially video content?) and Google Image Search
         | are the two in-practice examples that it focuses on.
        
         | makeset wrote:
         | If you don't know what you're looking for, it works well. If
         | you know exactly what you want, specific keywords etc., good
         | luck because it's gone from the mildly condescending "Did you
         | mean..." corrections to "Yeah I am going to ignore all that, I
         | bet what you really want to find is some ads."
        
         | bitcharmer wrote:
         | I have the exact opposite experience. Search results are
         | nowadays ridden with crap from companies that learned how to
         | game SEO.
         | 
         | If not that, you get scam websites or other ad/malware infested
         | trash.
        
           | radicaldreamer wrote:
           | Wish there was an easy way to remove forbes.com results
        
             | greybeardgeek wrote:
             | you can append -site:forbes.* to your query string
        
             | jiveturkey wrote:
             | https://news.ycombinator.com/item?id=29546433
        
       | Kydlaw wrote:
       | There is a lot being done in vector search technology right now.
       | I was less fortunate when looking at vector storage. I already
       | looked at Pinecone and Weaviate, but they are all paid products.
       | 
       | Does anyone have feedback on this?
        
         | Xenoamorphous wrote:
         | Elasticsearch supports vectors (dense ones; they supported
         | sparse ones at some point, but removed support, I think),
         | and has things like cosine similarity functions built in.
         | 
         | Not sure how "free" it is though.
        
         | thirdtrigger wrote:
         | Not true - Weaviate is open source: https://github.com/semi-technologies/weaviate
        
         | gk1 wrote:
         | Pinecone has a free tier that's quite generous, fits around 1M
         | items and will fit even more soon.
         | 
         | Not sure this helps but just mentioning in case.
        
       | tomcooks wrote:
       | I would be happy with "find anything with Google search"
        
         | eliseumds wrote:
         | It exists: https://developers.google.com/custom-search/v1/overview.
         | Might not be cheap for search-as-you-type though.
        
           | sanxiyn wrote:
           | GP is complaining Google's search quality is poor, not
           | inquiring about Google's search API.
        
       | tomc1985 wrote:
       | So... fuzzy logic
       | 
       | Everything old is new again! Again!
        
       | [deleted]
        
       | dorianmariefr wrote:
       | It will probably be available as a Postgres extension at some
       | point. It seems like only special indexing of vectors is needed.
        
       | Hokusai wrote:
       | It's not very good. I tried different pictures and the results
       | are almost random.
       | 
       | A picture from a cartoon returns everything from logos to any
       | type of drawing. A picture of a battery returns cars and shops. A
       | picture of food worked as expected and I got more food pictures.
        
       | eob wrote:
       | My 2022 wish list is a Postgres plugin that adds vector + AKNN
       | support that plays well with relational queries. There are so
       | many use cases for that.
       | 
       | I believe Ant Financial has published an open source one but iirc
       | the English language documentation is sparse.
        
         | ccleve wrote:
         | Do you have a link? I'd like to see it.
         | 
         | I googled and did not find much pertaining to "ant financial"
         | and "postgres". Perhaps your google-fu is better than mine...
        
         | etiennedi wrote:
         | Check out the open-source vector search engine Weaviate:
         | https://github.com/semi-technologies/weaviate
         | 
         | It's not a relational DB, but it supports graph-like
         | connections between objects, which makes it really easy to
         | model your relations.
        
           | thirdtrigger wrote:
           | Yup, this is an example from the demo dataset in the docs:
           | https://link.semi.technology/3DPcphe
        
         | akane wrote:
         | Check out pgvector: https://github.com/ankane/pgvector
         | (disclosure: am author)
         | 
         | It uses IVFFlat indexing, but could be extended to support
         | product quantization / ScaNN.
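To give a feel for what IVFFlat indexing does, here is a simplified sketch: k-means centroids split the vectors into `lists`, each vector is stored in the list of its nearest centroid, and a query scans only the `probes` lists whose centroids are closest. The parameter names echo pgvector's `lists` index option and `ivfflat.probes` setting, but this is an illustrative reimplementation with hypothetical names, not pgvector's code. Centroids come from the data present at build time; vectors added later would simply be appended to the nearest existing list.

```python
import math
import random


def kmeans(vectors, n_lists, iters=10, seed=0):
    """Plain Lloyd's k-means; returns n_lists centroids."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, n_lists)]
    for _ in range(iters):
        buckets = [[] for _ in centroids]
        for v in vectors:
            nearest = min(range(n_lists), key=lambda i: math.dist(v, centroids[i]))
            buckets[nearest].append(v)
        for i, bucket in enumerate(buckets):
            if bucket:  # keep the old centroid if a list ends up empty
                centroids[i] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centroids


class IVFFlat:
    """Inverted file index with uncompressed ("flat") vectors in each list."""

    def __init__(self, vectors, n_lists):
        self.centroids = kmeans(vectors, n_lists)
        self.lists = [[] for _ in self.centroids]
        for idx, v in enumerate(vectors):
            self.lists[self._closest_lists(v, 1)[0]].append((idx, v))

    def _closest_lists(self, v, n):
        order = sorted(range(len(self.centroids)), key=lambda i: math.dist(v, self.centroids[i]))
        return order[:n]

    def search(self, query, k, probes=1):
        """Exact scan, but only inside the `probes` lists nearest to the query."""
        candidates = [
            (math.dist(query, v), idx)
            for li in self._closest_lists(query, probes)
            for idx, v in self.lists[li]
        ]
        return [idx for _, idx in sorted(candidates)[:k]]


rng = random.Random(42)
data = [[rng.random() for _ in range(8)] for _ in range(200)]
index = IVFFlat(data, n_lists=10)
q = [rng.random() for _ in range(8)]
print(index.search(q, k=5, probes=2))   # faster, may miss some true neighbours
print(index.search(q, k=5, probes=10))  # scans every list: same as exact search
```

Raising `probes` trades speed for recall; with `probes` equal to the number of lists the search becomes exhaustive.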
        
       ___________________________________________________________________
       (page generated 2021-12-14 23:00 UTC)