[HN Gopher] Vector search just got up to 10x faster and vertical...
___________________________________________________________________
Vector search just got up to 10x faster and vertically scalable
Author : gk1
Score : 34 points
Date : 2022-08-16 19:45 UTC (3 hours ago)
(HTM) web link (www.pinecone.io)
(TXT) w3m dump (www.pinecone.io)
| phgn wrote:
| Side tangent: Pinecone pods seem to cost 15.625% more per hour on
| AWS compared to GCP.
|
| All the instance types hide away the price differences usually,
| so this is interesting to see.
|
| Edit: also there is no free tier for Pinecone on AWS :(
| opisthenar84 wrote:
| I don't understand the emphasis here on vertical scaling. Move a
| database to a bigger machine = more storage and faster querying.
| Not exactly rocket science. Horizontal scaling is the real
| challenge here, and the complexity of vector indexes makes it
| especially challenging. Milvus and Vertex AI both have horizontal
| scaling ANN search and the ability to do parallel indexing as
| well. I appreciate the post but this doesn't seem worthy of an
| announcement.
| ww520 wrote:
| Bigger machine doesn't automatically mean higher performance.
| The code needs to scale with the increased number of cores, take a
| share-nothing or share-very-little approach to avoid
| contention, and use efficient data structures to utilize the
| increased memory.
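To illustrate the share-nothing point, here is a minimal sketch of exact (brute-force) cosine search split across shards. Each worker scans only its own shard into a private top-k heap, and results meet only in a final merge. It is not Pinecone's or any library's implementation; `make_shards`, `shard_top_k` and `search` are hypothetical names. CPython threads will not speed up this pure-Python loop because of the GIL, so a real engine would do the per-shard scan in native code or separate processes. The structure (partition, local top-k, merge) is the part that carries over.

```python
import heapq
import math
from concurrent.futures import ThreadPoolExecutor


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def make_shards(items, n):
    """Hypothetical round-robin partition of (id, vector) pairs into n shards."""
    return [items[i::n] for i in range(n)]


def shard_top_k(shard, query, k):
    # Each worker reads only its own shard and builds a private heap:
    # no locks and no shared mutable state while scanning.
    return heapq.nlargest(k, ((cosine(vec, query), vid) for vid, vec in shard))


def search(shards, query, k):
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda s: shard_top_k(s, query, k), shards))
    # Merge step: the only place per-shard results are combined.
    return heapq.nlargest(k, (hit for part in partials for hit in part))


# Demo: 100 unit vectors on a circle, split into 4 shards.
items = [(i, [math.cos(i), math.sin(i)]) for i in range(100)]
shards = make_shards(items, 4)
print(search(shards, [1.0, 0.0], 3))
```

Because every shard returns its own top k, the merged top k is the same as a single global scan. The same pattern extends across machines (scatter-gather), which is roughly where the horizontal-scaling discussion above picks up.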
| fdgsdfogijq wrote:
| Wish I could raise $10M and use it to wrap open source libraries
| developed by FB and Google
| MasterIdiot wrote:
| If you really think that's enough to build a real product, go
| for it. Even open-source companies (Elastic, Mongo, Scylla)
| have to build tons of infra around their core codebase in order
| to make it an actual cloud product.
| fdgsdfogijq wrote:
| Not that easy; the founder was a director at AWS. This is
| just devops/obfuscation on top of an open source library:
|
| FAISS
| gk1 wrote:
| Pinecone doesn't use Faiss, nor ScaNN. We love Faiss and
| even teach people to use it[1]. There happens to be a
| sizable population of engineers who need more than what
| Faiss provides (like live index updates and metadata
| filtering, for example), and can't be bothered or aren't
| being paid to customize and manage open-source libraries
| all day.
|
| [1] https://www.pinecone.io/learn/faiss/
| fdgsdfogijq wrote:
| So you guys developed and implemented state-of-the-art
| neural network vector search from scratch? In a year? And
| something better than libraries with tens of contributors
| over years of research?
| noogle wrote:
| I actually built a similar solution supporting similar
| operations (including filtering by meta-data) using open-
| source libraries. Took me about 2 weeks net.
|
| I can see a clientele for such a database (people who want a
| turnkey solution), but honestly it looks like an attempt to
| use a dev-ops solution to address deeper issues with
| problem formulation: e.g.
|
| 1. Is there really a need to search all items in the
| database? Can subsampling make simple similarity comparison
| feasible?
|
| 2. Do the embeddings really need to have that many
| dimensions? Can we reduce their dimensionality and fit them
| in RAM?
|
| 3. Is embedding accurate enough compared to pairwise
| comparison? Can we formulate the problem to make the latter
| feasible?
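Points 1 and 2 can be tried with nothing more than brute force. The following is a minimal sketch, assuming a Gaussian random projection (a Johnson-Lindenstrauss-style technique swapped in here for illustration; the commenter does not name a method) to shrink the dimensionality, plus plain random subsampling. All function names are hypothetical. Squared distances are preserved in expectation, so exact search in the reduced space approximates search in the original one.

```python
import math
import random


def random_projection_matrix(d_in, d_out, seed=0):
    """d_out x d_in Gaussian matrix scaled by 1/sqrt(d_out), so that
    squared lengths are preserved in expectation after projection."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(d_out)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d_in)] for _ in range(d_out)]


def project(matrix, vec):
    """Multiply the projection matrix by one vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def subsample(items, m, seed=0):
    """Point 1: search a random subset instead of the whole collection."""
    return random.Random(seed).sample(items, m)


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(query, items, k):
    """Exact k-nearest neighbours by squared Euclidean distance."""
    return sorted(items, key=lambda it: sq_dist(it[1], query))[:k]


# Demo: 200 random 256-d vectors reduced to 32 dimensions (point 2).
M = random_projection_matrix(256, 32, seed=1)
rng = random.Random(2)
data = [(i, [rng.gauss(0.0, 1.0) for _ in range(256)]) for i in range(200)]
reduced = [(i, project(M, v)) for i, v in data]
print([i for i, _ in nearest(reduced[7][1], reduced, 3)])
```

An 8x reduction like this also cuts memory by 8x, which is often enough to "fit them in RAM". The cost is some distortion of distances, so recall should be measured on your own data.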
|
| I also could not find any explanation of the underlying
| algorithms, especially around meta-data filtering (which
| FAISS does not solve either), or of their accuracy. (Happy
| to hear otherwise.)
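Metadata filtering is harder than it sounds once an approximate index is involved. This sketch contrasts two generic strategies. It says nothing about Pinecone's undisclosed algorithm. Both functions and the record layout are hypothetical, and a full sort stands in for an ANN index. Pre-filtering always returns up to k matches but has to rank the whole filtered set. Post-filtering reuses an unfiltered top-k and can come back short unless it over-fetches.

```python
import heapq


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pre_filter_search(records, query, k, predicate):
    # Apply the metadata predicate first, then rank only the survivors.
    candidates = (r for r in records if predicate(r["meta"]))
    top = heapq.nlargest(k, candidates, key=lambda r: dot(r["vec"], query))
    return [r["id"] for r in top]


def post_filter_search(records, query, k, predicate, fetch=None):
    # Rank everything (standing in for an ANN index), keep the top `fetch`,
    # then drop records that fail the predicate; may return fewer than k.
    fetch = fetch or k
    top = heapq.nlargest(fetch, records, key=lambda r: dot(r["vec"], query))
    return [r["id"] for r in top if predicate(r["meta"])][:k]


records = [
    {"id": "a", "vec": [1.0, 0.0], "meta": {"lang": "en"}},
    {"id": "b", "vec": [0.9, 0.1], "meta": {"lang": "de"}},
    {"id": "c", "vec": [0.8, 0.2], "meta": {"lang": "de"}},
    {"id": "d", "vec": [0.1, 0.9], "meta": {"lang": "en"}},
]


def is_en(meta):
    return meta["lang"] == "en"


print(pre_filter_search(records, [1.0, 0.0], 2, is_en))   # ['a', 'd']
print(post_filter_search(records, [1.0, 0.0], 2, is_en))  # ['a']
```

The post-filter query asks for 2 results but gets 1, because the second-best vector fails the filter. Over-fetching (for example `fetch=4` here) recovers `['a', 'd']` at extra cost. Doing this efficiently inside an approximate index is the unexplained part the comment asks about.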
| etaioinshrdlu wrote:
| Which open source libraries is pinecone wrapping?
| gk1 wrote:
| I'm not sure where the other commenter gets their confidence,
| but Pinecone is not wrapping any open source vector-search
| library. We offer three index types (in-memory, in-memory
| graph-based, hybrid memory + disk), and all are proprietary.
|
| We do have articles about Faiss and HNSW and all sorts of
| other vector-search and NLP topics, so it's possible that's
| where the confusion comes from.
| fdgsdfogijq wrote:
| FAISS
| gk1 wrote:
| This is incorrect.
| learndeeply wrote:
| Didn't Milvus (vector db, wrapper around FAISS) come before
| Pinecone?
| fzliu wrote:
| Just to clarify, Milvus is much more than a wrapper around
| FAISS. Our vector search component called Knowhere
| (https://github.com/milvus-io/knowhere) utilizes FAISS and
| Annoy and will soon include ScaNN, DiskANN, and in-house vector
| indexes as well. Milvus uses Knowhere as the compute engine,
| and implements a variety of database functions such as
| horizontal scaling, caching, replication, failover, and object
| storage on top of Knowhere. If you're interested, I recommend
| checking out our architecture page
| (https://milvus.io/docs/architecture_overview.md).
|
| [EDIT]: Forgot to mention - Milvus development began in 2018,
| and it was open sourced in 2019.
| gk1 wrote:
| For anyone interested in the code walkthrough:
| https://www.pinecone.io/learn/testing-p2-collections-scaling...
| baobob wrote:
| Confused by their claim to be the 'first' vector database. These
| things have been around forever? For example, FLANN (not a DB
| server, but an example library) is from 2009.
| opisthenar84 wrote:
| I think the difference is in the layer of abstraction, i.e.,
| FLANN is just the underlying search functionality whereas
| vector databases are fully managed solutions. Even so, Weaviate
| came out in 2018, so saying that they are the "first" vector
| database is just flat out wrong since Pinecone was founded in
| 2019.
| gk1 wrote:
| Weaviate calling themselves a vector database is a fairly new
| thing.
| opisthenar84 wrote:
| The fact that Weaviate only recently started calling
| themselves a vector database is completely irrelevant here.
| They had this type of vector data infrastructure before
| Pinecone did, and that's all that matters.
|
| Example: I'm going to start a new company called
| Conifercone and do pretty much exactly what you do, but
| call it a "vector datastore" instead. Apparently I've now
| created the first ever vector datastore even though
| functionally I have done nothing novel.
| 29athrowaway wrote:
| 10x faster with respect to what?
___________________________________________________________________
(page generated 2022-08-16 23:00 UTC)