[HN Gopher] Show HN: Velvet - Store OpenAI requests in your own DB
       ___________________________________________________________________
        
       Show HN: Velvet - Store OpenAI requests in your own DB
        
        Hey HN! We're Emma and Chris, founders of Velvet
        (https://www.usevelvet.com). Velvet proxies your OpenAI calls
        and stores the requests and responses in your own PostgreSQL
        database, so you can analyze logs with SQL (instead of a clunky
        UI). You can also set headers to add caching and metadata for
        analysis.

        Backstory: We started by building more general AI data tools
        (like a text-to-SQL editor). We were frustrated by the lack of
        basic LLM infrastructure, so we ended up pivoting to focus on
        the tooling we wanted. Many existing apps, like Helicone, were
        hard for us to use as power users. We just wanted a database.

        Scale: We've already warehoused 50M requests for customers and
        have optimized the platform for scale and latency. The proxy
        runs on Cloudflare Workers, and the added latency is
        negligible. We've also built some complex "yak shaving"
        features, such as decomposing OpenAI Batch API requests so you
        can track each log individually. One of our early customers
        (https://usefind.ai/) makes millions of OpenAI requests per
        day, at up to 1,500 requests per second.

        Vision: We're trying to build development tools that have as
        little UI as possible and can be controlled entirely with
        headers and code. We also want to blend cloud and on-prem for
        the best of both worlds -- allowing for both automatic updates
        and complete data ownership.

        Here are some things you can do with Velvet logs:

        - Observe requests, responses, and latency
        - Analyze costs by metadata, such as user ID
        - Track batch progress and speed
        - Evaluate model changes
        - Export datasets for fine-tuning gpt-4o-mini

        (This video shows how to do each of those:
        https://www.youtube.com/watch?v=KaFkRi5ESi8)

        To see how it works, try chatting with our demo app, which you
        can use without logging in: https://www.usevelvet.com/sandbox

        Setting up your own proxy is 2 lines of code and takes ~5
        minutes (a rough sketch of that setup is below). Try it out and
        let us know what you think!
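
        For concreteness, here is a minimal sketch of what that setup
        could look like with the OpenAI Python SDK. The proxy URL and
        the velvet-* header names below are illustrative assumptions,
        not Velvet's documented values -- check their docs for the real
        endpoint and headers.

            import os
            from openai import OpenAI

            # Point the SDK at the proxy instead of api.openai.com.
            # (base_url and header names are assumptions for illustration.)
            client = OpenAI(
                api_key=os.environ["OPENAI_API_KEY"],
                base_url="https://proxy.usevelvet.com/v1",           # assumed proxy endpoint
                default_headers={
                    "velvet-api-key": os.environ["VELVET_API_KEY"],  # assumed auth header
                    "velvet-cache": "true",                          # assumed opt-in caching
                    "velvet-user-id": "user_123",                    # assumed metadata header
                },
            )

            # Requests flow through as usual; the proxy forwards them to
            # OpenAI and writes the request/response pair to PostgreSQL.
            response = client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[{"role": "user", "content": "Hello through the proxy"}],
            )
            print(response.choices[0].message.content)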
        
       Author : elawler24
       Score  : 11 points
       Date   : 2024-09-24 15:25 UTC (7 hours ago)
        
 (HTM) web link (www.usevelvet.com)
 (TXT) w3m dump (www.usevelvet.com)
        
       | hiatus wrote:
        | This seems to require sharing the data we provide to OpenAI
        | with yet another party. I don't see any zero-retention
        | offering.
        
         | elawler24 wrote:
          | The self-serve version is hosted (it's easy to try
          | locally), but we offer managed deployments where you bring
          | your own DB. In that case, your data is 100% yours, in your
          | own PostgreSQL. That's how Find AI uses Velvet.
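
          To make that concrete, this is the kind of query the post
          alludes to ("analyze costs by metadata, such as user ID")
          once the logs are sitting in your own PostgreSQL. The table
          and column names below are assumptions, not Velvet's actual
          schema.

              import psycopg2

              # Hypothetical cost-by-user rollup over the last 7 days.
              # llm_logs, metadata, response, and created_at are assumed
              # names -- adjust to whatever your warehouse actually stores.
              QUERY = """
              SELECT
                  metadata->>'user_id'                            AS user_id,
                  COUNT(*)                                        AS requests,
                  SUM((response->'usage'->>'total_tokens')::int)  AS total_tokens
              FROM llm_logs
              WHERE created_at > now() - interval '7 days'
              GROUP BY 1
              ORDER BY total_tokens DESC
              LIMIT 20;
              """

              with psycopg2.connect("postgresql://localhost/velvet_logs") as conn:
                  with conn.cursor() as cur:
                      cur.execute(QUERY)
                      for user_id, requests, total_tokens in cur.fetchall():
                          print(user_id, requests, total_tokens)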
        
       | ramon156 wrote:
       | > we were frustrated by the lack of LLM infrastructure
       | 
        | May I ask what specifically you were frustrated about? It
        | seems like there are more than enough solutions.
        
         | elawler24 wrote:
        | There were plenty of UI-based, low-code platforms. But they
         | required that we adopt new abstractions, use their UI, and log
         | into 5 different tools (logging, observability, analytics,
         | evals, fine-tuning) just to run basic software infra. We didn't
         | feel these would be long-term solutions, and just wanted the
         | data in our own DB.
        
       | OutOfHere wrote:
       | A cache is better when it's local rather than on the web. And I
       | certainly don't need to pay anyone to cache local request
       | responses.
        
       | DeveloperErrata wrote:
        | Seems neat - I'm not sure if you do anything like this, but
        | one thing that would be useful with RAG apps (esp at big
        | scales) is vector-based search over cache contents. What I
        | mean is that users can phrase the same question (which has
        | the same answer) in tons of different ways. If I could pass a
        | raw user query into your cache and get back the end result
        | for a previously computed query (even if the current phrasing
        | is a bit different from the cached phrasing), then not only
        | would I avoid having to submit a new OpenAI call, but I could
        | also avoid having to run my entire RAG pipeline. So kind of
        | like a "meta-RAG" system that avoids having to run the actual
        | RAG system for queries that are sufficiently similar to a
        | cached query.
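
        A minimal sketch of the "meta-RAG" cache described above: embed
        the raw query, compare it against embeddings of previously
        answered queries, and only run the full pipeline on a miss. The
        similarity threshold, the embedding model, and the in-memory
        store are placeholder choices (a real version would persist
        embeddings, e.g. in Postgres with pgvector), and
        run_rag_pipeline stands in for whatever pipeline the caller
        already has.

            import numpy as np
            from openai import OpenAI

            client = OpenAI()
            THRESHOLD = 0.92  # similarity cutoff -- needs tuning per application

            # In-memory stand-in for a persistent cache of
            # (query embedding, final answer) pairs.
            cache: list[tuple[np.ndarray, str]] = []

            def embed(text: str) -> np.ndarray:
                resp = client.embeddings.create(
                    model="text-embedding-3-small", input=text
                )
                v = np.array(resp.data[0].embedding)
                return v / np.linalg.norm(v)

            def answer(query: str, run_rag_pipeline) -> str:
                """Serve a cached answer for a semantically similar query,
                otherwise run the full RAG pipeline and cache the result."""
                q = embed(query)
                if cache:
                    sims = [float(q @ v) for v, _ in cache]
                    best = int(np.argmax(sims))
                    if sims[best] >= THRESHOLD:
                        return cache[best][1]      # hit: skip retrieval + generation
                result = run_rag_pipeline(query)   # miss: do the expensive work
                cache.append((q, result))
                return result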
        
       ___________________________________________________________________
       (page generated 2024-09-24 23:00 UTC)