[HN Gopher] Launch HN: Panora (YC S24) - Data Integration API fo...
       ___________________________________________________________________
        
       Launch HN: Panora (YC S24) - Data Integration API for LLMs
        
       Hey HN! We're Nael and Rachid, and we're building Panora
       (https://github.com/panoratech/Panora), an open-source API that
       connects various data sources to LLMs, from 3rd party integrations
       to embeddings and chunking generation.

       Here's a demo: https://www.youtube.com/watch?v=45QaN8mzAfg, and you
       can check our docs here: https://docs.panora.dev/quick-start
       Our GitHub repo is at https://github.com/panoratech/Panora.

       Building integrations by hand is tedious and time-consuming. You
       must adapt to API documentation quirks, manage request retries,
       OAuth/API key authorization, refresh tokens, rate limits, and data
       sync freshness. Moreover, you have to keep up with the constant
       rise of embedding models and chunking capabilities. On top of
       that, with the rise of AI-powered apps, you have to handle
       embedding and chunking of all the unstructured data.

       The dominant player in this space is Merge.dev, but it has several
       drawbacks:

       1. It's a black box for most developers, lacking transparency on
          data handling.
       2. Strong vendor lock-in: once an end-user connects their
          software, it's challenging to access authorization tokens if
          you want to perform requests on their behalf after leaving
          Merge.
       3. Long time-to-deploy for the long tail of integrations, leading
          to lost opportunities as integrations become the backbone of
          LLM-based applications.
       4. Unrealistic prices per connection (a connection being one
          end-user connecting their tool).
       5. Not positioned to serve LLM-based products that need RAG-ready
          data to power their use cases.

       That's how Panora was born. We set out to build a solution that
       addresses these pain points head-on, creating something that is
       both developer-friendly and open-source. Our goal was to simplify
       the complex world of integrations and data preparation for LLMs,
       allowing developers to focus on building great products rather
       than wrestling with integration headaches.

       Panora is 100% open-source under the Apache 2.0 license and you
       can either use our cloud version or self-host the product.

       We provide two ways for your end-users to connect their software
       seamlessly:

       1. A frontend SDK (React) where you can embed the integrations
          catalog within your app.
       2. A magic link that you can share with anyone, allowing them to
          connect their software.

       You can either use your own OAuth clients or our managed ones. You
       receive a connection token per user and per provider connected,
       which you must use to retrieve/insert data using our universal
       API.

       We have different categories of software such as CRMs or File
       storage. Every category is divided into entities (e.g., File
       Storage has File, Folder, Drive, Group & User) following a
       standard data model. You even have access to remote data
       (non-transformed data from the provider) within each response, so
       you can build custom & complex integrations on your end.

       If the standard data model isn't enough for the remote data you
       need, you can create custom fields either via API or our dashboard
       to map your remote fields to our model.

       We're more than just integrations--we provide ready data for your
       RAG applications with auto-generation of embeddings and chunks for
       all your synced documents. You have the option to select your own
       vector database and embedding model in the dashboard. We then sync
       your documents and store the chunks/embeddings to the specified
       vector DB.

       We make sure to maintain up-to-date data that we send through
       webhooks, and you can set a custom sync frequency (1hr, once a
       day, etc.) depending on your use case.

       Developers use our API to access fragmented data across various
       software such as File storage systems (Google Drive, OneDrive,
       SharePoint) and retrieve the embeddings of their documents using a
       single API. Our backend SDK is available for Python, TypeScript,
       Ruby, and Go.

       Your honest feedback, suggestions, and wishes would be very
       helpful. We'd love to hear about your integration stories,
       challenges you've faced with data integration for LLMs, and any
       thoughts on our approach. Thanks, HN!
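
The post describes a standard data model per entity, with the provider's untouched payload available as remote data and custom fields mapped on top. A minimal sketch of that idea follows. It is not Panora's actual schema or SDK: the class, the function, the field names, and the sample payload keys are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class UnifiedFile:
    """Hypothetical unified 'File' entity (illustrative field names only)."""
    id: str
    name: str
    mime_type: Optional[str]
    # Custom fields: unified custom-field name -> value taken from the provider
    field_mappings: Dict[str, Any] = field(default_factory=dict)
    # Untransformed provider payload, kept alongside the unified fields
    remote_data: Dict[str, Any] = field(default_factory=dict)


def normalize_file(raw: Dict[str, Any], custom_fields: Dict[str, str]) -> UnifiedFile:
    """Map one provider payload onto the unified shape.

    `custom_fields` maps a unified custom-field name to the provider's
    field name, e.g. {"is_starred": "starred"} (hypothetical mapping).
    """
    return UnifiedFile(
        id=str(raw["id"]),
        name=raw.get("name", ""),
        mime_type=raw.get("mimeType"),
        field_mappings={ours: raw.get(theirs) for ours, theirs in custom_fields.items()},
        remote_data=dict(raw),  # copy so later edits do not touch the original
    )
```

Keeping the raw payload next to the unified fields means a consumer can fall back to provider-specific keys without a second request, which is the property the post highlights.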
        
       Author : nael_ob
       Score  : 89 points
       Date   : 2024-09-23 16:43 UTC (1 day ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | gavmor wrote:
       | ctrl + f "tool calling"
       | 
       | ctrl + f "function calling"
       | 
       | Have these terms already become passe, or not yet caught on? Or
       | are they an implementation detail which Panora seeks to
       | gracefully elide?
       | 
       | Edit: Oh, very cool, though. I'm envious, in fact.
        
         | nael_ob wrote:
         | We first wanted to let ppl handle it since our API already
         | provides the necessary abstraction to extract/write data, then
         | doing "function calling" is just a matter of plugging the right
         | API calls. Really curious to have your thoughts on whether that
         | should be something we'll have to expose as well.
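
For readers wondering what "plugging the right API calls" into function calling might look like, here is a rough sketch. Everything in it is an assumption made for illustration: `list_files` is a stand-in that returns canned data instead of calling any real API, and the tool-call shape (a name plus a JSON string of arguments) is a common convention rather than anything specified in this thread.

```python
import json
from typing import Any, Dict, List


def list_files(folder: str) -> List[Dict[str, Any]]:
    """Hypothetical stand-in for a unified-API call; returns canned data."""
    catalog = {"contracts": [{"id": "f1", "name": "msa.pdf"}]}
    return catalog.get(folder, [])


# Registry pairing each tool's JSON-schema description (shown to the model)
# with the Python function that actually runs it.
TOOLS: Dict[str, Dict[str, Any]] = {
    "list_files": {
        "handler": list_files,
        "schema": {
            "name": "list_files",
            "description": "List files in a folder of the connected storage.",
            "parameters": {
                "type": "object",
                "properties": {"folder": {"type": "string"}},
                "required": ["folder"],
            },
        },
    },
}


def dispatch(tool_call: Dict[str, Any]) -> str:
    """Run the tool named in a model's tool call and return a JSON string."""
    entry = TOOLS.get(tool_call["name"])
    if entry is None:
        return json.dumps({"error": "unknown tool: " + tool_call["name"]})
    args = json.loads(tool_call["arguments"])
    return json.dumps(entry["handler"](**args))
```

The schemas in `TOOLS` would be passed to the model, and `dispatch` would be called on whatever tool call the model emits; swapping the canned handler for a real API client is the part left to the developer.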
        
       | tayloramurphy wrote:
       | The other open source option for this that I'm familiar with is
       | Nango[0]. How are you different?
       | 
       | Also, a big challenge in this space is pricing. How are you
       | thinking about tackling that?
       | 
       | [0] https://github.com/nangoHQ/nango
        
         | nael_ob wrote:
          | Yes they built a cool product! Actually, we aim to focus on
          | companies feeding their LLMs by providing embeddings and
          | chunking out of the box on top of all the data we sync. We
          | don't only help you connect with 3rd parties but also receive
          | data that can be interpreted for AI use cases (e.g., RAG).
        
           | thenaturalist wrote:
           | The optimal chunking strategy is often highly, highly
           | dependent on the data used and questions to be answered.
           | 
           | The net is plastered with blog posts about optimal
           | strategies, of which there seem to be more than 10 and new
           | approaches popping up often.
           | 
            | The consensus seems to be that trial and error is the way to
            | go to optimize cost and performance.
           | 
           | How do you plan to tackle this when providing it out of the
           | box?
        
             | nael_ob wrote:
             | That's why we wanted to try the OSS approach where
             | contributors can help keep up with the optimal strategy. We
             | also plan to build an engine to test each strategy and
             | compare retrieval perf before choosing one at runtime.
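
The idea of testing each chunking strategy and comparing retrieval performance can be illustrated with a toy harness. This is only a sketch under strong assumptions: the two strategies, the word-overlap score (a stand-in for real embedding similarity), and the hit-rate metric are all hypothetical choices, not the engine the reply refers to.

```python
import re
from typing import Callable, Dict, List, Set, Tuple


def fixed_size_chunks(text: str, size: int = 8) -> List[str]:
    """Split text into chunks of `size` words, ignoring sentence boundaries."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def sentence_chunks(text: str) -> List[str]:
    """Split text after sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokens(s: str) -> Set[str]:
    """Lowercased alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def overlap_score(query: str, chunk: str) -> float:
    """Toy lexical similarity (Jaccard); a real system would compare embeddings."""
    q, c = tokens(query), tokens(chunk)
    union = q | c
    return len(q & c) / len(union) if union else 0.0


def hit_rate(chunks: List[str], labeled: List[Tuple[str, str]]) -> float:
    """Fraction of queries whose top-scoring chunk contains the expected answer."""
    if not chunks or not labeled:
        return 0.0
    hits = 0
    for query, answer in labeled:
        best = max(chunks, key=lambda ch: overlap_score(query, ch))
        hits += answer.lower() in best.lower()
    return hits / len(labeled)


def compare(
    text: str,
    strategies: Dict[str, Callable[[str], List[str]]],
    labeled: List[Tuple[str, str]],
) -> Dict[str, float]:
    """Score every strategy on the same document and labeled queries."""
    return {name: hit_rate(chunker(text), labeled) for name, chunker in strategies.items()}
```

Calling `compare(document, {"fixed": fixed_size_chunks, "sentence": sentence_chunks}, labeled_queries)` returns one score per strategy, so the best one for a given dataset can be picked from measurements rather than from a blog post.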
        
         | rflih96 wrote:
          | Hey - for pricing, we're going usage-based on two metrics:
          | number of third-party connections and volume of data
          | transformed (for chunking / embedding). PS: This will probably
          | evolve in the coming months!
        
       | zkid18 wrote:
       | any differences from nango or supaglue?
        
         | nael_ob wrote:
          | Yes, we aim to focus on companies feeding their LLMs by
          | providing embeddings and chunking out of the box on top of all
          | the data we sync across different software.
        
           | zkid18 wrote:
           | sounds like a neat use-case!
        
       | thelittleone wrote:
       | Sounds like your hosted version will end up with a lot of
       | potentially sensitive information. You will probably want to add
       | ISO 27001 and / or SOC 2 Type 2 as a priority. Not to say an org
       | with that is more secure than one without, but you will certainly
       | need to evidence a comprehensive security program to pass
       | procurement. Choosing what third parties you add now (libraries,
       | platforms etc) can save you a TON down the road.
        
         | nael_ob wrote:
         | Yes you're right. SOC 2 is our priority for the next few weeks.
         | If you have experience in enterprise sales, I'd love to chat
         | (nael@panora.dev). Thanks.
        
       | swyx wrote:
       | as a former yc data integration startup employee... this is a
       | very very very challenging business. i'd consider a different
       | business if i were you. being "x but open source" isn't all it is
       | cracked up to be. i dont really care enough to elaborate but
       | please do feel free to tell me if i am wrong in five years.
        
         | nael_ob wrote:
          | I have to ask for your arguments... it'd be helpful to have
          | your detailed thoughts
        
         | undefinedblog wrote:
          | i'm all ears, please share your thoughts!
        
       | ilrwbwrkhv wrote:
       | This is one of the better open-source-but-business projects, as
       | it puts running it on my own machine front and center. I still
       | don't know why you are tying yourself to VCs but well done.
        
         | nael_ob wrote:
         | Thank you!
        
       ___________________________________________________________________
       (page generated 2024-09-24 23:01 UTC)