[HN Gopher] Show HN: Open-source Deep Research across workplace ...
___________________________________________________________________
Show HN: Open-source Deep Research across workplace applications
I've been using deep research on OpenAI and Perplexity and it's
been just amazing at gathering data across a lot of related and
chained searches. Just earlier today, I asked "What are some
marquee tech companies / hot startups (not including the giants
like FAAMG, Samsung, Nvidia etc.)". It's a pretty involved question
and looking up "marquee tech startups" or "hot tech startups" on
Google gave me nothing useful. Deep research on both ChatGPT and
Perplexity gave really high-quality responses, with ChatGPT leaning
toward slightly larger scale-ups and Perplexity more toward
up-and-coming companies. Given how useful AI research agents are across
the internet, we decided to build an open-source equivalent for the
workplace since a ton of questions at work also cannot be easily
resolved with a single search. Onyx supports deep research
connected to company applications like Google Drive, Salesforce,
Sharepoint, GitHub, Slack, and 30+ others. For example, an
engineer may want to know "What's happening with the verification
email failure?" Onyx's AI agent would first figure out what it
needs to answer this question: what is the cause of the failure,
what has been done to address it, has this come up before, and
what's the latest status on the issue? The agent would run parallel
searches through Confluence, email, Slack, and GitHub to answer
these, then combine the results into a coherent overview. If
the agent finds that there was a technical blocker that will delay
the resolution, it will adjust mid-flight and do further research for more
context on the blocker. Here's a video demo I recorded:
https://www.youtube.com/watch?v=drvC0fWG4hE If you want to get
started with the GitHub repo, you can check out our guides at
https://docs.onyx.app. Or to play with it without needing to deploy
anything, you can go to https://cloud.onyx.app/signup. P.S. There's
a lot of cool technical details behind building a system like this
so I'll continue the conversation in the comments.
Author : yuhongsun
Score : 38 points
Date : 2025-03-03 15:18 UTC (1 day ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
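The flow the post describes (decompose a question into sub-questions, fan out parallel searches across connected tools, then combine the findings) can be sketched roughly as follows. All names here, including the stub `search_source`, are hypothetical illustrations, not Onyx's actual code:

```python
import asyncio

# The sub-questions the post's example agent derives from
# "What's happening with the verification email failure?"
SUB_QUESTIONS = [
    "What is the cause of the failure?",
    "What has been done to address it?",
    "Has this come up before?",
    "What is the latest status?",
]

async def search_source(source: str, question: str) -> str:
    # Stand-in for a real connector search (Confluence, Slack, GitHub, ...).
    await asyncio.sleep(0)  # simulate I/O
    return f"[{source}] finding for: {question}"

async def research(question: str, sources: list[str]) -> str:
    # Fan out: every sub-question is searched across every source in parallel.
    tasks = [
        search_source(src, sub)
        for sub in SUB_QUESTIONS
        for src in sources
    ]
    findings = await asyncio.gather(*tasks)
    # Combine step: in the real system an LLM synthesizes the overview.
    return "\n".join(findings)

overview = asyncio.run(
    research("What's happening with the verification email failure?",
             ["Confluence", "Slack", "GitHub"])
)
```

The "adjust mid-flight" behavior would add another round of searches whenever a finding (like a technical blocker) raises new sub-questions; the comments below discuss that loop.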
| yuhongsun wrote:
| Before sharing how it works, I want to highlight some of the
| challenges of a system like this. Unlike deep research over the
| internet, LLMs aren't able to easily leverage the built-in
| searches of these SaaS applications. Each has a different way of
| searching for things; many do not have strong search
| capabilities, or they rely on an internal query language. There
| are also a ton of other signals that web search engines use
| that aren't available natively in these tools, such as
| backlinks and clickthrough rates. Additionally, a lot of teams
| rely on internal terminology that is unique to them and hard for
| the LLM to search for. There's also the challenge of unifying the
| objects across all of the apps into a plaintext representation
| that works for the LLM.
|
| The best way we've found to do this is to build a document index
| instead of relying on application native searches at query time.
| The document index is a hybrid index of keyword frequencies and
| vectors. The keyword component addresses issues like
| team-specific terminology, and the vector component allows for natural
| language queries and non-exact matching. Since all of the
| documents across the sources are processed prior to query time,
| inference is fast and all of the documents have already been
| mapped to an LLM friendly representation.
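The hybrid scoring idea in the comment above can be illustrated with a toy example: a keyword-overlap score (which catches team-specific terms exactly) blended with a vector cosine score (which handles natural-language, non-exact matches). The weighting and the scoring functions here are illustrative assumptions, not Onyx's actual ranking:

```python
import math

def keyword_score(query: str, doc: str) -> float:
    # Fraction of query terms that appear in the document.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_score(query, doc, q_vec, d_vec, alpha=0.5):
    # alpha trades off exact keyword matching vs. semantic similarity.
    return alpha * keyword_score(query, doc) + (1 - alpha) * cosine(q_vec, d_vec)

# "ACME" is a made-up internal project name an embedding model may not know,
# but the keyword component still matches it exactly.
score = hybrid_score("ACME rollout status", "ACME rollout is blocked",
                     [1.0, 0.0], [0.8, 0.6])
```

In practice the keyword side would be a BM25-style score and the vectors would come from an embedding model; the blend is what lets one index serve both query styles.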
|
| There are also other signals that we can take into account which
| are applied across all of the sources. For example, the time that
| a document was last updated is used to prioritize more recent
| documents. We also have models that run at indexing time to label
| documents and models that run at inference time to dynamically
| change the weights of the search function parameters.
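One of the cross-source signals mentioned, recency weighting, might look like a simple exponential decay on document age. The half-life value below is a guess for illustration, not Onyx's actual parameter:

```python
import time

HALF_LIFE_DAYS = 30.0  # illustrative: a doc loses half its boost per month

def recency_multiplier(last_updated_ts: float, now: float) -> float:
    # Multiply a document's base relevance score by this factor so that
    # recently updated documents are prioritized.
    age_days = max(0.0, (now - last_updated_ts) / 86400)
    return 0.5 ** (age_days / HALF_LIFE_DAYS)

now = time.time()
fresh = recency_multiplier(now, now)                   # just updated
month_old = recency_multiplier(now - 30 * 86400, now)  # half the boost
```

The inference-time models described above would then adjust weights like this one per query, e.g. boosting recency harder for "latest status" style questions.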
| laborcontract wrote:
| Cool product. Few Qs:
|
| - What would you say is the agentic approach's special sauce over
| a typical RAG pipeline, ie query->multi-query
| generation->HyDE->vector search->bm25
| search->RRF->rerank->evaluate->(retry|refuse|respond) that
| differentiates the approach?
|
| - If a user has 20 services connected, how does the agent know
| how to call/search/traverse the information in the right order?
|
| - Do you have any internal evals on how the different models
| affect the overall quality of output, esp for a "deep search"
| type of task? I have model-picker fatigue.
|
| - Do you plan to implement knowledge graphs in the future?
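One step named in the pipeline from the first question above, reciprocal rank fusion (RRF), merges the ranked lists coming out of the vector and BM25 searches. A minimal sketch (the k=60 constant comes from the original RRF formulation; everything else is illustrative):

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Each document scores 1/(k + rank) per ranked list it appears in;
    # documents ranked highly by multiple retrievers float to the top.
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_results = ["a", "b", "c"]
bm25_results = ["b", "d", "a"]
fused = rrf([vector_results, bm25_results])  # → ["b", "a", "d", "c"]
```

RRF needs only ranks, not comparable scores, which is why it is a common way to fuse retrievers with incompatible scoring scales.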
| janalsncm wrote:
| If I understand correctly, they are indexing all of the docs
| together rather than relying on the agent to retrieve them.
| lxe wrote:
| That sounds like RAG though, right?
| yuhongsun wrote:
| It's like how OpenAI's deep research works by searching the
| internet; ours works by searching over our "RAG" system
| that indexes company documents.
| yuhongsun wrote:
| Quite a lot to cover here! So in addition to the typical RAG
| pipeline, we have many other signals like learning from user
| feedback, time based weighting, metadata handling, weighting
| between title/content, and different custom deep learning
| models that run at inference and indexing time all to help the
| retrieval. But this is all part of the RAG component.
|
| The agent part is the loop of running the LLM over the RAG system
| and letting it decide which questions it wants to explore more
| (some similarities to retry|refuse|respond I guess?). We also
| have the model do CoT over its own results including over the
| subquestions it generates.
|
| Essentially it is the deep research paradigm with some more
| parallelism and a document index backing it.
|
| How does the agent traverse the information: there are index-
| free approaches where the LLM has to use the tools' native
| searches. These give worse results than approaches that build a
| coherent index across sources. We use the latter approach. So
| the search occurs over our index which is a central place for
| all the knowledge across all connected tools.
|
| Do you have any internal evals on how the different models
| affect the overall quality of output, esp for a "deep search"
| type of task? I have model-picker fatigue: Yes, we have
| datasets that we use internally. They consist of "company
| type" data rather than "web type" data (like short Slack
| messages, very technical design documents, etc.), comprising
| about 10K documents and 500 questions.
|
| For which model to use: it was developed primarily against
| gpt-4o but we retuned the prompts to work with all the recent
| models like Claude 3.5, Gemini, Deepseek, etc.
|
| Do you plan to implement knowledge graphs in the future? Yes!
| We're looking into customizing LLM based knowledge graphs like
| LightGraphRAG (inspired by, but not the same).
| janalsncm wrote:
| The demo looked sharp but I am curious if you have done any
| formal evaluation of the quality of the results? For example, MRR
| and recall@k, even on a toy dataset? Seems like the quality of
| the generated responses will be highly dependent on the docs
| which are retrieved.
| robrenaud wrote:
| I am also interested in how to do eval on an open source
| corporate search system. Privacy and information security make
| this challenging, right?
| yuhongsun wrote:
| On privacy and security, we are the only option (as far as I
| know) where you can connect all of your company's internal
| docs and have everything processed locally within the
| deployment and stored at rest within the deployment.
|
| So basically you can have it completely airgapped from the
| outside world. The only tough part is the local LLM, but there
| are lots of options for that these days.
| monkeydust wrote:
| I have recently found myself drawing parallels between deep
| research and knowledge working org structures.
|
| For example, a board exec asks a senior exec a question about a
| particular product. The senior exec then has to fire off emails
| to, say, 5 managers, who might go down their trees to ICs; all
| the info is gathered and synthesised into a response.
|
| Normally this response takes into account some angle the senior
| exec might have.
|
| A lot of knowledge working tasks follow this pattern that I have
| somewhat simplified.
| nikisweeting wrote:
| Very cool! One question: how do you handle permissions?
|
| Different apps have different permissions models, not everyone is
| allowed to see everything. Do you attempt to model this
| complexity at all or normalize it to some general permissions
| model?
| thebeardisred wrote:
| It appears the answer (if you want differentiated permissions,
| e.g. user vs admin, or role based access control) is "purchase
| the enterprise edition" -
| https://docs.onyx.app/enterprise_edition/overview
|
| edit: added clarification
| yuhongsun wrote:
| This is a large challenge in itself actually. Every external
| tool has its own framework for permissions (necessarily so).
|
| For example, Google Drive docs have permissions like "global
| public", "domain public", "private" where "private" is shared
| with users and groups and there's also the document owner.
|
| Slack has public channels, private channels, DMs, group DMs.
|
| So we need to map these external objects and their external
| users/groups into a unified representation within Onyx.
|
| Then there are additional challenges like rate limiting so we
| cannot poll at subsecond intervals.
|
| The way that we do it is we have async jobs that check for
| object permission updates and group/user updates against the
| external sources at a configurable frequency (with defaults
| that depend on the external source type).
|
| Of course, we always fail closed instead of failing open,
| defaulting to the least permissive option.
| vishnudeva wrote:
| Onyx is as close to magic as you can get in this space. It just.
| Works.
|
| I can talk for literally hours about how good it is when you
| connect it to your company's Confluence or Jira or Slack or
| Google Drive or a ton of other things. At a scale of many tens of
| thousands of documents too.
|
| Their team is awesome too and completely tuned into exactly what
| their users need. And that it's open source is the cherry on top:
| no secrets about how your data is being used.
|
| - An incredibly happy user looking forward to more from Onyx
| yuhongsun wrote:
| Amazing to hear from a happy user! Thanks for the kind words!
| lxe wrote:
| What's the difference between deep research and RAG?
| canadiantim wrote:
| How is the data stored? E.g. for concerns about internal data
| leaking out?
___________________________________________________________________
(page generated 2025-03-04 23:00 UTC)