[HN Gopher] Show HN: Airweave - Let agents search any app
___________________________________________________________________
Hey HN, we're Lennert and Rauf. We're building Airweave
(https://github.com/airweave-ai/airweave), an open-source tool that
lets agents search and retrieve data from any app or database.
Here's a general intro:
https://www.youtube.com/watch?v=EFI-7SYGQ48, and here's a longer
one that shows more real-world use cases, examples of how Airweave
is used by Cursor (0:33) and Claude desktop (2:04), etc.:
https://youtu.be/p2dl-39HwQo

A couple of months ago we were
building agents that interacted with different apps and were
frustrated when they struggled to handle vague natural language
requests like "resolve that one Linear issue about missing auth
configs", "if you get an email from an unsatisfied customer,
reimburse their payment in Stripe", or "what were the returns for
Q1 based on the financials sheet in gdrive?", only to have the
agent inefficiently chain together loads of function calls to find
the data, or fail to find it at all and hallucinate. We also noticed
that despite the rise of MCP creating more desire for agents to
interact with external resources, the majority of agent dev tooling
focused on function calling and actions instead of search. We were
annoyed by the lack of tooling that enabled agents to semantically
search workspace or database contents, so we started building
Airweave first as an internal solution. Then we decided to open-source
it and pursue it full time after we got positive reactions from
coworkers and other agent builders.

Airweave connects to
productivity tools, databases, or document stores via their APIs
and transforms their contents into searchable knowledge bases,
accessible through a standardized interface for the agent. The
search interface is exposed via REST or MCP. When using MCP,
Airweave essentially builds a semantically searchable MCP server on
top of the resource. The platform handles the entire data pipeline
from connection and extraction to chunking, embedding, and serving.
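To make the chunk, embed and serve stages more concrete, here is a minimal in-memory sketch. It is not Airweave's code or API: `chunk`, `embed`, `KnowledgeBase` and the record IDs are hypothetical, chunking is naive fixed-size word splitting, and `embed` is a toy bag-of-words vector standing in for a real embedding model.

```python
import math
import re
from collections import Counter


def chunk(text: str, max_words: int = 50) -> list[str]:
    """Naive fixed-size chunking: at most max_words words per chunk."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class KnowledgeBase:
    """Hypothetical index of (source_id, chunk_text, vector) rows."""

    def __init__(self) -> None:
        self.rows: list[tuple[str, str, Counter]] = []

    def ingest(self, source_id: str, text: str) -> None:
        # Extracted text is chunked and each chunk is embedded before storage.
        for piece in chunk(text):
            self.rows.append((source_id, piece, embed(piece)))

    def search(self, query: str, k: int = 3) -> list[tuple[str, str]]:
        # Serving: rank stored chunks by similarity to the embedded query.
        q = embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[2]), reverse=True)
        return [(source_id, piece) for source_id, piece, _ in ranked[:k]]


kb = KnowledgeBase()
kb.ingest("linear:ENG-42", "Missing auth configs break login for staging users")
kb.ingest("gdrive:financials", "Q1 returns were summarized in the financials sheet")
top = kb.search("that Linear issue about missing auth configs", k=1)
```

A production pipeline would swap in a real embedding model and a vector store, but the shape (ingest chunks with vectors, rank by similarity at query time) stays the same.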
To ensure knowledge is current, it has automated sync capabilities,
with configurable schedules and change detection through content
hashing. We built it with support for white-labeled multi-tenancy
to provide OAuth2-based integration across multiple user accounts
while maintaining privacy and security boundaries. We're also
actively working on permission-awareness (i.e., RBAC on the data)
for the platform. We're happy to share our learnings and would love
to hear about your experiences. Looking forward to your comments!
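Change detection through content hashing can be sketched roughly as follows, assuming each synced entity is a JSON-serializable dict keyed by a stable ID. The function names are hypothetical, and SHA-256 over canonical JSON is just one reasonable choice of hash, not necessarily what Airweave uses.

```python
import hashlib
import json


def content_hash(entity: dict) -> str:
    """SHA-256 of a canonical JSON encoding, so key order does not affect the hash."""
    payload = json.dumps(entity, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def plan_persist(stored: dict[str, str], incoming: dict[str, dict]) -> dict[str, str]:
    """Map each incoming entity ID to a persist action: insert, update or keep."""
    actions: dict[str, str] = {}
    for entity_id, entity in incoming.items():
        old = stored.get(entity_id)
        new = content_hash(entity)
        if old is None:
            actions[entity_id] = "insert"  # never seen before: chunk and embed it
        elif old != new:
            actions[entity_id] = "update"  # content changed: re-chunk and re-embed
        else:
            actions[entity_id] = "keep"    # unchanged: skip the expensive embedding step
    return actions


stored = {"a": content_hash({"title": "x"}), "b": content_hash({"title": "y"})}
incoming = {"a": {"title": "x"}, "b": {"title": "y2"}, "c": {"title": "z"}}
plan = plan_persist(stored, incoming)
```

The payoff is that only "insert" and "update" entities hit the embedding model on each scheduled sync. This sketch does not handle entities that disappeared from the source; those would need a separate deletion pass.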
Author : lennertjansen
Score : 101 points
Date : 2025-05-12 15:34 UTC (7 hours ago)
| reneBond wrote:
| Had meetings with a ton of MCP server providers, and none came
| close to Airweave's retrieval accuracy. I even tried Zapier and
| similar large companies; they didn't come near Airweave. Highly,
| highly recommend if you need third-party integrations for your AI
| agents or workflows. Love the team too: cracked, cool, kind, and
| always there to support their customers (they even took a
| customer's dog for a walk when the customer couldn't, lol).
| risyachka wrote:
| Noob here: why would MCP providers have good accuracy?
|
| Don't they basically just adapt existing APIs to the MCP
| protocol, i.e. just wrap them?
| raufakdemir wrote:
| This is exactly the reason we started building Airweave! The
| "context" in MCP is a bit deceiving, as it actually provides
| very little context.
| om0agarwal wrote:
| Looks like a great product! Do you integrate with data lakes
| (Snowflake/Fabric)?
| raufakdemir wrote:
| Hi, co-founder here. No Snowflake or Fabric yet. We do support
| some popular regular SQL connectors. We are working towards an
| async distributed processing architecture that should allow us
| to process >50M row datasets, but we're still looking for a strong
| use-case signal here. What would you like to do with it?
| rkhanna23 wrote:
| This is so powerful, MCP on steroids :). What are the next use
| cases you're looking to build?
| raufakdemir wrote:
| We're mostly focused on getting this right, better than any
| other tool atm. We are evaluating ideas like mapped RBAC,
| self-updating deep research, and other tools for agent builders,
| but first it should be very clear to us that devs actually need
| them :D
| ayxliu wrote:
| I was looking everywhere for some solution like this. Finally!
| Curious, do you guys integrate with internal data sources within
| a company?
| bartjanjorna wrote:
| How is this different from regular MCP servers?
| raufakdemir wrote:
| Co-founder here. The platform provides MCP or REST endpoints on
| top of searchable information. The tool is specifically geared
| towards agents that want to perform actions on external systems
| (through an MCP server, for example) but get confused about
| which objects to interact with. Airweave provides a robust
| interface for this.
|
| You can compare it to how coding agents like Cursor work. This
| is the usual pattern you see:
| - The first step is reading your prompt.
| - Then it goes through all the attached files and searches your
|   codebase.
| - The last step is to make code file edits.
|
| Non-coding agents that use "regular" MCP servers completely
| miss the second step. It's very hard to go from a natural
| language instruction to a chain of API calls that actually works
| and doesn't end in hallucination.
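The search-before-act pattern described above can be sketched like this. Everything here is hypothetical: `search_then_act`, the `Hit` type, the toy word-overlap `toy_search` over two made-up Linear-style issues, and the `min_score` threshold are illustrations, not Airweave's interface. The point is that the action only ever receives an ID that retrieval actually returned.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Hit:
    object_id: str
    title: str
    score: float


def search_then_act(
    prompt: str,
    search: Callable[[str], list[Hit]],
    act: Callable[[str], str],
    min_score: float = 0.5,
) -> Optional[str]:
    """Ground the prompt via search before calling any action."""
    hits = search(prompt)                # step 2: find the object the prompt refers to
    if not hits or hits[0].score < min_score:
        return None                      # not confident: ask the user instead of guessing an ID
    return act(hits[0].object_id)        # step 3: act on a concrete, retrieved ID


# Hypothetical issue tracker contents.
ISSUES = {"ENG-42": "Missing auth configs on staging", "ENG-7": "Dark mode toggle flickers"}


def toy_search(prompt: str) -> list[Hit]:
    """Score each issue by the fraction of its title words that appear in the prompt."""
    words = set(prompt.lower().split())
    hits = []
    for issue_id, title in ISSUES.items():
        title_words = set(title.lower().split())
        hits.append(Hit(issue_id, title, len(words & title_words) / len(title_words)))
    return sorted(hits, key=lambda h: h.score, reverse=True)


result = search_then_act(
    "resolve that one Linear issue about missing auth configs",
    toy_search,
    lambda issue_id: f"resolved {issue_id}",
)
```

Returning `None` on a weak match is a deliberate choice: asking a follow-up question is cheaper than calling a write API with a hallucinated ID.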
| valianter wrote:
| Is chat always the best interface for all of these apps? I feel
| like search is the natural first step, but chat-based search has
| been around for a while. I feel like an MCP-based version of
| Glean/Onyx/Moveworks/Dashworks is interesting, but I'm unsure how
| much better it makes the product. Curious to see why your product
| is better.
| raufakdemir wrote:
| Co-founder here. The Airweave interface is agnostic to the
| downstream use case it's applied in. Most developers using it
| today don't build it into a chat interface at all, actually.
| Instead, they fold it into their agents to give them access to
| user data. At first sight it looks quite similar to enterprise
| search, but it is a building block for developers to set up
| integrations for their internal agent or agent product.
| brene wrote:
| Pretty cool stuff. How does it deal with self-hosted data
| sources? Can it run inside a VPC and talk to my RDS instances
| directly?
| raufakdemir wrote:
| You can self-host Airweave on Docker or Kubernetes within your
| VPC. We eventually want to move towards AWS/Azure/GCP
| marketplace offerings that should make this easier for you. RDS
| should work as long as your instance uses the PostgreSQL or MySQL
| dialect.
| pomarie wrote:
| Pretty cool - when does it make sense to use this vs n8n?
| raufakdemir wrote:
| n8n is a good example of a tool that Airweave can enhance. n8n
| allows (no-code) developers to set up predetermined
| automations, but as soon as you want to turn non-deterministic
| text into actions on an app, you still need a way to search the
| app. Example: you have an n8n workflow that keeps you on track
| with Linear tickets. You hook it into a text-based human
| interface in which the user says: "I just created a task about
| database migration on Linear, can you start doing the
| preparations for it?". Airweave can 1. find that damn ticket and
| 2. give additional context on database migrations based on what
| else it finds in the integrated systems.
| nishanthooda wrote:
| Nice - does it have role based access controls built in?
| raufakdemir wrote:
| I assume you're talking about the data layer (not the control
| plane)? We are currently in the PoC phase for mapping the role
| graphs from source systems (Asana, Google Drive) to our
| internal role model, but this is still in the works. The way
| developers work around this atm is by configuring a connection
| on a subset of the source info. Example: only make Airweave
| sync info from the `Shared Drive/Marketing/Branding` path.
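The scoping workaround (syncing only a subset of the source) amounts to a path filter applied before ingestion. A minimal sketch, assuming source files are identified by POSIX-style paths; `in_scope` and `SCOPE` are hypothetical names, and this is a coarse filter, not RBAC.

```python
from pathlib import PurePosixPath

# Hypothetical connection scope: only these folders get synced.
SCOPE = ["Shared Drive/Marketing/Branding"]


def in_scope(file_path: str, allowed_prefixes: list[str]) -> bool:
    """True if file_path is one of the allowed folders or lies underneath one.

    Compares path components rather than raw strings, so a sibling folder like
    "Branding-old" is not accidentally matched by the "Branding" prefix.
    """
    path = PurePosixPath(file_path)
    for prefix in allowed_prefixes:
        allowed = PurePosixPath(prefix)
        if path == allowed or allowed in path.parents:
            return True
    return False


files = [
    "Shared Drive/Marketing/Branding/logo.png",
    "Shared Drive/Marketing/Branding-old/logo.png",
    "Shared Drive/Finance/q1.xlsx",
]
to_sync = [f for f in files if in_scope(f, SCOPE)]
```

Filtering at sync time means out-of-scope files are never chunked or embedded at all, which is simpler than trying to hide them at query time.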
| mike_d wrote:
| How do you handle data retention? For example, say that you
| suck in the information of a California resident and the
| company is obligated by law to delete it on request. How do
| you ensure no derivative data exists within your model?
| raufakdemir wrote:
| So you would like to delete information for a specific user
| identifier? Currently that means resyncing without that user
| profile (which would first have to be removed from the source
| system), but we're happy to hear more about this use case.
| Would a "delete by user email" feature, for example, cover what
| you need?
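A "delete by user email" feature, as floated above, is not something the thread says exists. A rough sketch of what it could look like, assuming (hypothetically) that every indexed chunk stores its embedding alongside the emails of the people its source entity refers to; `IndexedChunk` and `delete_by_user_email` are illustrative names only.

```python
from dataclasses import dataclass, field


@dataclass
class IndexedChunk:
    chunk_id: str
    text: str
    vector: list[float]                                  # derived data lives with the chunk
    user_emails: set[str] = field(default_factory=set)   # people the source entity refers to


def delete_by_user_email(index: list[IndexedChunk], email: str) -> tuple[list[IndexedChunk], int]:
    """Drop every chunk (and therefore its embedding) tagged with the given email.

    Emails are compared case-insensitively. Returns the remaining chunks and
    the number of chunks removed.
    """
    target = email.strip().lower()
    kept = [c for c in index if target not in {e.lower() for e in c.user_emails}]
    return kept, len(index) - len(kept)


index = [
    IndexedChunk("c1", "Refund request from Ana", [0.1, 0.2], {"ana@example.com"}),
    IndexedChunk("c2", "Follow-up with Ana", [0.3, 0.1], {"ANA@example.com"}),
    IndexedChunk("c3", "Roadmap notes", [0.5, 0.5]),
]
kept, removed = delete_by_user_email(index, "Ana@Example.com")
```

This only works if ingestion reliably tags chunks with the people they mention, and the record must also be gone from the source system, or the next sync will bring it back, which is the same constraint the reply describes.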
| flockonus wrote:
| FYI, there is a project with an almost phonetically identical
| name to yours: arweave.org
| raufakdemir wrote:
| Lol, good to know. Thanks
| 1317 wrote:
| it's also the name of a mattress
| modelorona wrote:
| Looks cool! How are you thinking about pricing it?
| lennertjansen wrote:
| Hi, co-founder here. Until now it's been custom deployments for
| customers with additional B2B/enterprise features. We're also
| releasing a managed service for a flat-fee subscription.
| howmayiannoyyou wrote:
| If we want to integrate our SaaS apps into Airweave, is there an
| app exchange or directory for doing so?
| raufakdemir wrote:
| Yes, we create service accounts on the source platforms which
| can then be used to do an OAuth or key-based integration. What
| would you like to do specifically?
| throwaway314155 wrote:
| Are integrations hooked into via their MCP implementation? Or are
| you hooking in more traditionally and then exposing MCP on top of
| that?
|
| Also, are these one-time/event-based syncs well supported by the
| integration providers? I know for instance that Discord (and I
| assume others like Slack) frown upon that sort of wholesale
| archival/syncing of entire chat rooms, presumably due to security
| concerns and to maintain their data moats.
|
| Finally (I think), do you have to write custom "diff" logic for
| each integration in order to maintain up-to-date retrieval for
| each one? I assume it would be challenging to keep this accurate
| and well structured across so many different integration
| providers. Is there something I'm missing that makes keeping a
| local backup of your data easier for each service?
|
| All in all, looks very cool. Have starred the repo to mess around
| with tonight.
| raufakdemir wrote:
| Good questions.
|
| 1) The integrations are done traditionally, i.e. with REST/SQL.
| The MCP/REST search layer sits on top of the data that gets synced.
|
| 2) Most providers are painless. Slack doesn't want major
| exports in one go, but most developers point at a single channel
| anyway, so the rate limit errors don't bite too much.
|
| 3) This is all orchestrated by the platform itself. Incremental
| syncs receive the latest "watermark state" and sync from
| there. Hashes are used to compare data and pick a persist action
| (update/insert/keep).
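The watermark part of point 3 can be sketched as below, assuming each source record carries an `updated_at` timestamp. `incremental_sync` is a hypothetical name; in practice the filter would usually be pushed down into the source API or SQL query where the provider supports it, rather than applied client-side as here.

```python
from datetime import datetime, timezone


def incremental_sync(records: list[dict], watermark: datetime) -> tuple[list[dict], datetime]:
    """Select records changed at or after the watermark and compute the next watermark."""
    # ">=" rather than ">" so records sharing the watermark timestamp are not skipped;
    # re-fetched but unchanged records are then cheap to dedupe by content hash ("keep").
    changed = [r for r in records if r["updated_at"] >= watermark]
    next_watermark = max((r["updated_at"] for r in changed), default=watermark)
    return changed, next_watermark


def ts(hour: int) -> datetime:
    return datetime(2025, 5, 12, hour, tzinfo=timezone.utc)


records = [
    {"id": "m1", "updated_at": ts(9)},
    {"id": "m2", "updated_at": ts(10)},
    {"id": "m3", "updated_at": ts(11)},
]
changed, wm = incremental_sync(records, ts(10))   # first incremental run
changed2, wm2 = incremental_sync(records, wm)      # next run resumes from the stored watermark
```

The watermark answers "what might have changed", and the content hash answers "did it actually change", which is why the two mechanisms pair well.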
___________________________________________________________________
(page generated 2025-05-12 23:00 UTC)