[HN Gopher] Launch HN: Roe AI (YC W24) - AI-powered data warehou...
___________________________________________________________________
Launch HN: Roe AI (YC W24) - AI-powered data warehouse to query
multimodal data
Hey HN, we're Richard and Jason from Roe AI (https://getroe.ai).
We're building a query engine that lets data people run SQL queries
on various kinds of unstructured data (videos, images, webpages,
documents) using LLM-powered data processors. Here is a 3-minute
video: https://www.youtube.com/watch?v=9-WwJk1v5mI, showing how to
create an LLM data processor to process videos, build a semantic
search for image data, and use it with SQL.

The problem we tackle is that data analysts cannot quickly answer
their business questions around unstructured, multimodal data. For
example, product teams want to analyze user session replay videos
to understand the pain points of using their product. Ads teams
need to know everything about an advertiser based on their web
pages, such as the products they offer, payment methods, etc.
Marketing teams need to know how product placement or music in a
marketing campaign could get more views. And so on.

For data that is structured, questions like these can be answered
quickly with SQL queries in Snowflake / BigQuery. But when you have
unstructured multimodal data, it becomes a complex analysis
process: open a Python notebook, write custom logic to get the
multimodal data from blob storage (or write a crawler first if you
need webpage data), find an AI model, do prompt engineering, do
data ops to productionize the workload in a data workflow, etc. We
simplify this process to a few lines of SQL.

How it works: first, we leverage multimodal LLMs as data processors
because they're good at information extraction from unstructured
data, classification, and other arbitrary tasks. Next, we've built
a user interface for data people to explore multimodal data and
manage AI components. Then we have a quick semantic index builder
for multimodal data. (We often see databases provide vector search
functionality but not index building, so we built that.) Utility
functions deal with multimodal data, like a video cutter, a PDF
page selector, etc. Finally, SQL is the command line for slicing
and dicing multimodal data.

How we got here: I've experienced 3 data evolutions in the last 10
years. At UC Berkeley, I was a data researcher using a
supercomputer cluster called Savio. It was a bare-metal way to
analyze the data--I had to move CSV files between machines. Then at
LinkedIn, I had Hadoop + Pig / Scala Spark. That abstracted most of
the work, but I spent hours tuning jobs and had a headache
manipulating HDFS directories. Later I joined Snowflake, and was
like, holy - data analysis can be this simple - I can just use SQL
to do everything within this data warehouse! I asked myself: why
can't we make something like Snowflake for unstructured data? That
was the impulse behind Roe.ai and it's been driving me ever since.

To get started, you can sign in at https://app.roe-ai.com/ and
there are docs at https://docs.roe-ai.com/. You can load
unstructured data via our SQL and File API, Snowflake Staging Data
Connector, S3 Blob Storage Data Connector, Zapier Roe AI Zap, or
the SQL function load_url_file() to get a file from a URL.

Some logistics: the product is free to start, and we've preloaded
$50 in AI credits, enough to process 3,000 one-page PDFs. If you
use all $50, just email us, and we'll give you more. The solution
is not open-sourced because it is too complex to be self-hosted,
but let us know if you see the potential for open-source.

The product is early and could have bugs and UX problems. It'd be
incredible if you could give it a spin anyway. We hope you'll find
it interesting and let us know what you think! Jason and I will be
around in the thread and are really interested in hearing from you!
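
A rough illustration of the "data processor used from SQL" idea
described above, using only Python's standard library. This is not
Roe AI's API: extract_payment_methods is a hypothetical
keyword-matching stand-in for an LLM call, the advertiser_pages table
and its rows are made up, and SQLite stands in for the query engine.

```python
import sqlite3

# Hypothetical stand-in for an LLM-powered processor: a real one would send
# the page text to a model; this one only matches a few known keywords.
KNOWN_METHODS = ("visa", "paypal", "apple pay")


def extract_payment_methods(page_text):
    text = page_text.lower()
    found = [method for method in KNOWN_METHODS if method in text]
    return ", ".join(found)


conn = sqlite3.connect(":memory:")
# Register the Python function so it can be called from SQL.
conn.create_function("extract_payment_methods", 1, extract_payment_methods)

# Made-up sample data standing in for crawled advertiser pages.
conn.execute("CREATE TABLE advertiser_pages (advertiser TEXT, page_text TEXT)")
conn.executemany(
    "INSERT INTO advertiser_pages VALUES (?, ?)",
    [
        ("acme", "We accept Visa and PayPal at checkout."),
        ("globex", "Pay with Apple Pay in one tap."),
    ],
)

# The "processor" now runs per row inside an ordinary SQL query.
rows = conn.execute(
    "SELECT advertiser, extract_payment_methods(page_text) "
    "FROM advertiser_pages ORDER BY advertiser"
).fetchall()
print(rows)
```

The point of the sketch is only the shape: once a processor is
callable as a SQL function, filtering, joining, and aggregating its
output needs no notebook code.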
Author : richardmeng
Score : 33 points
Date : 2024-08-09 15:17 UTC (7 hours ago)
| atak1 wrote:
| This is awesome :) can we use this directly on our entire db?
| richardmeng wrote:
| Likely. Can you elaborate on your use case and which db you
| use?
| airstrike wrote:
| Congrats on the launch. Sounds cool and potentially useful, but I
| don't want to read blog posts or book a demo. I'd put a proper
| video at the very top of the page instead of the animated typing
| you currently have.
|
| FYI your <title> tag needs to be updated.
| richardmeng wrote:
| Good points! We'll update our landing pages as you suggested.
| datadrivenangel wrote:
| Is this more for data engineers or data analysts?
|
| Seems like the type of thing that would be very useful in helping
| build data pipelines on semi-structured data.
| zswzs wrote:
| Right now it's more for data analysts whose data eng team
| doesn't have the capacity to support all types of data
| processing requirements. Data analysts can just do it
| themselves with SQL! But we are also open to exploring
| opportunities for data eng teams if we see a strong use
| case for automating their data pipelines.
| fsndz wrote:
| Why this when I can just use PostgreSQL and pgvector? Like in
| this example I found recently:
| https://www.lycee.ai/courses/91b8b189-729a-471a-8ae1-717033c...
| gigatexal wrote:
| Not saying roe is the next Dropbox but the same sort of thing
| was said when Dropbox did their show HN...
| zswzs wrote:
| Great question! The answer is twofold: 1. Unlike a vector
| database, in addition to searching, VolansDB also stores the
| files (pointers) directly in the table. So you are able to
| manage files (RBAC etc.) as table cells, apply batch data
| processing jobs easily with SQL, and even track unstructured
| data lineage & pipelines. 2. VolansDB is columnar, so it's
| optimized for analytical use cases rather than for product DB
| access patterns.
| funnyenough wrote:
| Will this work with Redshift via SQL interface? Or am I looking
| at this wrong?
| richardmeng wrote:
| This does not work with Redshift. This is a query engine for
| unstructured data like documents, images, and videos. That kind
| of data does not quite fit into a Redshift / BigQuery data
| warehouse.
| dmpetrov wrote:
| Bridging the gap between AI and data warehouses is crucial, but
| I'm not sure SQL is the best fit for AI engineers who mainly work
| with Python and AI APIs.
|
| At DataChain, we are solving this by creating a Python API that
| translates to SQL under the hood, which is pretty easy now with
| Pydantic. https://github.com/iterative/datachain
|
| WDYT?
| richardmeng wrote:
| Right, our product is designed for data practitioners who want
| snappy data analytics on unstructured data.
|
| Thanks for sharing your project, super cool idea! What does it
| take if we want to integrate our SQL engine with datachain?
| dmpetrov wrote:
| The open-source version uses SQLite. The SaaS version uses
| proprietary data warehouses, where your engine could be
| integrated.
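
A minimal sketch of the general "Python API that translates to SQL
under the hood" pattern discussed in this sub-thread, run against
SQLite. It is not DataChain's actual API: ImageRecord,
create_table_sql, and select_where are hypothetical names, and a
standard-library dataclass is used in place of a Pydantic model so
the example has no third-party dependencies.

```python
import sqlite3
from dataclasses import dataclass, fields

# Map Python annotations to SQLite column types (illustrative subset).
SQL_TYPES = {int: "INTEGER", float: "REAL", str: "TEXT"}
ALLOWED_OPS = (">", "<", "=", ">=", "<=")


@dataclass
class ImageRecord:  # hypothetical schema, not from any real project
    path: str
    width: int
    score: float


def create_table_sql(model):
    """Derive a CREATE TABLE statement from the model's typed fields."""
    cols = ", ".join(f"{f.name} {SQL_TYPES[f.type]}" for f in fields(model))
    return f"CREATE TABLE {model.__name__} ({cols})"


def select_where(conn, model, column, op, value):
    """Translate a simple filter into SQL and return rows as model objects."""
    names = [f.name for f in fields(model)]
    # Column and operator are checked against allow-lists because they are
    # interpolated into the SQL text; the value is passed as a parameter.
    if column not in names or op not in ALLOWED_OPS:
        raise ValueError("unsupported filter")
    sql = f"SELECT {', '.join(names)} FROM {model.__name__} WHERE {column} {op} ?"
    return [model(*row) for row in conn.execute(sql, (value,))]


conn = sqlite3.connect(":memory:")
conn.execute(create_table_sql(ImageRecord))
conn.executemany(
    "INSERT INTO ImageRecord VALUES (?, ?, ?)",
    [("a.png", 640, 0.91), ("b.png", 1920, 0.40)],
)

wide = select_where(conn, ImageRecord, "width", ">", 1000)
print(wide)
```

The typed model is the single source of truth here: both the table
definition and the query's column list are derived from it, which is
the part a schema library such as Pydantic makes convenient.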
| 7thpower wrote:
| You are on to something here. Look forward to seeing this evolve.
| richardmeng wrote:
| Thanks!
___________________________________________________________________
(page generated 2024-08-09 23:00 UTC)