[HN Gopher] Launch HN: Roe AI (YC W24) - AI-powered data warehou...
       ___________________________________________________________________
        
       Launch HN: Roe AI (YC W24) - AI-powered data warehouse to query
       multimodal data
        
       Hey HN, we're Richard and Jason from Roe AI (https://getroe.ai).
       We're building a query engine that lets data people run SQL queries
       on various kinds of unstructured data (videos, images, webpages,
       documents) using LLM-powered data processors.
        
       Here is a 3-minute video:
       https://www.youtube.com/watch?v=9-WwJk1v5mI, showing how to create
       an LLM data processor to process videos, build a semantic search
       for image data, and use it with SQL. The problem we tackle is that
       data analysts cannot quickly answer their business questions around
       unstructured, multimodal data. For example, product teams want to
       analyze user session replay videos to understand the pain points of
       using their product. Ads teams need to know everything about an
       advertiser based on their web pages, such as the products they
       offer, payment methods, etc. Marketing teams need to know how
       product placement or music in a marketing campaign could get more
       views. And so on.
        
       For data that is structured, questions like these can be answered
       quickly with SQL queries in Snowflake / BigQuery. But when you have
       unstructured multimodal data, it becomes a complex analysis
       process: open a Python notebook, write custom logic to fetch the
       multimodal data from blob storage (or write a crawler first if you
       need webpage data), find an AI model, do prompt engineering, do
       data ops to productionize the workload in a data workflow, etc. We
       simplify this process to a few lines of SQL.
        
       How it works: first, we leverage multimodal LLMs as data processors
       because they're good at information extraction, classification, and
       other arbitrary tasks on unstructured data. Next, we've built a
       user interface for data people to explore multimodal data and
       manage AI components. Then we have a quick semantic index builder
       for multimodal data. (We often see databases provide vector search
       functionality but not index building, so we built that.) Utility
       functions handle multimodal data, like a video cutter, a PDF page
       selector, etc. Finally, SQL is the command line for slicing and
       dicing multimodal data.
        
       How we got here: I've experienced 3 data evolutions in the last 10
       years. At UC Berkeley, I was a data researcher using a
       supercomputer cluster called Savio. It was a bare-metal way to
       analyze the data--I had to move CSV files between machines. Then at
       LinkedIn, I had Hadoop + Pig / Scala Spark. That abstracted most of
       the work, but I spent hours tuning jobs and had a headache
       manipulating HDFS directories. Later I joined Snowflake, and was
       like, holy - data analysis can be this simple - I can just use SQL
       to do everything within this data warehouse! I asked myself: why
       can't we make something like Snowflake for unstructured data? That
       was the impulse behind Roe.ai and it's been driving me ever since.
        
       To get started, you can sign in at https://app.roe-ai.com/ and
       there are docs at https://docs.roe-ai.com/. You can load
       unstructured data via our SQL and File API, Snowflake Staging Data
       Connector, S3 Blob Storage Data connector, Zapier Roe AI Zap, or
       the SQL function load_url_file() to get a file from a URL.
        
       Some logistics: the product is free to start, and we've preloaded
       $50 of AI credits, enough to process 3000 one-page PDFs. If you use
       all $50, just email us, and we'll give you more. The solution is
       not open-sourced because it is too complex to be self-hosted, but
       let us know if you see the potential for open-source.
        
       The product is early and could have bugs and UX problems. It'd be
       incredible if you could give it a spin anyway and we hope it will
       be interesting and that you'll let us know what you think! Jason
       and I will be around in the thread and are really interested in
       hearing from you!
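        
       To make the "AI processor as a SQL function" idea above concrete,
       here is a minimal sketch. It is not Roe AI's engine or syntax: it
       assumes Python's built-in sqlite3 module as a stand-in query
       engine, and classify_document is a hypothetical keyword rule
       standing in for an LLM-powered data processor. The table and
       function names are made up for illustration.

```python
import sqlite3


def classify_document(text: str) -> str:
    """Hypothetical stand-in for an LLM data processor (a keyword rule, not a model)."""
    lowered = text.lower()
    if "invoice" in lowered:
        return "invoice"
    if "resume" in lowered:
        return "resume"
    return "other"


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory table of raw document text and register the processor."""
    conn = sqlite3.connect(":memory:")
    # Expose the Python function to SQL as a one-argument scalar function.
    conn.create_function("classify_document", 1, classify_document)
    conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany(
        "INSERT INTO documents (body) VALUES (?)",
        [
            ("Invoice #1 for consulting",),
            ("Resume of a data engineer",),
            ("Meeting notes",),
            ("Invoice #2",),
        ],
    )
    return conn


def count_by_label(conn: sqlite3.Connection) -> list:
    """Answer a business-style question over unstructured text with plain SQL."""
    return conn.execute(
        "SELECT classify_document(body) AS label, COUNT(*) AS n "
        "FROM documents GROUP BY label ORDER BY label"
    ).fetchall()


if __name__ == "__main__":
    for label, n in count_by_label(build_demo_db()):
        print(label, n)
```

       In a real system the function body would call a model and the
       table cells would point at files rather than hold inline text;
       the sketch only shows how the processing step can hide behind an
       ordinary SQL call.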
        
       Author : richardmeng
       Score  : 33 points
       Date   : 2024-08-09 15:17 UTC (7 hours ago)
        
       | atak1 wrote:
       | This is awesome :) can we use this directly on our entire db?
        
         | richardmeng wrote:
         | Likely. Can you elaborate on your use case and which db you
         | use?
        
       | airstrike wrote:
       | Congrats on the launch. Sounds cool and potentially useful, but I
       | don't want to read blog posts or book a demo. I'd put a proper
       | video at the very top of the page instead of the animated typing
       | you currently have.
       | 
       | FYI your <title> tag needs to be updated.
        
         | richardmeng wrote:
         | Good points! We'll update our landing pages as you suggested.
        
       | datadrivenangel wrote:
       | Is this more for data engineers or data analysts?
       | 
       | Seems like the type of thing that would be very useful in helping
       | build data pipelines on semi-structured data.
        
         | zswzs wrote:
         | Right now it's more for data analysts whose data eng team
         | doesn't have the capacity to support all types of data
         | processing requirements. Data analysts can just do it
         | themselves simply with SQL! But we are also open to exploring
         | opportunities for data eng teams if we see a strong use case
         | for automating their data pipelines.
        
       | fsndz wrote:
       | Why this when I can just use PostgreSQL and pgvector? Like in
       | this example I found recently:
       | https://www.lycee.ai/courses/91b8b189-729a-471a-8ae1-717033c...
        
         | gigatexal wrote:
         | Not saying roe is the next Dropbox but the same sort of thing
         | was said when Dropbox did their show HN...
        
         | zswzs wrote:
         | Great question! The answer is twofold: 1. Unlike a vector
         | database, VolansDB not only supports search but also stores
         | the files (pointers) directly in the table. So you are able
         | to manage files (RBAC etc.) as table cells, easily apply
         | batch data processing jobs with SQL, and even get
         | unstructured data lineage & pipelines. 2. VolansDB is
         | columnar, so it's optimized for analytical use cases rather
         | than for product DB access patterns.
        
       | funnyenough wrote:
       | Will this work with Redshift via SQL interface? Or am I looking
       | at this wrong?
        
         | richardmeng wrote:
         | This does not work with Redshift. This is a query engine for
         | unstructured data like documents, images, and videos. That
         | data does not quite fit into a Redshift / BigQuery data
         | warehouse.
        
       | dmpetrov wrote:
       | Bridging the gap between AI and data warehouses is crucial, but
       | I'm not sure SQL is the best fit for AI engineers who mainly work
       | with Python and AI APIs.
       | 
       | At DataChain, we are solving this by creating a Python API that
       | translates to SQL under the hood, which is pretty easy now with
       | Pydantic. https://github.com/iterative/datachain
       | 
       | WDYT?
        
         | richardmeng wrote:
         | Right, our product is designed for data practitioners who want
         | snappy data analytics on unstructured data.
         | 
         | Thanks for sharing your project, super cool idea! What would
         | it take to integrate our SQL engine with DataChain?
        
           | dmpetrov wrote:
           | It uses SQLite in the open-source version. The SaaS version
           | uses proprietary data warehouses, where your engine can be
           | integrated.
        
       | 7thpower wrote:
       | You are on to something here. Look forward to seeing this evolve.
        
         | richardmeng wrote:
         | Thanks!
        
       ___________________________________________________________________
       (page generated 2024-08-09 23:00 UTC)