[HN Gopher] Launch HN: MinusX (YC S24) - AI assistant for data t...
       ___________________________________________________________________
        
       Launch HN: MinusX (YC S24) - AI assistant for data tools like
       Jupyter/Metabase
        
       Hey HN! We're Vivek, Sreejith and Arpit, and we're building MinusX
       (https://minusx.ai), a data science assistant for Jupyter and
       Metabase. MinusX is a Chrome extension
       (https://minusx.ai/chrome-extension) that adds an AI sidechat to
       your analytics apps. Given an instruction, our agent operates your
       app (by clicking and typing, just like you would) to analyze data
       and answer queries. Broadly, you can do 3 types of things: ask for
       hypotheses and explore data, extend existing notebooks/dashboards,
       or select a region and ask questions. There's a simple video
       walkthrough here: https://www.youtube.com/watch?v=BbHPyX2lJGI. The
       core idea is to "upgrade" existing tools, where people already do
       most of their data work, rather than building a new platform.

       I (Vivek) spent 6 years working in various parts of the data
       stack, from data analysis at a 1000+ person ride-hailing company
       to research at comma.ai, where I also handled most of the metrics
       and dashboarding infrastructure. The problems with data,
       surprisingly, were pretty much the same. Developers and product
       managers just want answers, or want to set up a quick view of some
       metric they care about. They often don't know which table contains
       what information, or what specific secret filters need to be kept
       in mind to get clean data. At large companies, analysts/scientists
       take care of most of these requests over a thousand
       back-and-forths. In small companies, most data projects end up
       being one-off efforts, and many die midway. I've tried every new
       shiny analytics app out there and none of them fully solve this
       core issue. New tools also come with a massive cost: you have to
       convince everyone around you to move, change all your workflows
       and hope the new tool has all the features your trusty old one
       did. Most people currently go to ChatGPT with barely any real
       background context, and admonish the model till it sputters some
       useful code, SQL or hypothesis. This is the kind of user we're
       trying to help.

       The philosophy of MinusX mirrors that of comma. Just as comma is
       working on "an AI upgrade for your car", we want to retrofit
       analytics software with abilities that LLMs have begun to unlock.
       We also get a kick out of the fact that we use the same APIs
       humans use (clicking and typing), so we don't really need
       "permission" from any analytics app (just like comma.ai does not
       need permission from Mr Toyota Corolla) :)

       How it works: Given an instruction, the MinusX Chrome extension
       first constructs a simplified representation of the host
       application's state using the DOM and a bunch of
       application-specific cues. We also have a set of (currently)
       predefined actions (e.g. clicking and typing) that the agent can
       use to interact with the host application. Any "complex action"
       can be described as a combination of these action-primitives. We
       send this entire context, the instruction and the actions to an
       LLM. The LLM responds with a sequence of actions, which are
       executed; the revised state is then computed and sent back to the
       LLM. This loop terminates when the LLM evaluates that the desired
       goals are met. Our architecture allows users to extend the
       capabilities of the agent by specifying new actions as
       combinations of the action-primitives. We're working on enabling
       users to do this through the extension itself.

       "Retrofitting" is a weird concept for software, and we've found
       that it takes a while for people to grasp what this actually
       implies. We think, with AI, it will be more of a thing. Most
       software we use will be "upgraded" and not always by the people
       making the original software.

       We ourselves are focused on data analytics because we've worked in
       and around data science / data analysis / data engineering all our
       careers - working at startups, Google, Meta, etc - and understand
       it decently well. But since "retrofitting" can be just as useful
       for a bunch of other field-specific software, we're going to
       open-source the entire extension and associated plumbing in the
       near future.

       Also, let's be real - a sequence of function calls rammed through
       a decision tree does not make any for-loop "agentic". The reality
       is that a large amount of in-the-loop data needed for tasks such
       as ours does not exist yet! Getting this data flywheel running is
       a very exciting axis as well.

       The product is currently free to use. In the future, we'll
       probably charge a monthly subscription fee, and support local
       models / bring-your-own-keys. But we're still working that out.

       We'd be super stoked for you to try out MinusX! You can find the
       extension here: https://minusx.ai/chrome-extension. We've also
       created a playground with data, for both Jupyter and Metabase, so
       once the extension is installed you can take it for a spin:
       https://minusx.ai/playground

       We'd love to hear what you think about the idea, and anything else
       you'd like to share! Suggestions on which tools to support next
       are most welcome :)
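
       The loop described under "How it works" can be sketched roughly
       as follows. This is a minimal illustration, not MinusX's code:
       FakeApp, Action, type_and_run, run_agent and scripted_planner are
       hypothetical names, the app is an in-memory stand-in for a
       notebook, and a scripted function takes the place of the LLM call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    """Hypothetical action primitive; the post only names clicking and typing."""
    name: str        # "click" or "type"
    target: str      # simplified element id taken from the app state
    text: str = ""   # payload for "type" actions


@dataclass
class FakeApp:
    """In-memory stand-in for the host app: one code cell and a Run button."""
    cell: str = ""
    runs: int = 0

    def state(self) -> dict:
        # Simplified representation of the app state that is sent to the planner.
        return {"cell": self.cell, "runs": self.runs}

    def execute(self, action: Action) -> None:
        if action.name == "type" and action.target == "cell":
            self.cell = action.text
        elif action.name == "click" and action.target == "run":
            self.runs += 1
        else:
            raise ValueError(f"unknown action: {action}")


def type_and_run(code: str) -> list[Action]:
    """A 'complex action' expressed as a combination of primitives."""
    return [Action("type", "cell", code), Action("click", "run")]


# The planner stands in for the LLM: (instruction, state) -> actions.
Planner = Callable[[str, dict], list[Action]]


def run_agent(app: FakeApp, instruction: str, planner: Planner,
              max_steps: int = 10) -> int:
    """Plan, execute, recompute state; stop when the planner returns no actions."""
    for step in range(max_steps):
        actions = planner(instruction, app.state())
        if not actions:          # planner judges the goal to be met
            return step
        for action in actions:
            app.execute(action)
    raise RuntimeError("step budget exhausted before the goal was met")


def scripted_planner(instruction: str, state: dict) -> list[Action]:
    """Deterministic stand-in for the model call (assumption for this sketch)."""
    if state["runs"] == 0:
        return type_and_run("df.describe()")
    return []


app = FakeApp()
steps = run_agent(app, "summarise df", scripted_planner)
print(steps, app.state())
```

       Here the planner returns an empty list to signal that the goal is
       met, so the loop ends after one round of actions. A real planner
       would be a model call that sees the instruction, the state and
       the available actions.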
        
       Author : nuwandavek
       Score  : 61 points
       Date   : 2024-08-20 16:24 UTC (6 hours ago)
        
       | penthi wrote:
       | Very cool. Why is the AI so fast? (Impressive)
        
         | ppsreejith wrote:
         | We've done a bunch of work to strip down the context and
         | minimise the output tokens (which tend to be 100x as slow as
         | input tokens). GPT-4o is pretty fast too :)
        
           | penthi wrote:
           | Thanks for the explanation. Can't wait to see the code when
           | you open it up!
        
       | world2vec wrote:
       | This looks cool. Current company uses Metabase extensively and
       | this could be handy. What LLM is being used?
        
         | ppsreejith wrote:
         | Currently, we're using GPT-4o. We've tested it with Claude as
         | well and plan to roll out support soon!
        
       | btown wrote:
       | While I'm excited about the launch, I'm concerned that your data
       | policies are extremely vague and seem to contain typos and
       | missing parentheticals. As of 12:30p ET they say:
       | 
       | > We have nuanced privacy controls on minusx. Any data you share,
       | which will be used to train better, more accurate models). We
       | never share your data with third parties.
       | 
       | What are these nuanced controls? What data is used to train your
       | models? Just column names and existing queries, or data from
       | tables and query results as well that might be displayed on
       | screen? Are your LLMs running entirely locally on your own
       | hardware, and if not, how can you say the data is not shared with
       | third parties? (EDIT: you mentioned GPT-4o in another comment so
       | this statement cannot be correct.)
       | 
       | https://avanty.app/ is doing something similar in the Metabase
       | space and has more clarity on their policies than you do.
       | 
       | Frankly, given the lack of care in your launch FAQs about
       | privacy, it's a hard ask to expect that you will treat customer
       | data privacy with greater care. There is definitely a need for
       | innovation in this space, but I'm unable to recommend or even
       | test your product with this status quo.
        
         | nuwandavek wrote:
         | I totally share your concerns about data (especially data that
         | may be sensitive). We have a simple non-legal-speak privacy
         | policy here: https://minusx.ai/privacy-simplified.
         | 
         | > Are your LLMs running entirely locally on your own hardware,
         | and if not, how can you say the data is not shared with third
         | parties? (EDIT: you mentioned GPT-4o in another comment so this
         | statement cannot be correct.)
         | 
         | We're currently only using API providers (OAI + Claude) that do
         | not themselves train on data accessed through APIs. Although
         | they are technically third parties, they're not third parties
         | that harvest data.
         | 
         | I recognize that even this may just be empty talk. We're
         | currently working on 2 efforts that I think will further help
         | here:
         | 
         | - opensourcing the entire extension so that users can see
         | exactly what data is being used as LLM context (and allow users
         | to extend the app further)
         | 
         | - support local models so that your data never leaves your
         | computer (ETA for both is ~1-2 weeks)
         | 
         | We are genuinely motivated by the excitement + concerns you may
         | have. We want to give an assistant-in-the-browser alternative
         | to people who don't want to move to AI-native-data-locked-in
         | platforms. I regret that this was not transparent in our copy.
         | 
         | Thanks for pointing out the error in the FAQs, we somehow
         | missed it. It is fixed now!
        
       | kshmir wrote:
       | What happens when Metabase releases this? (Asking without
       | malice!)
        
         | ppsreejith wrote:
         | We're building an assistant that works across all your
         | analytics apps. This means MinusX can use context from multiple
         | apps to better fulfil your instructions. You can imagine a
         | future version of MinusX reading data from a spreadsheet,
         | putting it onto a Jupyter notebook / Metabase Table, and
         | running further analysis.
         | 
         | When Metabase (or any other tool) builds an assistant, we aim
         | to use it to further extend MinusX's capabilities!
        
           | altdataseller wrote:
           | What other analytics tools do you plan on supporting?
        
             | ppsreejith wrote:
             | We're currently exploring the tools displayed on our
             | website (Tableau, Grafana, Colab, & Google Sheets). But if
             | you have a specific tool in mind, please do tell us at
             | https://minusx.ai/tool-request
        
       | sanketsaurav wrote:
       | This is impressive! We use Metabase and I've been wanting this
       | exact user experience for quite some time. So far, I've been
       | dumping our Postgres schema into a Claude project and asking it
       | to generate queries. This works surprisingly well, save for the
       | tedious copy-paste between the two tabs. The Chrome extension
       | workflow makes perfect sense.
       | 
       | Is there a way to select which model is being used? Anecdotally,
       | I've found that Claude 3.5 Sonnet works incredibly well with even
       | the most complex queries in one shot, which is not something I've
       | seen with GPT-4o.
        
         | nuwandavek wrote:
         | Haha, yes! We were doing the exact same thing. Also, there is
         | so much context you can't capture with just table schema that
         | you can if you integrate the extension deep into the tool. It
         | also unlocks cross-app contexts (we're working on a way to
         | import context from a doc to a metabase query, or from a
         | sheet/dashboard to a jupyter notebook, etc.).
         | 
         | > Is there a way to select which model is being used?
         | 
         | Not at the moment, but this is in our pipeline! We will enable
         | this (and the ability to edit the prompts, etc.) very soon.
         | 
         | Do try it out and let me know what you think!
        
       | __gcd wrote:
       | This is very interesting. Can we bring our own API keys? Is that
       | in the roadmap?
        
         | nuwandavek wrote:
         | Yes! Both bring-your-own-keys and local models are on the
         | roadmap. The ETA for both is ~1-2 weeks.
        
       | edmundsauto wrote:
       | How does the AI know about things like other tables? Does it have
       | some basic knowledge of Metabase's link structure so it can
       | navigate to a listing of all tables, then pull context from
       | there for in-context learning while writing the query?
       | 
       | Anecdotally, my hardest problems w/ nl2sql are finding the right
       | tables and adding the right filters.
        
         | ppsreejith wrote:
         | Yep! MinusX uses Metabase APIs to pull relevant tables, schema,
         | & dashboards to construct the context for your instruction.
         | 
         | > Anecdotally, my hardest problems w/ nl2sql are finding the
         | right tables and adding the right filters.
         | 
         | Totally! Especially in large orgs with thousands of tables.
         | Using your existing dashboards and queries gives useful
         | context on picking the right tables for the query.
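
         One way existing queries could inform table choice is to count
         how often each table appears in saved queries and rank
         candidates by that count. The sketch below assumes the saved SQL
         and table names have already been fetched (it makes no Metabase
         API calls), uses a rough FROM/JOIN regex rather than a SQL
         parser, and all names and data in it are hypothetical.

```python
import re
from collections import Counter

# Rough heuristic: the identifier that follows FROM or JOIN. Not a SQL parser.
TABLE_REF = re.compile(r"\b(?:from|join)\s+([a-z_][\w.]*)", re.IGNORECASE)


def table_usage(saved_queries: list[str], known_tables: set[str]) -> Counter:
    """Count how many saved queries reference each known table."""
    usage: Counter = Counter()
    for sql in saved_queries:
        referenced = {name.lower() for name in TABLE_REF.findall(sql)}
        for table in referenced & known_tables:
            usage[table] += 1
    return usage


def rank_tables(known_tables: set[str], saved_queries: list[str],
                top_k: int = 3) -> list[str]:
    """Most-referenced tables first; ties are broken alphabetically."""
    usage = table_usage(saved_queries, known_tables)
    return sorted(known_tables, key=lambda t: (-usage[t], t))[:top_k]


# Hypothetical table names and saved queries.
tables = {"rides", "rides_raw", "drivers", "payments"}
queries = [
    "SELECT count(*) FROM rides WHERE status = 'completed'",
    "SELECT d.id FROM rides r JOIN drivers d ON r.driver_id = d.id",
    "select sum(amount) from payments",
]

ranked = rank_tables(tables, queries)
print(ranked)
```

         The never-queried rides_raw table drops out of the top three,
         which is the kind of signal a bare schema dump cannot provide.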
        
       | zurfer wrote:
       | I love that you can take a screenshot and it starts to explain
       | what it sees!
       | 
       | While this is clearly an AI analytics assistant, your "retrofit"
       | approach certainly differentiates you from existing approaches:
       | https://github.com/Snowboard-Software/awesome-ai-analytics
       | 
       | Not quite sure if this should be a separate category? It's more
       | similar to the web automation agents like https://www.multion.ai/
       | than to https://www.getdot.ai/.
        
         | nuwandavek wrote:
         | We love that feature too and use it quite a bit ourselves!
         | 
         | > Not quite sure if this should be a separate category?
         | 
         | We see ourselves at the intersection of generic browser-
         | automation agents and generic coding agents. MinusX integrates
         | deeply into jupyter/metabase (we had to do a lot of shenanigans
         | to get the entire jupyter app context) and has more context
         | than RPA agents do today. It is possible that eventually all
         | these apps will converge, but we think MinusX will be more
         | useful for anything data related than any of them for the
         | foreseeable future.
         | 
         | To paraphrase geohot, we think that the path to advanced agents
         | runs through specialized, useful intermediaries.
        
       | altdataseller wrote:
       | In your demo, you seem to have performed everything on a small
       | dataset.
       | 
       | How's the performance on doing the same analysis on a dataset
       | with 1 billion rows for instance?
       | 
       | Also does this work with self hosted Metabase or Metabase Cloud?
       | Or both?
        
         | ppsreejith wrote:
         | > How's the performance on doing the same analysis on a dataset
         | with 1 billion rows for instance?
         | 
         | This really depends on whether your tool can handle the scale.
         | We only use a sample of the outputs when constructing the
         | context for your instruction so it should be independent of the
         | scale of the data. We mostly use metadata such as table names,
         | fields, schemas etc to construct the context.
         | 
         | > Also does this work with self hosted Metabase or Metabase
         | Cloud? Or both?
         | 
         | Yep, it should work on both :) We have users across both.
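
         A minimal sketch of why sampled context stays the same size
         whatever the table size: only metadata and the first few rows
         are read. Taking the first rows is an assumption (the comment
         does not say how the sample is chosen), and build_context,
         many_rows and the table shown are hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator


def build_context(table: str, columns: dict[str, str],
                  rows: Iterable[tuple], sample_size: int = 3) -> str:
    """Describe a table using its schema plus at most `sample_size` rows."""
    # islice stops after sample_size items, so the rest is never read.
    sample = list(islice(rows, sample_size))
    lines = [
        f"table: {table}",
        "columns: " + ", ".join(f"{name} ({dtype})" for name, dtype in columns.items()),
        f"sample rows (first {len(sample)}):",
    ]
    lines += ["  " + repr(row) for row in sample]
    return "\n".join(lines)


def many_rows() -> Iterator[tuple]:
    """Lazy stand-in for a billion-row query result."""
    for i in range(1_000_000_000):
        yield (i, i * 1.5)


context = build_context("rides", {"id": "int", "fare": "float"}, many_rows())
print(context)
```

         The generator is never exhausted, so the cost of building the
         context depends on sample_size and the number of columns, not on
         the row count.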
        
       | KeithBrink wrote:
       | Any chance of a Firefox extension?
        
         | ppsreejith wrote:
         | As a Firefox user myself, yes! We plan to launch for other
         | browsers after open sourcing MinusX (in ~1-2 weeks).
        
       ___________________________________________________________________
       (page generated 2024-08-20 23:00 UTC)