[HN Gopher] Launch HN: MinusX (YC S24) - AI assistant for data t...
___________________________________________________________________
Launch HN: MinusX (YC S24) - AI assistant for data tools like
Jupyter/Metabase
Hey HN! We're Vivek, Sreejith and Arpit, and we're building MinusX
(https://minusx.ai), a data science assistant for Jupyter and
Metabase. MinusX is a Chrome extension (https://minusx.ai/chrome-extension)
that adds an AI sidechat to your analytics apps. Given
an instruction, our agent operates your app (by clicking and
typing, just like you would) to analyze data and answer queries.
Broadly, you can do 3 types of things: ask for hypotheses and
explore data, extend existing notebooks/dashboards, or select a
region and ask questions. There's a simple video walkthrough here:
https://www.youtube.com/watch?v=BbHPyX2lJGI.

The core idea is to "upgrade" existing tools, where people already do
most of their data work, rather than building a new platform.

I (Vivek) spent 6
years working in various parts of the data stack, from data
analysis at a 1000+ person ride-hailing company to research at
comma.ai, where I also handled most of the metrics and dashboarding
infrastructure. The problems with data, surprisingly, were pretty
much the same. Developers and product managers just want answers,
or want to set up a quick view of some metric they care about. They
often don't know which table contains what information, or what
specific secret filters need to be kept in mind to get clean data.
At large companies, analysts/scientists take care of most of these
requests over a thousand back-and-forths. In small companies, most
data projects end up being one-off efforts, and many die midway.

I've tried every new shiny analytics app out there, and none of them
fully solve this core issue. New tools also come with a massive
cost: you have to convince everyone around you to move, change all
your workflows and hope the new tool has all features your trusty
old one did. Most people currently go to ChatGPT with barely any
real background context, and admonish the model until it sputters out
some useful code, SQL or a hypothesis. This is the kind of user we're
trying to help.

The philosophy of MinusX mirrors that of comma.
Just as comma is working on "an AI upgrade for your car", we want
to retrofit analytics software with abilities that LLMs have begun
to unlock. We also get a kick out of the fact that we use the same
APIs humans use (clicking and typing), so we don't really need
"permission" from any analytics app (just like comma.ai does not
need permission from Mr Toyota Corolla) :)

How it works: Given an instruction, the MinusX Chrome extension first
constructs a simplified representation of the host application's
state using the DOM and a bunch of application-specific cues. We also
have a set of (currently) predefined actions (e.g. clicking and
typing) that the agent can use to interact with the host application.
Any "complex action" can be described as a combination of these
action-primitives. We send this entire context, the instruction and
the actions to an LLM. The LLM responds with a sequence of actions,
which are executed; the revised state is then computed and sent back
to the LLM. This loop terminates when the LLM judges that the desired
goals have been met.

Our architecture allows users to extend the capabilities of the agent
by specifying new actions as combinations of the action-primitives.
We're working on enabling users to do this through the extension
itself.

"Retrofitting" is a weird
concept for software, and we've found that it takes a while for
people to grasp what this actually implies. We think that with AI,
it will become more common. Most software we use will be "upgraded"
and not always by the people making the original software. We
ourselves are focused on data analytics because we've worked in and
around data science / data analysis / data engineering all our
careers - working at startups, Google, Meta, etc - and understand
it decently well. But since "retrofitting" can be just as useful
for a bunch of other field-specific software, we're going to
open-source the entire extension and associated plumbing in the near
future.

Also, let's be real - a sequence of function calls rammed
through a decision tree does not make any for-loop "agentic". The
reality is that a large amount of in-the-loop data needed for tasks
such as ours does not exist yet! Getting this data flywheel running
is a very exciting axis as well.

The product is currently free to use. In the future, we'll probably
charge a monthly subscription
fee, and support local models / bring-your-own-keys. But we're
still working that out.

We'd be super stoked for you to try out MinusX! You can find the
extension here: https://minusx.ai/chrome-extension. We've also created
a playground with data, for both Jupyter and Metabase, so once the
extension is installed you can take it for a spin:
https://minusx.ai/playground

We'd love to hear what you think about the idea, and anything else
you'd like to share! Suggestions on which tools to support next are
most welcome :)
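The agent loop described under "How it works" can be sketched in Python as below. This is a hypothetical illustration, not MinusX's code: FakeApp stands in for the host application and its DOM-derived state, click and type_text stand in for the action-primitives, set_cell is an example "complex action" composed from them, and fake_llm replaces the real model call with a scripted policy. All of these names are assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

@dataclass
class FakeApp:
    """Hypothetical stand-in for the host app. A real extension would read
    and drive the page's DOM instead of mutating a dict."""
    focused: Optional[str] = None
    cells: Dict[str, str] = field(default_factory=dict)

    def snapshot(self) -> dict:
        # Simplified state representation that is sent to the model.
        return {"focused": self.focused, "cells": dict(self.cells)}

# Action-primitives: the only ways the agent can touch the app.
def click(app: FakeApp, target: str) -> None:
    app.focused = target

def type_text(app: FakeApp, text: str) -> None:
    if app.focused is None:
        raise RuntimeError("type_text called with nothing focused")
    app.cells[app.focused] = app.cells.get(app.focused, "") + text

# A "complex action" defined purely as a combination of primitives.
def set_cell(app: FakeApp, target: str, text: str) -> None:
    click(app, target)
    type_text(app, text)

ACTIONS: Dict[str, Callable[..., None]] = {
    "click": click, "type_text": type_text, "set_cell": set_cell,
}

def fake_llm(instruction: str, state: dict, action_names: List[str]) -> List[dict]:
    """Scripted stand-in for the LLM call. Returns the next actions to run,
    or an empty list once it judges the goal is met."""
    if "cell1" not in state["cells"]:
        return [{"name": "set_cell", "args": {"target": "cell1", "text": "SELECT 1"}}]
    return []

def run_agent(app: FakeApp, instruction: str, llm=fake_llm, max_steps: int = 10) -> dict:
    for _ in range(max_steps):
        # Send the instruction, current state and available actions to the model.
        plan = llm(instruction, app.snapshot(), list(ACTIONS))
        if not plan:  # the model reports that the goal is met
            break
        for step in plan:
            # Execute each action; the next iteration recomputes the state.
            ACTIONS[step["name"]](app, **step["args"])
    return app.snapshot()

print(run_agent(FakeApp(), "put a query in cell1"))
# {'focused': 'cell1', 'cells': {'cell1': 'SELECT 1'}}
```

The max_steps cap is a choice made for this sketch, guarding against a model that never reports the goal as met; the post does not say whether MinusX uses one.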
Author : nuwandavek
Score : 61 points
Date : 2024-08-20 16:24 UTC (6 hours ago)
| penthi wrote:
| Very cool. Why is the AI so fast? (Impressive)
| ppsreejith wrote:
| We've done a bunch of work to strip down the context and
| minimise the output tokens (which tend to be 100x as slow as
| input tokens). GPT-4o is pretty fast too :)
| penthi wrote:
| Thanks for the explanation. Can't wait to see the code when
| you open it up!
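A back-of-the-envelope model of why trimming output tokens pays off more than trimming context, taking the roughly 100x per-token ratio quoted above at face value. The per-token time is a made-up placeholder, and this linear model ignores network overhead and time-to-first-token; real numbers vary by model and provider.

```python
def estimate_latency_s(input_tokens: int, output_tokens: int,
                       s_per_input_token: float = 0.0001,
                       output_slowdown: float = 100.0) -> float:
    """Linear latency estimate: every output token costs `output_slowdown`
    times as much as an input token. Defaults are placeholders, not
    measurements."""
    s_per_output_token = s_per_input_token * output_slowdown
    return input_tokens * s_per_input_token + output_tokens * s_per_output_token

baseline = estimate_latency_s(4000, 500)
half_output = estimate_latency_s(4000, 250)
half_input = estimate_latency_s(2000, 500)
print(round(baseline, 1), round(half_output, 1), round(half_input, 1))
# prints: 5.4 2.9 5.2
```

Under these assumptions, halving the output saves about 2.5 s while halving the context saves about 0.2 s.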
| world2vec wrote:
| This looks cool. Current company uses Metabase extensively and
| this could be handy. What LLM is being used?
| ppsreejith wrote:
| Currently, we're using GPT-4o. We've tested it with Claude as
| well and plan to roll out support soon!
| btown wrote:
| While I'm excited about the launch, I'm concerned that your data
| policies are extremely vague and seem to contain typos and
| missing parentheticals. As of 12:30p ET they say:
|
| > We have nuanced privacy controls on minusx. Any data you share,
| which will be used to train better, more accurate models). We
| never share your data with third parties.
|
| What are these nuanced controls? What data is used to train your
| models? Just column names and existing queries, or data from
| tables and query results as well that might be displayed on
| screen? Are your LLMs running entirely locally on your own
| hardware, and if not, how can you say the data is not shared with
| third parties? (EDIT: you mentioned GPT-4o in another comment so
| this statement cannot be correct.)
|
| https://avanty.app/ is doing something similar in the Metabase
| space and has more clarity on their policies than you do.
|
| Frankly, given the lack of care in your launch FAQs about
| privacy, it's a hard ask to expect that you will treat customer
| data privacy with greater care. There is definitely a need for
| innovation in this space, but I'm unable to recommend or even
| test your product with this status quo.
| nuwandavek wrote:
| I totally share your concerns about data (especially data that
| may be sensitive). We have a simple non-legal-speak privacy
| policy here: https://minusx.ai/privacy-simplified.
|
| > Are your LLMs running entirely locally on your own hardware,
| and if not, how can you say the data is not shared with third
| parties? (EDIT: you mentioned GPT-4o in another comment so this
| statement cannot be correct.)
|
| We're currently only using API providers (OAI + Claude) that do
| not themselves train on data accessed through APIs. Although
| they are technically third parties, they're not third parties
| that harvest data.
|
| I recognize that even this may just be empty talk. We're
| currently working on 2 efforts that I think will further help
| here:
|
| - opensourcing the entire extension so that users can see
| exactly what data is being used as LLM context (and allow users
| to extend the app further)
|
| - support local models so that your data never leaves your
| computer (ETA for both is ~1-2 weeks)
|
| We are genuinely motivated by the excitement + concerns you may
| have. We want to give an assistant-in-the-browser alternative
| to people who don't want to move to AI-native-data-locked-in
| platforms. I regret that this wasn't clear in our copy.
|
| Thanks for pointing out the error in the FAQs; we somehow missed
| it. It is fixed now!
| kshmir wrote:
| What happens when Metabase releases this? (Asking without
| malice!)
| ppsreejith wrote:
| We're building an assistant that works across all your
| analytics apps. This means MinusX can use context from multiple
| apps to better fulfil your instructions. You can imagine a
| future version of MinusX reading data from a spreadsheet,
| putting it into a Jupyter notebook / Metabase table, and
| running further analysis.
|
| When Metabase (or any other tool) builds an assistant, we aim
| to use it to further extend MinusX's capabilities!
| altdataseller wrote:
| What other analytics tools do you plan on supporting?
| ppsreejith wrote:
| We're currently exploring the tools displayed on our
| website (Tableau, Grafana, Colab, & Google Sheets). But if
| you have a specific tool in mind, please do tell us at
| https://minusx.ai/tool-request
| sanketsaurav wrote:
| This is impressive! We use Metabase and I've been wanting this
| exact user experience for quite some time. So far, I've been
| dumping our Postgres schema into a Claude project and asking it
| to generate queries. This works surprisingly well, save for the
| tedious copy-paste between the two tabs. The Chrome extension
| workflow makes perfect sense.
|
| Is there a way to select which model is being used? Anecdotally,
| I've found that Claude 3.5 Sonnet works incredibly well with even
| the most complex queries in one shot, which is not something I've
| seen with GPT-4o.
| nuwandavek wrote:
| Haha, yes! We were doing the exact same thing. Also, integrating
| the extension deep into the tool lets you capture a lot of context
| that the table schema alone can't give you. It also unlocks
| cross-app contexts (we're working on a way to import context from a
| doc to a Metabase query, or from a sheet/dashboard to a Jupyter
| notebook, etc.).
|
| > Is there a way to select which model is being used?
|
| Not at the moment, but this is in our pipeline! We will enable
| this (and the ability to edit the prompts, etc.) very soon.
|
| Do try it out and let me know what you think!
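The copy-paste workflow sanketsaurav describes (dumping a Postgres schema into a prompt) can be made a little less tedious with a short script. A minimal sketch: SCHEMA_QUERY is a standard PostgreSQL query against information_schema.columns (filtering on the 'public' schema is an assumption about where your tables live), while schema_to_prompt and its one-line-per-table format are illustrative choices, not something MinusX or Claude requires. Running the query is left to whatever database driver you already use.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Standard PostgreSQL catalog query; run it with your driver of choice and
# pass the resulting (table, column, type) rows to schema_to_prompt().
SCHEMA_QUERY = """
SELECT table_name, column_name, data_type
FROM information_schema.columns
WHERE table_schema = 'public'
ORDER BY table_name, ordinal_position;
"""

def schema_to_prompt(rows: Iterable[Tuple[str, str, str]]) -> str:
    """Collapse (table, column, type) rows into one compact line per table,
    e.g. 'orders(id integer, total numeric)'."""
    tables: Dict[str, List[str]] = defaultdict(list)
    for table, column, dtype in rows:
        tables[table].append(f"{column} {dtype}")
    return "\n".join(f"{t}({', '.join(cols)})" for t, cols in tables.items())

rows = [("orders", "id", "integer"), ("orders", "total", "numeric"),
        ("users", "id", "integer")]
print(schema_to_prompt(rows))
# orders(id integer, total numeric)
# users(id integer)
```

One line per table keeps the prompt short, which matters once a schema has hundreds of tables; as nuwandavek notes, though, the schema alone still misses context that lives in existing queries and dashboards.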
| __gcd wrote:
| This is very interesting. Can we bring our own API keys? Is that
| in the roadmap?
| nuwandavek wrote:
| Yes! Both bring-your-own-keys and local models are on the
| roadmap. The ETA for both is ~1-2 weeks.
| edmundsauto wrote:
| How does the AI know about things like other tables? Does it have
| some basic knowledge of Metabase's link structure so it can
| navigate to a listing of all tables, then pulls context from
| there for in-context learning while writing the query?
|
| Anecdotally, my hardest problems w/ nl2sql are finding the right
| tables and adding the right filters.
| ppsreejith wrote:
| Yep! MinusX uses Metabase APIs to pull relevant tables, schema,
| & dashboards to construct the context for your instruction.
|
| > Anecdotally, my hardest problems w/ nl2sql are finding the
| right tables and adding the right filters.
|
| Totally! Especially in large orgs with thousands of tables.
| Your existing dashboards and queries give useful context for
| picking the right tables for the query.
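As a rough illustration of how existing queries can help pick tables, here is a hypothetical ranking heuristic. It is not MinusX's method (the reply above only says MinusX pulls tables, schemas and dashboards through Metabase's APIs); rank_tables, its scoring and the regex-based table extraction are assumptions made for this sketch. Each table is scored by word overlap between the question and the table name, with ties broken by how many saved queries reference it.

```python
import re
from collections import Counter
from typing import List, Tuple

def rank_tables(question: str, tables: List[str], saved_queries: List[str],
                top_k: int = 3) -> List[str]:
    """Hypothetical heuristic: rank tables by (a) words shared between the
    question and the table name, then (b) how many saved queries use them."""
    q_words = set(re.findall(r"[a-z0-9]+", question.lower()))

    # Count saved queries that reference each table after FROM or JOIN.
    # Crude regex, not a SQL parser: schema-qualified names such as
    # public.orders are not counted.
    usage: Counter = Counter()
    for sql in saved_queries:
        referenced = set(re.findall(r"\b(?:from|join)\s+([a-z_][a-z0-9_]*)", sql.lower()))
        usage.update(t for t in tables if t.lower() in referenced)

    def score(table: str) -> Tuple[int, int]:
        name_words = set(table.lower().split("_"))
        return (len(q_words & name_words), usage[table])

    # sorted() is stable, so tables with equal scores keep their input order.
    return sorted(tables, key=score, reverse=True)[:top_k]

tables = ["orders", "order_items_archive", "users", "events"]
saved = [
    "SELECT * FROM orders o JOIN users u ON o.user_id = u.id",
    "SELECT count(*) FROM events",
    "select sum(total) from orders",
]
print(rank_tables("total orders per user last month", tables, saved))
# ['orders', 'users', 'events']
```

Note that "user" in the question does not match the "users" table name here; a real system would need stemming or embeddings, which is where usage in existing queries helps break ties.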
| zurfer wrote:
| I love that you can take a screenshot and it starts to explain
| what it sees!
|
| While this is clearly an AI analytics assistant, your "retrofit"
| approach certainly differentiates you from existing approaches:
| https://github.com/Snowboard-Software/awesome-ai-analytics
|
| Not quite sure if this should be a separate category? It's more
| similar to the web automation agents like https://www.multion.ai/
| than to https://www.getdot.ai/.
| nuwandavek wrote:
| We love that feature too and use it quite a bit ourselves!
|
| > Not quite sure if this should be a separate category?
|
| We see ourselves at the intersection of generic browser-
| automation agents and generic coding agents. MinusX integrates
| deeply into Jupyter/Metabase (we had to do a lot of shenanigans
| to get the entire Jupyter app context) and has more context
| than RPA agents do today. It is possible that eventually all
| these apps will converge, but we think MinusX will be more
| useful for anything data-related than any of them for the
| foreseeable future.
|
| To paraphrase geohot, we think that the path to advanced agents
| runs through specialized, useful intermediaries.
| altdataseller wrote:
| In your demo, you seemed to have performed everything on a small
| dataset.
|
| How's the performance on doing the same analysis on a dataset
| with 1 billion rows for instance?
|
| Also does this work with self hosted Metabase or Metabase Cloud?
| Or both?
| ppsreejith wrote:
| > How's the performance on doing the same analysis on a dataset
| with 1 billion rows for instance?
|
| This really depends on whether your tool can handle the scale.
| We only use a sample of the outputs when constructing the
| context for your instruction, so it should be independent of the
| scale of the data. We mostly use metadata such as table names,
| fields, schemas, etc. to construct the context.
|
| > Also does this work with self hosted Metabase or Metabase
| Cloud? Or both?
|
| Yep, it should work on both :) We have users on both.
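The point that context size can be independent of data size is easy to see in code. A minimal sketch, assuming the simplest possible sample (the first few rows of a result) plus the column names; result_context and its pipe-separated format are hypothetical, and the reply above does not say how MinusX actually samples. Because itertools.islice reads only sample_size rows, the prompt is the same size for a ten-row result or a billion-row stream; how fast the query itself runs still depends on the underlying tool.

```python
from itertools import islice
from typing import Iterable, Sequence

def result_context(columns: Sequence[str], rows: Iterable[Sequence],
                   sample_size: int = 5) -> str:
    """Describe a query result for an LLM prompt using only the column names
    and the first `sample_size` rows."""
    sample = list(islice(rows, sample_size))  # pulls at most sample_size rows
    lines = [" | ".join(columns)]
    lines += [" | ".join(str(v) for v in row) for row in sample]
    return "\n".join(lines)

# A lazy generator standing in for a billion-row result; only 3 rows are read.
huge = ((i, i * 2) for i in range(10**9))
print(result_context(["id", "value"], huge, sample_size=3))
# id | value
# 0 | 0
# 1 | 2
# 2 | 4
```

Taking the head of the result is chosen here because it never has to read the whole result; a uniform random sample (e.g. reservoir sampling) would be more representative but must scan every row.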
| KeithBrink wrote:
| Any chance of a Firefox extension?
| ppsreejith wrote:
| As a Firefox user myself, yes! We plan to launch for other
| browsers after open sourcing MinusX (in ~1-2 weeks).
___________________________________________________________________
(page generated 2024-08-20 23:00 UTC)