[HN Gopher] Show HN: Plexe - ML Models from a Prompt
___________________________________________________________________
Show HN: Plexe - ML Models from a Prompt
Hey HN! We're Vaibhav and Marcello. We're building Plexe
(https://github.com/plexe-ai/plexe), an open-source agent that
turns natural language task descriptions into trained ML models.
Here's a video walkthrough:
https://www.youtube.com/watch?v=bUwCSglhcXY.

There are all kinds of uses for ML models that never get realized
because the process of making them is messy and convoluted. You can
spend months trying to find the data, clean it, experiment with
models, and deploy to production, only to find out that your
project has been binned for taking so long. There are many tools
for "automating" ML, but it still takes teams of ML experts to
actually productionize something of value. And we can't keep
throwing LLMs at every ML problem: why use a generic 10B-parameter
language model if a logistic regression trained on your data could
do the job better?

Our light-bulb moment was that we could use LLMs to generate
task-specific ML models that would be trained on your own data.
Thanks to the emergent reasoning ability of LLMs, it is now
possible to create an agentic system that might automate most of
the ML lifecycle.

A couple of months ago, we started developing a Python library that
lets you define ML models on structured data using a description of
the expected behaviour. Our initial implementation arranged
candidate solutions into a graph, using LLMs to write plans,
implement them as code, and run the resulting training scripts.
Using simple search algorithms, the system traversed the solution
space to identify and package the best model. However, we ran into
several limitations: the algorithm proved brittle on edge cases,
and we kept having to patch it for every minor issue in the
training process.

So we rethought the approach, threw everything out, and rebuilt the
tool around an agentic design that prioritises generality and
flexibility. What started as a single ML engineering agent turned
into an agentic ML "team", with all experiments tracked and logged
using MLflow. Our current implementation uses the smolagents
library to define an agent hierarchy: we mapped the functionality
of our previous implementation onto a set of specialized agents,
such as an "ML scientist" that proposes solution plans, each with
its own tools, instructions, and prompt templates. To facilitate
cross-agent communication, we implemented a shared memory that lets
objects (datasets, code snippets, etc.) be passed between agents
indirectly by referencing keys in a registry. You can find a
detailed write-up on how it works here:
https://github.com/plexe-ai/plexe/blob/main/docs/architectur...
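
As a rough sketch of the hierarchy idea (simplified, and not the
exact code in the repo), smolagents lets you give each specialist
agent a name and description and hand it to a manager agent via
`managed_agents`:

    from smolagents import CodeAgent, LiteLLMModel

    llm = LiteLLMModel(model_id="openai/gpt-4o")

    # A specialist agent the orchestrator can delegate to.
    ml_scientist = CodeAgent(
        tools=[],
        model=llm,
        name="ml_scientist",
        description="Proposes solution plans for the given ML task.",
    )

    # The manager breaks the task down and delegates to its "team".
    orchestrator = CodeAgent(
        tools=[],
        model=llm,
        managed_agents=[ml_scientist],
    )

    orchestrator.run("Propose a modelling approach for predicting churn.")
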
Plexe's early release is focused on predictive problems over
structured data, and can be used to build models for things like
forecasting player injury risk in high-intensity sports,
recommending products for an e-commerce marketplace, or predicting
technical indicators for algorithmic trading. Here are some
examples to get you started:
https://github.com/plexe-ai/plexe/tree/main/examples

To get it working on your data, you can dump in any CSV, Parquet
file, etc. and Plexe uses what it needs from your dataset to figure
out which features to use. The open-source tool only supports
adding files right now, but our platform version will integrate
with Postgres: it pulls all the available data based on an SQL
query and dumps it into a Parquet file for the agent to build
models from.
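
For a feel of the interface, here's a minimal usage sketch along
the lines of the repo examples (exact parameter names may differ
slightly; see the examples link above):

    import pandas as pd
    import plexe

    # Any tabular dump works; this file and its columns are just
    # illustrative.
    df = pd.read_csv("transactions.csv")

    # Describe the model in plain English, plus the I/O schema.
    model = plexe.Model(
        intent="Predict whether a transaction is fraudulent",
        input_schema={"amount": float, "merchant": str, "country": str},
        output_schema={"is_fraud": bool},
    )

    # The agent "team" plans, trains and evaluates candidate models.
    model.build(datasets=[df])

    print(model.predict(
        {"amount": 42.0, "merchant": "acme", "country": "US"}
    ))
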
Next up, we'll be tackling more of the ML project lifecycle: we're
currently working on a "feature engineering agent" that focuses on
the complex data transformations often required to get data ready
for model training. If you're interested, check Plexe out and let
us know your thoughts!
Author : vaibhavdubey97
Score : 81 points
Date : 2025-05-06 15:38 UTC (7 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| thefourthchime wrote:
| This is a really interesting idea! I'll be honest, it took me a
| minute to really get what it was doing. The GitHub page video
| doesn't play with any audio, so it's not clear what's happening.
|
| Once I watched the video, I think I have a better understanding.
| One thing I would like to see is more of a breakdown of how this
| solves a problem that just a big model itself wouldn't.
| vaibhavdubey97 wrote:
| Thank you!
|
| Yeah, we rushed to create a "Plexe in action" video for our
| Readme. We'll put a link to the YouTube video in the Readme so
| it's easier to follow.
|
| Using large generative models enables fast prototyping, but
| runs into several issues: generic LLMs have high latency and
| cost, and fine-tuning/distilling doesn't address the
| fundamental size issue. Given these pain points, we realized
| the solution isn't bigger generic models (fine-tuned or not),
| but rather automating the creation, deployment, and management
| of lightweight models built on domain-specific data. An LLM can
| detect if an email is malicious, but a classifier built
| specifically for detecting malicious emails is orders of
| magnitude smaller and more efficient. Plus, it's easier to
| retrain with more data.
| ratatoskrt wrote:
| In my experience, humans are really bad at statistics and LLMs
| are even worse because they basically just mimic all the typical
| mistakes people make.
| vaibhavdubey97 wrote:
| You're right. We've seen the "garbage in, garbage out" problem
| firsthand.
|
| We've seen the models hit typical statistical pitfalls like
| overfitting and data leakage during testing. We've mitigated this
| by implementing strict validation protocols and guardrails around
| data handling. While we've fixed the agents getting stuck in
| recursive debugging loops, statistical validity remains an
| ongoing challenge. We're actively working on better detection
| of these issues, but ultimately, we rely on domain expertise
| from users for evaluating model performance.
| Oras wrote:
| I like the idea of trying multiple solutions.
|
| Does it decide based on data if it should make its own ML model
| or fine-tune a relevant one?
|
| Also, does it detect issues with the training data? When I was
| doing NLP ML models before LLMs, the tasks that took all my time
| were related to data cleaning, not the training or choosing the
| right approach.
| impresburger wrote:
| Currently it decides whether to make its own model or fine-tune
| a relevant one based primarily on the problem description. The
| agent's ability to analyse the data when making decisions is
| pretty limited right now, and something we're currently working
| on (i.e. let the agent look at the data whenever relevant,
| etc).
|
| I guess that kind of answers your second question, too: it does
| not currently detect issues with the training data. But it will
| after the next few pull requests we have lined up!
|
| And yes, completely agree about data cleaning vs. model
| building. We started from model building as that's the "easier"
| problem, but our aim is to add more agents to the system to
| also handle reviewing the data, reasoning about it, creating
| feature engineering jobs, etc.
| revskill wrote:
| Instead of "Attention is all we need", i expect an "Intention is
| all we need".
| vaibhavdubey97 wrote:
| Absolutely! And hopefully an input/output schema for the model
| :)
| vessenes wrote:
| I like this a lot, thank you for building it.
|
| Any review of smolagent? This combination of agents approach
| seems likely to be really useful in a lot of places, and I'm
| wondering if you liked it, loved it, hated it, ...
| vaibhavdubey97 wrote:
| Thank you!
|
| Smolagents works great for us but we did run into some
| limitations. For example, it lacks structured output
| enforcement, parallel execution, and in-built shared memory,
| which are crucial features for orchestrating a multi-layer
| agent hierarchy beyond simple chatbots. We've also been playing
| around with Pydantic AI due to its benefits with validation and
| type enforcement but haven't shifted yet.
| impresburger wrote:
| Hey, I'm one of the authors of Plexe. Overall, I'd say we like
| smolagents: it's simple, easy to understand, and you can get a
| project set up very quickly. It also has some neat features,
| such as the "step callbacks" (functions that are executed after
| every step the agent takes).
|
| However, the library does feel somewhat immature, and has some
| drawbacks that hinder building a production application. Some
| of the issues we've run into include:
|
| 1. It's not easy to customise the agents' system prompts. You
| have to "patch" the smolagents library's YAML templates in a
| hacky way.
|
| 2. There is no "shared memory" abstraction out of the box to
| help you manage communication between agents. We had to
| implement an "ObjectRegistry" class into which the agents can
| register objects, so that another agent can retrieve the object
| just by knowing the object's key string (see the sketch after
| this list). As we scale, we will need to build more complex
| communication abstractions (task queues etc). Given that
| communication is a key element of multi-agent systems, I would
| have expected a popular library like smolagents to have some
| kind of built-in support for it.
|
| 3. No "structured response" where you can pass a Pydantic
| BaseModel (or similar) to specify what structure the agent
| response should have.
|
| 4. "Managed agents" are always executed synchronously. If you
| have a hierarchy of managed agents, only one agent will ever be
| working at any given time. So we'll have to build an async
| execution mechanism ourselves.
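|
| As a toy sketch of the idea (illustrative, not the actual plexe
| code), the registry is essentially a shared key-to-object map:
|
|     class ObjectRegistry:
|         """Shared store agents use to pass objects around by key."""
|         def __init__(self):
|             self._objects = {}
|
|         def register(self, key: str, obj) -> str:
|             # Store the object; other agents only need the key.
|             self._objects[key] = obj
|             return key
|
|         def get(self, key: str):
|             return self._objects[key]
|
|     registry = ObjectRegistry()
|     registry.register("dataset:train", {"rows": 1000})
|     assert registry.get("dataset:train")["rows"] == 1000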
|
| I think we've run into some other limitations as well, but
| these are the first that come to my mind :) hope this helps!
| vessenes wrote:
| Thanks - super helpful. Passing state around to agents feels
| like a big pain point right now. That said just getting
| simple state transition libraries working with agents is a
| bit of a pain point as well.
|
| Feels like there might be a good infra company in there for
| someone to build.
| fzysingularity wrote:
| Is there a benchmark or eval for why this might be a better
| approach than actually modeling the problem? If you're selling
| this to a non-ML person, I get the draw. But you'd still have to
| show why using these LLMs would be better than training it with
| something simpler / more lightweight.
|
| That said, it's likely that you'll get good zero-shot
| performance, so the model building phase could benefit from fine-
| tuning the prompt given the dataset - instead of training the
| underlying model itself.
| impresburger wrote:
| Just to clarify, we're not directly using the LLMs as the
| "predictor" models for the task. We're making the LLMs do the
| modeling work for you.
|
| For example, take the classic "house price prediction" problem.
| We don't use an LLM to make the predictions, we use LLMs to
| model the problem and write code that trains an ML model to
| predict house prices. This would most likely end up being an
| xgboost regressor or something like that.
|
| As to your point about evals, great question! We've done some
| testing but haven't yet carried out a systematic eval. We
| intend to run this on OpenAI's MLE-Bench to quantify how well
| it actually does at creating models.
|
| Hope I didn't misunderstand your comment!
| dweinus wrote:
| I don't want to hate, what you built is really cool and should
| save time in a data scientist's workflow, but... we did this. It
| won't "automate most of the ML lifecycle." Back in ~2018 "autoML"
| was all the rage. It failed because creating boilerplate and
| training models are not the hard parts of ML. The hard parts are
| evaluating data quality, seeking out new data, designing
| features, making appropriate choices to prevent leakage,
| designing evaluation appropriate to the business problem, and
| knowing how this will all interact with the model design choices.
| impresburger wrote:
| Hey, one of the authors here! I completely agree with your
| comment. Training ML models on a clean dataset is the "easy"
| and fun part of an ML engineer's job.
|
| While we do think our approach might have some advantages
| compared to "2018-style" AutoML (more flexibility, easier to
| use, potentially more intelligent solution space exploration),
| we know it suffers from the issue you highlighted. For the time
| being, this is aimed primarily at engineers who don't have ML
| expertise: someone who understands the business context, knows
| how to build data processing pipelines and web services, but
| might not know how to build the models.
|
| Our next focus area is trying to apply the same agentic
| approach to the "data exploration" and "feature ETL
| engineering" part of the ML project lifecycle. Think a "data
| analyst agent" or "data engineering agent", with the ability to
| run and deploy feature processing jobs. I know it's a grand
| vision, and it won't happen overnight, but it's what we'd like
| to accomplish!
|
| Would love to hear your thoughts :)
| janalsncm wrote:
| Just a thought, but maybe a good angle would be to interview
| data analysts and ask them what the most annoying parts of
| their jobs are, to figure out how to automate the drudge
| work. If you can make their lives easier, they'll sell the
| product for you.
| vaibhavdubey97 wrote:
| Absolutely! When we started building this out, we knew that
| we had to build an agent to perform data cleaning and
| feature transformations. After speaking to data analysts,
| PMs and engineers over the last few weeks, we've received
| strong feedback about adding this capability to Plexe and
| we're actively working on it. We've already added some
| features related to this and hopefully will roll out the
| whole agent very soon!
| janalsncm wrote:
| Yes, this is the issue. In any reasonably-sized enterprise
| you're not going to have a clean CSV to plug in to a model
| generator. You're either going to have 1) 50 different excel
| spreadsheets to wrangle and combine somehow or 2) 50+ terabytes
| of messy logs to process.
|
| Creating something that can grok MNIST is certainly cool, but
| it's kind of like saying leetcode is equivalent to software
| engineering.
|
| Second, and more practically speaking, you are automating (what
| I think of as) the most fun part of ML: the creativity of
| framing a problem and designing a model to solve that problem.
| yu3zhou4 wrote:
| Nice execution! I built a simpler version of it a year ago:
| https://github.com/jmaczan/csv-to-ml
| I hope you succeed with the product and push AutoML forward.
| impresburger wrote:
| Hey, this is super cool! We found a few projects working on
| similar things to Plexe, but were not aware of yours. Thanks
| for sharing, will check it out!
| vaibhavdubey97 wrote:
| Very cool, thanks for sharing! :)
| Stiopa wrote:
| Awesome work.
|
| Only watched demo, but judging from the fact there are several
| agent-decided steps in the whole model generation process, I
| think it'd be useful for Plexe to ask the user in-between if
| they're happy with the plan for the next steps, so it's more
| interactive and not just a single, large one-shot.
|
| E.g. telling the user what features the model plans to use, and
| the user being able to request any changes before that step is
| executed.
|
| Also wanted to ask how you plan to scale to more advanced (case-
| specific) models? I see this as a quick and easy way to get the
| more trivial models working especially for less ML-experienced
| people, but am curious what would change for more complicated
| models or demanding users?
| impresburger wrote:
| Agree. We've designed a mechanism to enable any of the agents
| to ask for input from the user, but we haven't implemented it
| yet. Especially for more complex use cases, or use cases where
| the datasets are large and training runs are long, being able
| to interrupt (or guide) the agents' work would really help
| avoid "wasted" one-shot runs.
|
| Regarding more complicated models and demanding users, I think
| we'd need:
|
| 1. More visibility into the training runs: log more metrics to
| MLflow, visualise the state of the multi-agent system so the
| user knows "who is doing what", etc.
|
| 2. Give the user more control over the process, both before the
| building starts and during. Let the user override decisions
| made by the agents. This will require the mechanism I mentioned
| for letting both the user and the agents send each other
| messages during the build process.
|
| 3. Run model experiments in parallel. Currently the whole thing
| is "single thread", but with better parallelism (and
| potentially launching the training jobs on a separate Ray
| cluster, which we've started working on) you could throw more
| compute at the problem.
|
| I'm sure there are many more things that would help here, but
| these are the first that come to mind off the top of my head.
|
| What are your thoughts? Anything in particular that you think a
| demanding user would want/need?
| drlobster wrote:
| That's great. Is there any way to make it part of a scikit-learn
| compatible pipeline?
| impresburger wrote:
| Do you mean being able to wrap the created model in a scikit-
| learn Pipeline? This isn't something we've thought about and we
| haven't explicitly built support for it, though we could.
|
| As of now, I think you could relatively easily wrap the plexe
| model, which has a `predict()` method, in a scikit-learn
| Estimator. You could then plug it into a Pipeline.
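|
| Roughly something like this minimal sketch (illustrative only,
| not functionality the library ships with):
|
|     from sklearn.base import BaseEstimator
|
|     class PlexeEstimator(BaseEstimator):
|         """Wraps an already-built plexe model for use in a Pipeline."""
|
|         def __init__(self, plexe_model):
|             self.plexe_model = plexe_model
|
|         def fit(self, X, y=None):
|             # The plexe model is already trained, so fit is a no-op.
|             return self
|
|         def predict(self, X):
|             # Assumes X is a DataFrame and predict() takes one row
|             # as a dict; adapt to the actual model interface.
|             return [self.plexe_model.predict(row.to_dict())
|                     for _, row in X.iterrows()]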
|
| What do you have in mind? How would you want to use this with
| scikit-learn pipelines?
| drlobster wrote:
| I think what I'm after is being able to put these in a
| pipeline.
|
| I.e. if I already have some data cleaning/normalisation, some
| dimensional reduction and then some fitting, being able to
| drop the Agent in place with an appropriate description and
| task.
|
| Cleaning: Feed it a data frame and have it figure out what
| needs imputing etc.
|
| The rest: Could either be separate tasks or one big task for
| the Agent..
| impresburger wrote:
| Interesting! We don't currently support this explicitly.
|
| You could wrap the Plexe-built model in a scikit-learn
| Estimator like I mentioned, and you can specify the desired
| input/output schema of the model when you start building
| it, so it will fit into your Pipeline.
|
| This is an interesting requirement for us to think about
| though. Maybe we'll build proper support for the "I want to
| use this in a Pipeline" use case :)
| srameshc wrote:
| I am just trying to understand, and this is an honest question:
| are we getting a fine-tuned model from the dataset?
| vaibhavdubey97 wrote:
| Plexe analyzes your data and task description, then builds
| custom ML models using standard Python libraries (like scikit-
| learn, XGBoost, etc.). If your problem is best solved by a
| regression model, it will build that. If classification is more
| appropriate, it will implement that instead.
|
| Fine-tuning existing language models is also an option in
| Plexe's toolkit. For example, when we needed to classify prompt
| injections for LLMs, Plexe determined fine-tuning RoBERTa was
| the best approach. But for most structured data problems (like
| forecasting or recommendations), Plexe typically builds
| lightweight models from scratch that are trained directly on
| your dataset.
| throwaway314155 wrote:
| So just to be clear, you aren't building _deep_ learning
| models, or even NN-based models automatically?
| vaibhavdubey97 wrote:
| Sorry, I think I explained that poorly. Plexe does build deep
| learning models automatically. When it gets a dataset and a
| problem description, it automatically evaluates various
| model architectures (NNs being one of them).
|
| Plexe experiments with multiple approaches - from
| traditional algorithms like gradient boosting to deep
| neural networks. It runs the training jobs and compares
| performance metrics across different architectures to
| identify which solution best fits your specific data and
| problem constraints.
| throwaway314155 wrote:
| Oh okay! In that case, my faith is restored. Sounds like
| a cool project.
| vaibhavdubey97 wrote:
| _phew_ that was close. I'm glad your faith is restored :)
| impresburger wrote:
| No, not by default. In fact, the default installation of
| plexe doesn't include deep learning libraries.
|
| Plexe _can_ build deep learning models using `torch` and
| `transformers`, and often the experimentation process will
| include some NN-based solutions as well, but that's just
| one of the ML frameworks available to the agent. It can
| also build models using xgboost, scikit-learn, and several
| others.
|
| You can also explicitly tell Plexe not to use neural nets,
| if that's a requirement.
| throwaway314155 wrote:
| Indeed your colleague explained similarly. Seems like a
| great project.
| MarcoDewey wrote:
| I love that you all are doing real old school machine learning
| and not just LLM transformer based work!
| vaibhavdubey97 wrote:
| Thank you! Hopefully more complicated transformers are coming
| soon too :)
___________________________________________________________________
(page generated 2025-05-06 23:00 UTC)