[HN Gopher] AutoML Solutions: What I Like and Don't Like About A...
___________________________________________________________________
AutoML Solutions: What I Like and Don't Like About AutoML as a Data
Scientist
Author : aburla
Score : 60 points
Date : 2022-09-19 12:47 UTC (10 hours ago)
(HTM) web link (alexandruburlacu.github.io)
(TXT) w3m dump (alexandruburlacu.github.io)
| rmnclmnt wrote:
| > So, you're working on a new ML project. The first thing you do,
| model-wise - you implement a simple heuristic baseline and see
| where you stand. Second, you try a simple ML solution and analyze
| how much it improves the baseline. One thing you can try to do
| after this stage, at least what I like to do, is to try to
| estimate what would be your upper bound in terms of predictive
| performance, and let an AutoML solution squeeze the most out of
| your data and preprocessing.
|
| 100% agreed! And yet, most data scientists skip the first two
| steps described here because they are not "sexy enough" and not
| "valuable".
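
The first two steps in the quoted workflow can be sketched in plain Python. This is a minimal illustration, not the article author's method: the synthetic data, the "predict the training mean" heuristic, the one-feature least-squares line, and the names `mae`, `fit_line`, and `compare_baselines` are all assumptions made for the example. The AutoML step is left out because it depends on the framework you pick.

```python
import random
from statistics import mean


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def fit_line(xs, ys):
    """Closed-form least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def compare_baselines(seed=0, n=200):
    """Return (heuristic_mae, linear_mae) on a held-out split.

    The data is synthetic (hypothetical): y = 3x + 2 plus Gaussian noise.
    """
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [3.0 * x + 2.0 + rng.gauss(0, 1.0) for x in xs]

    split = int(0.8 * n)
    x_train, x_test = xs[:split], xs[split:]
    y_train, y_test = ys[:split], ys[split:]

    # Step 1: heuristic baseline, always predict the training mean.
    constant = mean(y_train)
    heuristic_mae = mae(y_test, [constant] * len(y_test))

    # Step 2: simple ML model, a one-feature linear regression.
    slope, intercept = fit_line(x_train, y_train)
    linear_mae = mae(y_test, [slope * x + intercept for x in x_test])

    return heuristic_mae, linear_mae


if __name__ == "__main__":
    heuristic, linear = compare_baselines()
    print(f"heuristic MAE: {heuristic:.3f}")
    print(f"linear MAE:    {linear:.3f}")
```

The gap between the two numbers is what tells you how much room is left before an AutoML run is worth the compute.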
| bradhilton wrote:
| I've used AutoKeras (one of the frameworks he mentions in the
| article) and as a machine learning amateur it was super helpful.
| It has still taken a lot of work to optimize for my use case, but
| it was nice to get decent results nearly "out-of-the-box."
| radarsat1 wrote:
| What about data? Does using AutoML increase the amount of data
| needed, due to needing more complex cross-validation setups, or
| can you get away with roughly the same split?
|
| I ask because I find that where the most domain knowledge is
| needed is when less labeled data is available, and that's also
| where I would assume AutoML doesn't perform as well.
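
One way to read "more complex cross-validation setups" is nested cross-validation, where an inner loop selects the model and an outer loop estimates its performance. That reading is an assumption, and the sketch below does not say whether any given AutoML tool works this way. It only shows the arithmetic of how much data each candidate model is actually fit on; `nested_cv_sizes` is a hypothetical helper.

```python
def nested_cv_sizes(n_samples, outer_folds=5, inner_folds=5):
    """Approximate per-fold sample counts for nested k-fold CV.

    Uses integer division, so fold sizes are rounded down.
    """
    outer_test = n_samples // outer_folds
    outer_train = n_samples - outer_test
    inner_val = outer_train // inner_folds
    inner_train = outer_train - inner_val
    return {
        "outer_test": outer_test,    # held out for the final estimate
        "outer_train": outer_train,  # available to model selection
        "inner_val": inner_val,      # used to compare candidate models
        "inner_train": inner_train,  # what each candidate is fit on
    }


if __name__ == "__main__":
    for n in (100, 500, 5000):
        print(n, nested_cv_sizes(n))
```

With 5x5 folds, each candidate is fit on roughly 64% of the labeled data, which matters most in exactly the small-data regime the comment describes.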
| teruakohatu wrote:
| Some, if not all, AutoML solutions will require a minimum
| amount of data to try certain methods.
|
| But you are entirely correct. The last commercial use of AutoML
| I saw demoed had very limited data, and while the metrics were
| OK, I could have made predictions just as good using linear
| regression in Excel, or even just a calculator and some basic
| heuristics.
|
| That's not to say AutoML is bad; I have used H2O with success.
| It just replaces only a small part of the data science pipeline.
| mountainriver wrote:
| > the correct way to think of AutoML is as an enabler that lets
| you focus more on the data side of things
|
| Yes! This is exactly how we are starting to use it and it makes a
| ton of sense.
|
| It also allows for very rapid POCs to flesh out a problem space;
| a data scientist can then come in later and refine.
___________________________________________________________________
(page generated 2022-09-19 23:01 UTC)