[HN Gopher] AutoML Solutions: What I Like and Don't Like About A...
       ___________________________________________________________________
        
       AutoML Solutions: What I Like and Don't Like About AutoML as a Data
       Scientist
        
       Author : aburla
       Score  : 60 points
       Date   : 2022-09-19 12:47 UTC (10 hours ago)
        
 (HTM) web link (alexandruburlacu.github.io)
 (TXT) w3m dump (alexandruburlacu.github.io)
        
       | rmnclmnt wrote:
       | > So, you're working on a new ML project. The first thing you do,
       | model-wise - you implement a simple heuristic baseline and see
       | where you stand. Second, you try a simple ML solution and analyze
       | how much it improves the baseline. One thing you can try to do
       | after this stage, at least what I like to do, is to try to
       | estimate what would be your upper bound in terms of predictive
       | performance, and let an AutoML solution squeeze the most out of
       | your data and preprocessing.
       | 
       | 100% agreed! And yet, most data scientists skip the first two
       | steps described here entirely, because they are not "sexy
       | enough" or "valuable".
        
       | bradhilton wrote:
       | I've used AutoKeras (one of the frameworks he mentions in the
       | article), and as a machine learning amateur I found it super
       | helpful. It has still taken a lot of work to optimize for my use
       | case, but it was nice to get decent results nearly "out of the
       | box."
        
       | radarsat1 wrote:
       | What about data? Does using AutoML increase the amount of data
       | needed, due to needing more complex cross-validation setups, or
       | can you get away with roughly the same split?
       | 
       | I ask because I find that where the most domain knowledge is
       | needed is when less labeled data is available, and that's also
       | where I would assume AutoML doesn't perform as well.
        
         | teruakohatu wrote:
         | Some, if not all, AutoML solutions will require a minimum
         | amount of data to try certain methods.
         | 
         | But you are entirely correct. The last commercial AutoML use
         | case I saw demoed had very limited data, and while the metrics
         | were OK, I could have made predictions just as good using linear
         | regression in Excel, or even just a calculator and some basic
         | heuristics.
         | 
         | That's not to say AutoML is bad; I have used H2O with success.
         | It just replaces only a small part of the data science pipeline.
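One common way to answer radarsat1's question about cross-validation is nested cross-validation: the total amount of data stays the same, but an outer fold is held back for an unbiased score, so every candidate model in the search is trained on a smaller slice. AutoML tools differ in how they validate, so this is not any particular tool's procedure. The following is a minimal standard-library sketch of the split arithmetic only; the function names are hypothetical, and folds are contiguous and unshuffled for clarity (a real setup would usually shuffle first).

```python
# Sketch: how nested cross-validation partitions a fixed dataset.
def k_fold(indices, k):
    """Yield (train, held_out) index lists for k contiguous folds.
    Fold sizes differ by at most one when len(indices) % k != 0."""
    n = len(indices)
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        held_out = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, held_out
        start += size


def nested_cv_splits(n, outer_k=5, inner_k=5):
    """Yield (inner_train, inner_val, outer_test) for every inner fold.
    The outer test fold is never seen during model/hyperparameter search."""
    all_idx = list(range(n))
    for outer_train, outer_test in k_fold(all_idx, outer_k):
        for inner_train, inner_val in k_fold(outer_train, inner_k):
            yield inner_train, inner_val, outer_test


splits = list(nested_cv_splits(100, outer_k=5, inner_k=5))
print(len(splits))  # 25 (outer fold, inner fold) combinations
tr, va, te = splits[0]
print(len(tr), len(va), len(te))  # 64 16 20
```

With 100 labeled examples and 5x5 nesting, each model the search compares is fit on only 64 of them, which illustrates why small labeled datasets are where automated search has the least room to work.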
        
       | mountainriver wrote:
       | > the correct way to think of AutoML is as an enabler that lets
       | you focus more on the data side of things
       | 
       | Yes! This is exactly how we are starting to use it and it makes a
       | ton of sense.
       | 
       | It also allows for very rapid POCs to just flesh out a problem
       | space, then a data scientist can come later and refine.
        
       ___________________________________________________________________
       (page generated 2022-09-19 23:01 UTC)