Post AnokjEfQCnoxtV5tdA by jayalane@mastodon.online
 (DIR) Post #AnoJJipkj74hA9XIi8 by futurebird@sauropods.win
       2024-11-07T20:27:40Z
       
       0 likes, 0 repeats
       
       Kaggle makes me feel SO OLD. I have taught Mathematical Statistics many times and love the machinery and theory of probability and statistics, but what in god’s name is a “data modeling contest”? Is this like a beauty pageant for distributions (bad news: they are nearly all thick in the middle)?? Why do they keep trying to put an AI helper on me like some kind of unholy cross between Clippy and a slap-drone from The Culture???
       
 (DIR) Post #AnoJW50C8AZTOMJSKm by mcsquank@mastodon.online
       2024-11-07T20:29:32Z
       
       0 likes, 0 repeats
       
       @futurebird "thick in the middle" hahaha
       
 (DIR) Post #AnohY7qAwjv1fr8QC0 by guyjantic@c.im
       2024-11-08T00:59:11Z
       
       0 likes, 1 repeats
       
       @futurebird As someone wanting to leave academia and move into something else (research? data science?), I've been hit with this for a few years. The people doing parametric (or even nonparametric) old-school stats are a minority in industry. Machine learning ate the industry years ago, well before LLMs. Random forests, GMMs, etc. are everywhere. Industry doesn't care about discovering truth; it cares about a model that will increase its bottom line by 1.5%.
       
 (DIR) Post #AnojHOhzB8H8yTSIJk by futurebird@sauropods.win
       2024-11-08T01:18:35Z
       
       0 likes, 1 repeats
       
       @guyjantic Until I (get around) to reading a goddamned proof of HOW exactly this stuff works I don't trust it at all. You can't use statistical methods to show your statistical methods work; they aren't... real data. This is going to make us all high as a kite hallucinating whatever we want! So uh... anyone got any proofs on the soundness of the new noise?
       
 (DIR) Post #AnokIu3ODth2QwkTlw by guyjantic@c.im
       2024-11-08T01:30:03Z
       
       0 likes, 1 repeats
       
       @futurebird haha. I'm kind of with you on this. Basic ML is often defended with a sort of "the proof is in the pudding" argument, which doesn't sit well with me, but whatevs. In my opinion, basic ML is stepwise regression on steroids, almost guaranteed to capitalize on sample randomness and produce lots of misleading results. I think the standard practice is to try to rein that problem in with a few things like (importantly) always fitting models on a training dataset and then testing them on a separate held-out dataset, noting shrinkage, etc. They also do (did?) a lot of dimensionality-reduction stuff like PCA and various decompositions. But the theory behind it is, it seems to me (though I've not done a deep dive), pretty underdeveloped. There's a lot of shrugging and saying things like "It increased sales for a month, so it worked." Then they abandon that model and move to another one (though some get put into long-term production for better and/or worse).
       
       As for LLMs, I'm seeing a lot of statisticians and data scientists online becoming disillusioned. Their bosses all want LLMs in everything, but the LLMs don't necessarily produce any new insights from data, despite taking hundreds of times more energy to do their computations.
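The "capitalizing on sample randomness" point above can be sketched in a few lines: fit ordinary least squares on features that are pure noise, then score on a held-out sample and watch the fit shrink. This is a minimal illustration, not anyone's real pipeline; every number and name here is an assumption for the sake of the demo.

```python
# Minimal sketch: with many candidate features and few samples, a model
# can fit noise well in-sample and collapse on held-out data (shrinkage).
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, n_features = 50, 50, 40

# Pure noise: the features carry NO information about y.
X_train = rng.normal(size=(n_train, n_features))
y_train = rng.normal(size=n_train)
X_test = rng.normal(size=(n_test, n_features))
y_test = rng.normal(size=n_test)

# Ordinary least-squares fit (the "stepwise regression on steroids" ingredient).
beta, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

def r2(X, y):
    """Uncentered R^2 against the fitted coefficients."""
    resid = y - X @ beta
    return 1 - (resid @ resid) / (y @ y)

print(f"train R^2: {r2(X_train, y_train):.2f}")  # high: the model fit the noise
print(f"test  R^2: {r2(X_test, y_test):.2f}")    # near zero or negative
```

With 40 free parameters and 50 training points, the in-sample fit is almost mechanical; the held-out score is what the post calls shrinkage.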
       
 (DIR) Post #AnokjEfQCnoxtV5tdA by jayalane@mastodon.online
       2024-11-08T01:34:49Z
       
       0 likes, 0 repeats
       
       @futurebird @guyjantic I think it is effectively all a non-linear but differentiable transform followed by linear stats, like separation by a hyperplane or fitting a hyperplane. But very high dimension. And no proofs. Nor often any rigor about the 1.5% savings. They just look at a graph, see a drop correlated with the new model, and assume. Or maybe run two-condition tests with the old and new models and see the difference.
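The "non-linear transform followed by a separating hyperplane" picture can be shown with a toy example (my reading of the post, not a quoted implementation): two concentric rings are not linearly separable in the plane, but after the non-linear map (x, y) → x² + y², a single threshold (a hyperplane in the one-dimensional transformed space) separates them perfectly. The ring radii and threshold below are made-up illustrative values.

```python
# Toy example: a non-linear feature map makes linearly inseparable
# classes separable by a hyperplane (here, a threshold in 1-D).
import numpy as np

rng = np.random.default_rng(1)
n = 200
theta = rng.uniform(0, 2 * np.pi, size=n)
radius = np.where(np.arange(n) < n // 2, 1.0, 3.0)  # inner vs outer ring
X = np.column_stack([radius * np.cos(theta), radius * np.sin(theta)])
labels = (radius > 2.0).astype(int)

# Non-linear transform: squared distance from the origin.
z = (X ** 2).sum(axis=1)

# A "hyperplane" in the 1-D transformed space is just a threshold.
threshold = 4.0  # anywhere between 1^2 and 3^2 works
pred = (z > threshold).astype(int)
accuracy = (pred == labels).mean()
print(f"accuracy after transform: {accuracy:.2f}")  # prints 1.00
```

Kernel methods and neural networks both live on some version of this trick, just with learned, very-high-dimensional transforms instead of a hand-picked one.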