[HN Gopher] Why do tree-based models still outperform deep learn...
___________________________________________________________________
Why do tree-based models still outperform deep learning on tabular
data? (2022)
Author : tosh
Score : 174 points
Date : 2024-03-05 10:44 UTC (12 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| frgtpsswrdlame wrote:
| I still don't get the impetus or desire to make NNs work better
| for tabular data. Regression works pretty well and is easy to
| interpret/diagnose/work with. GBMs work really well (given a few
| considerations) and are trickier to work with, but nothing crazy.
| When I see all the fancy hijinks people get up to when applying
| NNs to audio/text/pictures I think it's really cool but also not
| something I'd want to have to do if I didn't absolutely need to
| when working with data out of a relational db. And anyways, how
| much of a benefit could it actually bring? GBMs are already
| capable of fitting and dramatically overfitting most datasets.
| mrfox321 wrote:
| When you need the best possible model, full stop.
|
| E.g. finance
|
| In a sufficiently competitive space, good enough doesn't cut
| it.
| dchftcs wrote:
| Do you know of any shop that is running deep learning
| profitably?
| mjhay wrote:
| Plenty of places use DL models, even if it's just a
| component of their stack. I would guess that gradient-
| boosted trees are more common in applications, though.
| hackerlight wrote:
| Do you know what kinds of strategies it's being used in?
| foobar20k wrote:
| Real-time parsing of incoming news events and live
| scanning of internet news sites - coupled with sentiment
| analysis. Latency is an interesting challenge in that
| space.
| mjhay wrote:
| Still mostly NLP and image stuff. Most actual data in the
| wild is tabular - where GBTs are usually some combination
| of better and easier. In some circumstances, NNs can still
| work well in tabular problems with the right feature
| engineering or model stacking.
|
| They are also more attractive for streaming data. Tree-
| based models can't learn incrementally. They have to be
| retrained from scratch each time.
| dist-epoch wrote:
| ML is very good at figuring out stuff like every day at
| 22:00 this asset goes up if this other asset is not at
| a daily maximum and the volatility of the market is low.
|
| You might call this overfitting/noise/.... but if you do
| it carefully it's profitable.
| TimPC wrote:
| Multiple parts of the iPhone stack run DL models locally on
| your phone. They even added hardware acceleration to the
| camera because most of the picture quality upgrades are
| software rather than hardware.
| Thrymr wrote:
| There is no such thing as "best possible model, full stop".
| Models are always context dependent, have implicit or
| explicit assumptions about what is signal and what is noise,
| have different performance characteristics in training or
| execution. Choosing the "best" model for your task is a form
| of hyperparameter optimization in itself.
| naijaboiler wrote:
| I can't upvote this enough. Whether in life, or with
| models, some people really do believe in the myth of
| absolute meritocracy.
| asdff wrote:
| These models usually have poorer fit though
| dawnofdusk wrote:
| The paper offers a reason why NNs working for tabular data
| would be good:
|
| >Creating tabular-specific deep learning architectures is a
| very active area of research (see section 2) given that tree-
| based models are not differentiable, and thus cannot be easily
| composed and jointly trained with other deep learning blocks.
|
| Here is a second reason, from the paper
|
| >Impressed by the superiority of tree-based models on tabular
| data, we strive to understand which inductive biases make them
| well-suited for these data.
|
| which is a great reason, because understanding the inductive
| biases of different learning/regression techniques gets us
| closer to a more general understanding of how to encode
| inductive biases in a generic learning algorithm.
| hackerlight wrote:
| My hypothesis is decision trees are more robust to
| nonstationary distributions. If the variance and means of the
| features shift dramatically, the model isn't going to blow
| up, because it's not additive.
|
| In the domains where NNs work well (image processing and
| language), you're dealing with a predictable and stable
| distribution of values. Elephants might look a bit different
| in the train and test set, but you're not randomly getting
| 100x the variance of the input data. The decision tree just
| isn't going to care as much, because splits around the mean
| will lead to the same outcome.
|
| Another hypothesis is that zooming into bivariable
| relationships is more important in tabular data. Neural nets
| are better at local and global context. But they struggle if
| all that matters is the relationship between two columns of
| data because of the additive nature. Large networks _can_
| figure it out due to model capacity, but then you'll run
| into overfitting.
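|
| A minimal sketch of the first hypothesis, using two hypothetical
| toy models (the names, thresholds and weights are made up for
| illustration): a depth-1 tree returns one of two fixed leaf
| values however extreme the input gets, while an additive model's
| output scales with the input.

```python
# Hypothetical toy models, chosen only to illustrate the argument above.

def stump_predict(x, threshold=0.0, left_value=-1.0, right_value=1.0):
    """A depth-1 decision tree: the output is one of two fixed leaf values."""
    return left_value if x <= threshold else right_value


def linear_predict(x, weight=0.5, bias=0.0):
    """An additive (linear) model: the output grows with the input."""
    return weight * x + bias


in_distribution = 2.0
shifted = 200.0  # same side of the split, but 100x larger in magnitude

stump_outputs = [stump_predict(in_distribution), stump_predict(shifted)]
linear_outputs = [linear_predict(in_distribution), linear_predict(shifted)]

print(stump_outputs)   # the tree's answer does not change
print(linear_outputs)  # the linear model's answer is 100x larger
```

| This only shows boundedness of the output; it says nothing about
| whether the bounded answer is still the right one after the shift.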
| hansvm wrote:
| In case anyone's sufficiently motivated (no promises, but I
| might test it out eventually), a couple deep architectures
| that might address those concerns are:
|
| 1. Something like a deep support vector machine. Instead of
| (linear) -> (any activation), you want to create a bunch of
| features that look like testing the vector against a
| splitting hyperplane. One option is (bias) -> (matmul) ->
| (1-bit sigmoid). Applying a bias term _for each row_ lets
| you choose the branch location; the matmul's result will be
| positive or negative at each output feature depending on
| which side of the hyperplane normal to the vector described
| by the corresponding row you happen to fall on. Then just
| bring that down to -1 or 1 so you can't sneak much
| nonstationary drift variance into the output (perhaps train
| with a normal sigmoid annealed to behave more like this
| one, and a suitable regularizing term to keep the network
| from sneaking in values near 0 to thwart your annealing).
|
| 2. Use an attention-like mechanism, but across features
| (this would likely require an additional tensor channel, so
| that each "feature" carries information in a high enough
| dimensional space for this to do something meaningful). You
| apply the inductive bias that sparse feature interactions
| are important and need to be discovered.
|
| Those two ideas also compose easily.
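|
| A rough sketch of idea 1 as I read it, in plain Python. The
| function name and the toy weights are hypothetical, and the hard
| sign below stands in for the "1-bit sigmoid"; it is a forward
| pass only, not a trainable layer.

```python
def hyperplane_split_layer(x, biases, weights):
    """(bias) -> (matmul) -> (1-bit sigmoid), forward pass only.

    For each output unit j, shift x by its own bias row, project onto
    weights[j] (the hyperplane normal), and keep only the side: +1 or -1.
    """
    outputs = []
    for bias_row, weight_row in zip(biases, weights):
        score = sum(w * (xi + bi) for w, xi, bi in zip(weight_row, x, bias_row))
        outputs.append(1 if score >= 0 else -1)
    return outputs


# Two hypothetical units: one splits on feature 0 at 1.0,
# the other splits on feature 1 at 0.5.
weights = [[1.0, 0.0], [0.0, 1.0]]
biases = [[-1.0, 0.0], [0.0, -0.5]]

typical = hyperplane_split_layer([2.0, 0.0], biases, weights)
drifted = hyperplane_split_layer([200.0, 0.0], biases, weights)

print(typical, drifted)  # identical: only the side of each split matters
```

| The hard sign has zero gradient almost everywhere, which is why
| the comment suggests training with an ordinary sigmoid that is
| annealed toward this behaviour.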
| hackerlight wrote:
| > this would likely require an additional tensor channel,
| so that each "feature" carries information in a high
| enough dimensional space
|
| Suppose input data is [batch_size, num_features]. Then
| you do x.unsqueeze(-1) giving you [batch_size,
| num_features, 1]. Then what?
| hansvm wrote:
| You probably want something equivalent to (however you
| make it fast in your chosen framework):
|
| einsum('bf,fc->bfc', batched_inputs, channel_embedding)
|
| Then carry that info through the network and project it
| down at the end. It's roughly equivalent to the token
| embedding step in an LLM.
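|
| To make the shapes concrete, here is a pure-Python equivalent of
| that einsum (a sketch; the function name and the toy numbers are
| hypothetical, and a real model would learn channel_embedding and
| use a tensor library instead of nested lists).

```python
def embed_features(batched_inputs, channel_embedding):
    """Pure-Python equivalent of einsum('bf,fc->bfc', ...).

    out[b][f][c] = batched_inputs[b][f] * channel_embedding[f][c]
    """
    return [
        [
            [value * weight for weight in channel_embedding[f]]
            for f, value in enumerate(row)
        ]
        for row in batched_inputs
    ]


# shape [batch=3, features=2]
batched_inputs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
# shape [features=2, channels=4]; one embedding vector per column
channel_embedding = [[1.0, 0.0, -1.0, 0.5], [0.5, 0.5, 0.5, 0.5]]

tokens = embed_features(batched_inputs, channel_embedding)
shape = (len(tokens), len(tokens[0]), len(tokens[0][0]))
print(shape)  # [batch, features, channels]
```

| Each scalar feature becomes a vector along the new channel axis,
| so an attention layer can then mix information across features.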
| melondonkey wrote:
| At this point I wish every junior DS could read this paper and
| not come in to every problem with the new bright idea that
| they're going to beat XGBoost with their DL architecture. Free
| promotion if they never say the words "latent subspace"
| barrenko wrote:
| One of those juniors is going to do it once!
| bbstats wrote:
| because smooth is better than jagged :)
| mikkom wrote:
| It's very important to note that this is from 2022. I'm not
| saying it's not true today but neural models have gotten much
| better in 2 years.
|
| (I'm personally using NN models for predicting certain values for
| tabularly structured data and at least for my case, the NN works
| better than state-of-the-art tree models.)
| mjhay wrote:
| Do you have any intuition you could share of why NNs work
| better in this case?
| queuebert wrote:
| Not the parent, but NNs typically work better when you can't
| linearize your data. For classification, that means a space
| in which hyperplanes separate classes, and for regression a
| space in which a linear approximation is good.
|
| For example, take the circle dataset here:
| https://playground.tensorflow.org
|
| That doesn't look immediately linearly separable, but since
| it is 2D we have the insight that parameterizing by radius
| would do the trick. Now try doing that in 1000 dimensions.
| Sometimes you can, sometimes you can't or don't want to
| bother.
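|
| A small sketch of the circle example, assuming two hand-made
| rings of points rather than the playground's actual data: no
| threshold on x alone separates them, but a threshold on the
| radius does.

```python
import math


def ring(radius, n_points=12):
    """Evenly spaced points on a circle of the given radius."""
    return [
        (radius * math.cos(2 * math.pi * k / n_points),
         radius * math.sin(2 * math.pi * k / n_points))
        for k in range(n_points)
    ]


def radius_feature(point):
    """The hand-crafted feature: distance from the origin."""
    x, y = point
    return math.hypot(x, y)


inner = ring(1.0)  # class A
outer = ring(3.0)  # class B

# On the raw x coordinate the inner class sits inside the outer class's
# range, so a single threshold on x cannot separate them.
inner_x = [x for x, _ in inner]
outer_x = [x for x, _ in outer]
x_ranges_overlap = min(outer_x) < min(inner_x) and max(inner_x) < max(outer_x)

# On the radius feature a single threshold (here 2.0) is enough.
separable_by_radius = (
    max(radius_feature(p) for p in inner) < 2.0 < min(radius_feature(p) for p in outer)
)

print(x_ranges_overlap, separable_by_radius)
```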
| mjhay wrote:
| That's an advantage over linear models, but GBTs handle data
| that isn't linearly separable just fine. Each individual tree can
| represent an arbitrary piecewise-constant function given
| enough depth, and then each tree in turn tries to minimize
| the loss on the residual of the previous trees. As such,
| they're effectively like a neural network with two hidden
| layers in terms of expressiveness.
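|
| A toy sketch of the "each tree fits the residual of the previous
| trees" loop, using depth-1 trees on one feature and squared loss.
| Function names, data and the learning rate are hypothetical; real
| GBT libraries add deeper trees, regularization and much more.

```python
def fit_stump(xs, residuals):
    """Find the single split on x that minimizes squared error.

    Returns (threshold, left_mean, right_mean). Assumes xs has at least
    two distinct values.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Gradient boosting for squared loss: each stump fits the residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        predictions = [
            p + learning_rate * (left_mean if x <= threshold else right_mean)
            for x, p in zip(xs, predictions)
        ]
    return base, stumps, predictions


def mse(ys, predictions):
    return sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys)


xs = [float(i) for i in range(8)]
ys = [x * x for x in xs]  # a non-linear target

base, stumps, predictions = boost(xs, ys)
initial_mse = mse(ys, [base] * len(ys))
final_mse = mse(ys, predictions)
print(initial_mse, final_mse)  # training error drops as stumps are added
```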
| melondonkey wrote:
| This explanation doesn't make sense to me. What do you mean
| by "linearize your data"--tree methods assume no linear
| form and are not even monotonically constrained.
| Classification is not done by plane-drawing but by
| probability estimation + cost function
| dist-epoch wrote:
| A tree split can be considered plane-drawing.
| CuriouslyC wrote:
| Note that if linear separability is the only issue you can
| just use kernel methods. In fact, Gaussian processes are
| equivalent to a single hidden layer neural network with
| infinitely many hidden units.
|
| The magic of deep neural networks comes from modeling
| complicated conditional probability distributions, which
| lets you do generative magic but isn't going to give you
| significantly better results than ensemble kNN when you're
| discriminating and the conditional distribution is low
| variance. Ensemble methods are like a form of
| regularization and they also act as a weak bootstrap to
| better model population variance, so it's no surprise that
| when they're capable of modeling the domain, they perform
| better than unregularized, un-bootstrapped neural network
| models. There are still tons of situations where ensemble
| methods can't model the domain, and if you incorporated
| regularization and bootstrapping into a discriminative NN
| model it would probably perform equivalently to the
| ensemble model.
| mikkom wrote:
| I assume it's because there are some very complex
| relationships and patterns that cannot be captured by
| decision trees. Tree models work better on simpler data; at
| least that is my gut feeling based on previous experiments
| with similar data.
| mjhay wrote:
| Interesting. Usually I have better luck with XGBoost for
| tabular data, even when the relationships are complex
| (which usually means deeper trees). It does fall flat a lot
| of the time for very high dimensions, though. All data is
| different, I guess.
| lerchmo wrote:
| There is some work on zero-shot (decoder-only) time series
| prediction by Google and an open source variant. Curious to
| see how these approaches stack up as they are explored.
| redox99 wrote:
| In what way have models gotten better for tabular data? Can't
| think of any new technique since 2022.
| jeffreyrogers wrote:
| There has been some work on training on lots of different
| data sets and then specializing on the one you care about.
| But I think people were trying that approach pre-2022 as
| well.
| frituur wrote:
| Do you have some good scientific references for that? I'd
| love to incorporate them in my phd thesis!
| jeffreyrogers wrote:
| Sorry, I don't have references off the top of my head. I
| just recall coming across it while I was working on
| something related to timeseries forecasting.
| asdff wrote:
| This has to be done with great care. Most datasets are of
| poor quality.
| _pastel wrote:
| Tooling around embeddings has improved. Creating and fine-
| tuning custom embeddings for your tabular data should be
| easier and more powerful these days.
| __mharrison__ wrote:
| Pretty sure it is still true today. CatBoost rules the roost!
| dawnofdusk wrote:
| Paper seems interesting but I don't like the question title. I
| think the answer to the question would just be that tabular data
| is not fully in the "big data" regime yet so there is no reason a
| priori to expect deep NNs to do better. Factor in computational
| simplicity of tree-based models and I think the deck is stacked
| against deep learning from the start.
| math_dandy wrote:
| Do you know of _any_ (families of) examples of tabular datasets
| of any size (you can choose what "big" means) where deep
| learning convincingly outperforms traditional methods? I would
| love some quality examples of this nature to use in my
| teaching.
| Scene_Cast2 wrote:
| Recommendation engines: search, feeds (TikTok / YouTube
| Shorts / etc.), ads, Netflix suggestions, DoorDash
| suggestions, etc. Also happens to be my specialty.
| usgroup wrote:
| I'm not sure that is true. I think inference speed is often
| the bottleneck for the use cases stated, as is the need for
| frequent re-training. As a result algorithms like CatBoost
| are very popular in those domains. I think CatBoost was
| actually invented by Yandex.
|
| PS: It's weird that you are being down-voted. I think your
| opinion is reasonable.
| Scene_Cast2 wrote:
| Inference speed: more sophisticated stacks use multiple
| stages. Early stage might be a sublinear vector search,
| and the heavy hitting neural nets only rerank the
| remainder. Bytedance has a paper on their fairly fancy
| sublinear approach.
|
| Retraining - online training solves this for the most
| part.
|
| Frameworks - the only battle-tested batteries-included
| one I've seen is Vespa. No one else publishes any of the
| interesting bits. KDD is the most relevant conference if
| you're interested in the field. IIRC Xiaohongshu has some
| papers that can only really be done with NNs.
| math_dandy wrote:
| Wonderful! Any public datasets you could point me to?
| Scene_Cast2 wrote:
| Unfortunately, none that I know of. Maybe the Netflix
| movie recommendations challenge from ages ago? I haven't
| looked at it personally.
| Jensson wrote:
| I worked with search and ads models at Google; for most
| things tree models were better. What evidence do you have
| that neural nets are better there? I worked with large
| parts of Google search ranking so I know what I'm talking
| about, some parts you want a neural net but most of the
| work is done by tree models and similar, they both perform
| better and run faster.
| Scene_Cast2 wrote:
| I've worked on models trained on ultra-large tabular data. It
| still took substantial effort to beat tree models (custom
| architecture specifically for this particular domain, something
| I haven't seen elsewhere out in the open).
|
| When tabular data is mentioned, one of the unspoken
| applications is finance. There, my guess is that one of the
| issues is that data is not very IID and thus latent "events"
| are fairly sparse. Combine that with the humongous amount of
| raw data, and you get models that overfit.
| TimPC wrote:
| I think there are certain types of tabular data that lend
| themselves naturally to tree models. But when you're talking
| about tabular data for finance I guarantee you very few hedge
| funds are running tree models for trading strategies. When
| your scale of data is the past X quarters of all stock prices
| and trade volumes you have enough data that you can fit an NN
| and there are a number of techniques you can use to reduce
| overfitting (large amount of data, good regularization,
| dropout, etc.)
| Jensson wrote:
| > But when you're talking about tabular data for finance I
| guarantee you very few hedge funds are running tree models
| for trading strategies
|
| What do you base this on? Having only neural nets on
| tabular data is mostly done due to laziness of the creator
| since neural nets are much easier to use, not because
| neural nets perform better even with large amounts of data.
| In general you want both since they are good at finding
| different kinds of patterns.
| Jensson wrote:
| The tabular data I had at Google was exabytes, tree models
| still performed the best so I guess exabytes is small data
| then?
| Scene_Cast2 wrote:
| The team behind Yggdrasil tree library at Google was doing some
| interesting research into tree differentiability (and thus
| unlocking SGD & end-to-end learning for hybrid architectures).
| doubtfuluser wrote:
| Since this is from 2022, I'm wondering how "tabular foundation
| models" could change this. The incredible success of DL we see at
| the moment comes partially from foundation models learning an
| "understanding" of the behavior from a lot of "semi-related" data.
| Something similar has been explored in tabular data as well iirc.
|
| So I would be curious to see latest DL results. On the other hand
| it is also the case that in most cases where DL based on
| foundation models is used, specific heavily tuned models
| outperform the generalist models. And for tabular data there is
| a lot of experience in how to make it work well with tree-based models.
| dweinus wrote:
| What would these tabular foundation models look like? LLMs work
| as foundation models because the input is fixed in format (a
| sequence of text). Would the model be for a specific fixed
| tabular format?
| scottyak wrote:
| One promising approach is to encode each feature key and
| feature value as embedding vectors, concatenate them into
| "feature tokens", then feed them into a Transformer (without
| positional encodings). This takes advantage of column-order
| invariance. See:
|
| https://arxiv.org/abs/2403.01841 (ICLR 2024 spotlight)
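|
| A toy sketch of the "feature token" idea and of why dropping
| positional encodings gives column-order invariance. Everything
| here is hypothetical: the embeddings are hard-coded rather than
| learned, the value embedding is just the raw number, and mean
| pooling stands in for the Transformer. It is not the linked
| paper's implementation.

```python
# Hypothetical, hard-coded key embeddings; a real model would learn these.
key_embedding = {
    "age": [1.0, 0.0],
    "income": [0.0, 1.0],
    "tenure": [0.5, 0.5],
}


def feature_token(name, value):
    """Concatenate the column's key embedding with a (trivial) value embedding."""
    return key_embedding[name] + [float(value)]


def pooled_representation(row):
    """Order-independent stand-in for a Transformer without positional encodings."""
    tokens = [feature_token(name, value) for name, value in row]
    return [sum(channel) / len(tokens) for channel in zip(*tokens)]


row = [("age", 4.0), ("income", 2.0), ("tenure", 3.0)]
reordered = [row[2], row[0], row[1]]  # same columns, different order

original_repr = pooled_representation(row)
reordered_repr = pooled_representation(reordered)
print(original_repr, reordered_repr)  # the two agree
```

| Because each token carries its own column identity, nothing is
| lost by ignoring position.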
| moelf wrote:
| There seem to be differentiable tree models now that perform
| somewhat better than e.g. XGBoost
| https://github.com/Evovest/MLBenchmarks.jl?tab=readme-ov-fil...
| candiodari wrote:
| TLDR: Because tree-based models don't just outperform deep
| learning, they totally outclass deep learning on simple data.
|
| But they don't scale with larger and more complex data. You
| cannot (realistically) make an LLM with XGBoost.
|
| Kind of surprised how well ResNet and FT-Transformer do though.
| nickpsecurity wrote:
| I want to see more work combining them. Here's an example I saw
| in one of the links in this thread:
|
| https://arxiv.org/abs/1806.06988
|
| It combines NNs with decision trees.
| huac wrote:
| the famous FB ads paper (from 10 years ago!) combines
| decision trees with a logistic regression and shows a
| significant improvement:
| https://research.facebook.com/publications/practical-
| lessons...
|
| feel free to extend logistic regression to an MLP :)
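|
| A minimal sketch of that trees-then-logistic-regression pattern:
| each tree maps an input to a leaf, the leaf indices are one-hot
| encoded, and a linear model scores the encoded vector. The two
| "trees" (single splits) and the weights below are hand-written,
| hypothetical placeholders; in practice both parts are trained.

```python
import math


def leaf_index(tree, x):
    """tree = (feature_index, threshold); leaf 0 if x[feature] <= threshold, else leaf 1."""
    feature, threshold = tree
    return 0 if x[feature] <= threshold else 1


def leaf_one_hot(trees, x):
    """One-hot encode the leaf reached in every tree and concatenate."""
    encoded = []
    for tree in trees:
        one_hot = [0, 0]
        one_hot[leaf_index(tree, x)] = 1
        encoded.extend(one_hot)
    return encoded


def predict_proba(weights, bias, features):
    """Logistic regression on top of the leaf features."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


trees = [(0, 5.0), (1, 0.5)]        # hypothetical splits
weights = [-1.0, 1.0, 0.5, -0.5]    # hypothetical, one weight per leaf
bias = 0.0

sample = [7.0, 0.2]
encoded = leaf_one_hot(trees, sample)
probability = predict_proba(weights, bias, encoded)
print(encoded, probability)
```

| Swapping predict_proba for an MLP over the same encoded vector is
| the extension suggested above.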
| MAXPOOL wrote:
| > deep learning architectures have been crafted to create
| inductive biases matching invariances and spatial dependencies of
| the data. Finding corresponding invariances is hard in tabular
| data, made of heterogeneous features, small sample sizes, extreme
| values
|
| Transformers without positional encoding have embeddings that are
| invariant to the input order. CNNs have translation invariance
| and can have a little rotational invariance.
|
| It's harder to find similar invariances in tabular data. Maybe
| applying methods from GNNs would help?
| queuebert wrote:
| This is tangential, but that paper has some amazingly good plots
| for an ML paper.
| usgroup wrote:
| I'm not sure this is surprising. Say you were to glue together 10
| datasets with the same 10 explanatory features and 1 response
| feature, but distributed very differently to each other. This
| would be no problem for a tree-based model because it will
| conditionalise indefinitely to get a good fit. If the number of
| records is relatively small (say 10k) the dataset will be much
| too small for an NN to learn these discontinuities -- it's like
| it has 1000 records per segment.
|
| Similarly, tabular data is often of this nature. It's not i.i.d.;
| it tends to cluster.
| 3abiton wrote:
| I wonder if that would be the case for graph based models too
| skadamat wrote:
| When working with tabular data, there are very few situations
| where absolute model performance is the only criterion that's
| important. In practice, the following are just as important:
|
| - Explainability / debug-ability of models
|
| - Effort to train, deploy, and manage NN models in production
|
| - Capturing, collating, and organizing new & better datasets
|
| - Local developer experience and human-model-iteration time
|
| Building all of your software in C or Assembly will yield faster,
| more performant programs. But at what cost and with what tradeoffs?
| Building a website has a different set of tradeoffs than building
| a program for the Mars rover.
| derefr wrote:
| It's funny; as a regular non-ML programmer, the optimum for
| every one of those factors for "tabular data" would seem to
| _me_, to be to "throw the tabular data into a relational data
| warehouse, and ask your questions in the form of SQL queries."
|
| Or, if the "tabular data" is heavily relationship-based, then
| possibly replace "relational data warehouse" with "graph
| database", and "SQL queries" with whatever querying language
| that graph DB is natively / most expressively queried in.
|
| Of course, this is the most important implicit "equally
| important" factor, one that an ML dev would think goes without
| mentioning: the generality or "power" of the model in what
| questions it can answer. You can only make these trade-offs in
| the context of knowing what kinds of questions you want your
| model to solve for! If all your questions are quantitative
| ones, maybe the right "model" for you is an RDBMS!
|
| ---
|
| Though, that being said... why can't a deep-learning model
| _emulate_ the thing that an RDBMS does, "at runtime", as part
| of its "mental toolkit" for approaching problems? That would be
| the best of both worlds, no?
|
| I know that LLMs in particular have been observed to have
| "emergent numeracy" above a certain training-set size. There is
| a step function in how they approach such problems, going from
| their only being able to answer arithmetic questions on numbers
| of bounded size, and sometimes getting the answers wrong
| (probably this is due to a memorization-based approach); to
| being able to answer arbitrary arithmetic questions on operands
| of unbounded size, and always getting the answer correct.
|
| I would _guess_ that what's happening is that they are
| developing a functional component of their network that works
| akin to an Arithmetic Logic Unit, operating not on tokens, but
| on tokens _transformed_ into a "numeric register"
| representation that is amenable to having math done to it with
| stable, quantized, position-independent results. (Just like the
| functional component that human brains develop after seeing
| enough math problems... probably.)
|
| Do you, as an ML dev, think it would ever be possible for any
| of the model architectures we're familiar with today, to be
| trained such that they would develop an analogous emergent
| functional component for handling tabular-data questions, by
| _transforming its internal working state into relational-DB/
| graph-DB data structures_ -- e.g. page-heaps of binary-packed
| row-tuples; B-tree indices; etc -- and then manipulating the
| working state in that form, using learned algorithms applicable
| to that type of data?
|
| It seems to me (possibly just because I don't know any better)
| that just as with numeracy, "being able to put the data into a
| different and better internal representation" is what would be
| needed for deep-learning models to become truly _good_ at
| dealing with tabular-data problems.
|
| But, unlike with numeracy, "thinking as if you were a
| relational database" is _not_ something a single human would
| ever intuit how to do without being taught. Relational algebra
| -- and the data-structures and algorithms to make it practical
| to have a Turing machine do said relational algebra -- wasn't
| even a single intuition, but a conscious effort, of _multiple_
| humans, working together over years. I strongly doubt that
| there's any number of "tabular-data problems" that you could
| show a human being, that would result in them developing an
| _intuitional ability_ to do what a relational database does
| with its memory to efficiently answer queries.
|
| (I suppose we could _give_ an ML model an RDBMS, and hardwire
| it to interact with it. I know there are hybrid ML + formal-
| logic systems. Are there hybrid ML + data-warehouse systems?
| Not where the model queries an external DB -- while that can be
| done, it'd be only in the same "stop and do this" way that
| ChatGPT runs Python code, which wouldn't make it a _thinking
| tool_ the way that the formal-logic proof engines are for
| hybrid ML systems. Rather, I mean that some data-warehouse
| execution engine could be embedded into the ML execution
| framework itself, deployed as part of the GPU shader-program to
| each tensor core, such that data-warehouse operations can be
| done as a native part of the network's per-node instruction-
| set. Anyone ever tried this?)
| dist-epoch wrote:
| > ask your questions in the form of SQL queries
|
| How do you know which questions to ask? This is what ML is
| good at, finding the right questions which classify the data.
| derefr wrote:
| You already made a faulty assumption -- that we're
| interested in "classifying the data" in the first place.
|
| Maybe we already know everything about the dataset. For
| example, if it's line-of-business customer data gradually
| built up by a sales team, then the _brains of the
| salespeople_ have likely already done all the "implicit
| classification" needed to generate good questions about the
| dataset.
|
| And this is, by far, the _usual_ scenario for Business
| Intelligence questions: someone with "business-domain
| knowledge", e.g. an executive, has formed an _intuitional
| hypothesis_ about the data based on their personal
| experience; and so they ask someone with "data-domain
| knowledge", e.g. a business analyst or data scientist, to
| test that hypothesis.
|
| It's actually rare, in my experience, to have a tabular-
| data dataset that someone is motivated to understand, that
| doesn't also "come with" a set of people who can already
| act as (good!) models trained on that dataset, to aid them
| in that understanding. (Sometimes these people can't _find_
| each-other -- but they do usually _exist_.)
|
| AFAIK, having reams of _entirely opaque and ill-understood_
| tabular data, such that you need classification/clustering
| to get started on asking questions, only really happens in
| the sciences: sensor-network climate data; longitudinal-
| study medical-outcome data; census data; housing-market
| data; etc. In other words, it's almost always _universities
| and governments_ -- not businesses -- that care about
| analyzing opaque tabular data.
|
| And that's a key to understanding the constraints in play
| for choosing models! Because business-driven analyses are
| usually time-constrained in some way (potentially even
| needing post-training question-answers to be generated in
| soft-realtime); while institutional analyses usually
| aren't. Big difference!
| letsdothisagain wrote:
| I'm really not clear on why you're arguing against this.
| A proper data warehouse tackles the known unknowns, i.e.
| supervised learning. But you can glean new insights using
| unsupervised learning, like the textbook example of
| Target knowing a woman is pregnant based on sales data.
|
| https://www.forbes.com/sites/kashmirhill/2012/02/16/how-
| targ...
| lemmsjid wrote:
| I might be misunderstanding your point, but there's use
| cases that have repeatedly come up for me in multiple
| businesses, below being some examples, without getting
| too specific:
|
| - identify latent features of customers via their
| behavioral data, to be used for profiling customers or
| recommending products to them
|
| - within a large amount of customer behavioral data,
| identify potentially fraudulent behavior
|
| - identify causes of seasonality (e.g. temporal patterns)
| in the data in order to improve forecasting (sales,
| traffic, whatever)
|
| In those cases part of the investigation is to initially
| take a hands-off (unsupervised) approach, so that we can
| compare our initial top-down hypotheses with actual
| patterns in the data.
|
| In all of those cases there's considerable (and
| sometimes adversarial) noise in the data.
| itsoktocry wrote:
| > _You already made a faulty assumption -- that we're
| interested in "classifying the data" in the first place._
|
| It's not clear what your point is. If you're not
| interested in the predictions that tree-based models
| provide, do not use tree-based models on your tabular
| data. A predictive model and a SQL query are not the same
| thing.
| jonathankoren wrote:
| What? No! That's not how it works. That's not how anything
| -- _including unsupervised techniques_ -- works!
| jonathankoren wrote:
| > It's funny; as a regular non-ML programmer, the optimum for
| every one of those factors for "tabular data" would seem to
| me, to be to "throw the tabular data into a relational data
| warehouse, and ask your questions in the form of SQL
| queries."
|
| It's doubly funny; as someone that comes from an ML
| background, and has developed and maintained multiple ML
| systems at multiple orgs, that I also think the answer very
| often is, "throw the tabular data into a relational data
| warehouse, and ask your questions in the form of SQL
| queries."
| dartos wrote:
| Most problems don't need complex solutions.
| closeparen wrote:
| >"throw the tabular data into a relational data warehouse,
| and ask your questions in the form of SQL queries."
|
| You can ask SQL descriptive questions. Can you ask it for
| predictions? How?
| asdff wrote:
| This is called extrapolation and can be done with simple
| linear regression in some cases
| itsoktocry wrote:
| > _simple linear regression in some cases_
|
| You're correct, but "in some cases" is doing a lot of
| work here.
|
| With the tooling where it's at, how much harder is it to
| apply XGBoost vs a linear model?
| taway_6PplYu5 wrote:
| https://www.red-gate.com/simple-talk/blogs/statistics-sql-
| si...
|
| One of several examples of implementing linear regression
| in SQL.
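|
| For a concrete flavour of "predictions from SQL", here is a sketch
| that fits a simple least-squares line with SQL aggregates and then
| extrapolates one step. It uses Python's built-in sqlite3 with an
| in-memory database; the table, column names and data are made up,
| and it is not taken from the linked article.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE sales (x REAL, y REAL)")
# Hypothetical data lying exactly on y = 2x + 1.
connection.executemany(
    "INSERT INTO sales (x, y) VALUES (?, ?)",
    [(float(x), 2.0 * x + 1.0) for x in range(1, 6)],
)

# Closed-form simple linear regression expressed with SQL aggregates:
#   slope     = (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2)
#   intercept = (Sy - slope*Sx) / n
query = """
WITH s AS (
    SELECT COUNT(*) AS n,
           SUM(x) AS sx, SUM(y) AS sy,
           SUM(x * y) AS sxy, SUM(x * x) AS sxx
    FROM sales
)
SELECT (n * sxy - sx * sy) / (n * sxx - sx * sx) AS slope,
       (sy - ((n * sxy - sx * sy) / (n * sxx - sx * sx)) * sx) / n AS intercept
FROM s
"""
slope, intercept = connection.execute(query).fetchone()
connection.close()

# Extrapolate to an x value that is not in the table.
prediction_at_6 = slope * 6.0 + intercept
print(slope, intercept, prediction_at_6)
```

| This covers the "some cases" where a straight line is a fair
| model; anything non-linear needs more than a handful of aggregates.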
| martindbp wrote:
| Deep learning really shines when the input is raw and at a very
| low abstraction level: pixels, byte pair encodings etc. Using
| deep learning for classification on tabular data is just
| needless complexity, as the variables are often at a very high
| abstraction level already. Also with tabular data there are
| generally not many spatial or temporal relationships between the
| variables, which are what CNNs and transformers excel at exploiting.
| jobigoud wrote:
| We have things that can describe and explain why an image they
| have never seen is funny. That's pretty high level.
| martindbp wrote:
| Yes, what I meant was deep learning is great at deriving
| those higher level abstractions from low level raw data.
| Words can be seen as something in between, bag of words can
| be fairly effective at simpler tasks, but LLMs embed words
| into higher and higher abstractions.
| andy99 wrote:
| Also images and text have tons of recurring patterns that can
| be exploited to train big models with lots of data. There is an
| internet worth of each modality that at least generally can all
| contribute helping a model build up a better overall
| understanding.
|
| There is no analog for tabular data, it's all different.
| sgt101 wrote:
| Isn't this just that trees are a natural compression of tables?
|
| smthin smthin inductive bias?
| hashta wrote:
| I have a lot of experience working with both families of models.
| If you use an ensemble of 10 NNs, they outperform well-optimized
| tree-based models such as XGBoost & RFs.
| padthai wrote:
| Which kind of ensemble? Because it cannot be as easy as a
| voting meta-model of NNs with the same architecture/hyperparameters,
| right?
___________________________________________________________________
(page generated 2024-03-05 23:00 UTC)