[HN Gopher] Why do random forests work? They are self-regularizi...
___________________________________________________________________
Why do random forests work? They are self-regularizing adaptive
smoothers
Author : sebg
Score : 113 points
Date : 2024-10-17 21:24 UTC (2 days ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| PoignardAzur wrote:
| Reading the abstract alone, I have no idea whether it's talking
| about algorithmic trees or, like, the big brown things with small
| green bits.
| vatys wrote:
| I had the exact same reaction: biology or computers?
|
| The only hint I can see anywhere on the page is "Statistics >
| Machine Learning" above the abstract title.
|
| I really want it to be about actual biological trees being
| studied on the scale of forests growing with smooth edges over
| long periods of time, but I suspect that's not what it is
| about.
| bonoboTP wrote:
| There's also "Subjects: Machine Learning (stat.ML); Machine
| Learning (cs.LG)"
|
| Also, the very first sentence of the actual paper (after the
| abstract) is
|
| > Random forests (Breiman, 2001) have emerged as one of the
| most reliable off-the-shelf supervised learning algorithms
| [...]
|
| arxiv.org is overwhelmingly used for math and computer
| science papers, though not exclusively.
|
| The paper will also likely be submitted to a machine learning
| venue.
| ibgeek wrote:
| Biological trees don't make predictions. Second or third
| sentence contains the phrase "randomized tree ensembles not
| only make predictions."
| mhuffman wrote:
| It's talking about these[0]
|
| [0]https://en.wikipedia.org/wiki/Random_forest
| defjm wrote:
| Jeremy Howard has a great video explaining Random Forests:
| https://youtu.be/blyXCk4sgEg .
| krystofee wrote:
| You have to know some machine learning fundamentals to figure
| that out - "Random Forest" is a specific machine learning
| algorithm, which does not need further explanation. To take it a
| step further, they should really not have to explain "machine
| learning" either - no, it's not like the machine takes a book and
| learns; it's a term.
| bigmadshoe wrote:
| The tree is an incredibly common data structure in computer
| science. Decision trees are well known. Random forests are
| ubiquitous in Machine Learning. Should the authors really have
| to dumb their paper down so people who don't work in this
| domain avoid confusing it with work in arborism?
| avazhi wrote:
| Pretty sure the guy you're replying to was half-joking, but
| adding the words 'machine learning' in the first sentence
| would have cleared this up pretty simply and wouldn't have
| resulted in dumbing down anything.
| youoy wrote:
| Nice article, although I find it a bit overly complex if you are
| not familiar with both ML and mathematics.
|
| I will leave here a geometrical intuition of why they work in
| case it can help someone:
|
| To simplify things, I will talk about regression, and say we want
| to predict some value y, that we know depends on x, and we have
| some noisy measurements of y.
|
| y = f(x)
|
| y_observed = f(x) + noise
|
| If you want some mental image, think about f(x)=sin(x).
|
| Now, an (overfitted) regression tree in this case is just a step
| function whose value at x is y_observed. If there is no noise, we
| know that by taking more measurements we can approximate y with
| as much precision as we want. But if there is noise, the
| regression tree will overfit the noisy values, creating some
| artificial spikes.
|
| To avoid this overfitting, you resample the values of X many
| times, build a regression tree for each sample, and then average
| the trees. Each tree will contain its own particular artificial
| spikes, and since those spikes come from noise, they won't appear
| in the majority of the other trees. So when you average, the
| spikes attenuate, producing the smoother behaviour that the
| article talks about.
|
| I hope it helps!
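|
| A minimal sketch of that averaging effect (my own illustration,
| assuming sklearn's DecisionTreeRegressor for the overfitted step
| functions; not from the article):
|
|     import numpy as np
|     from sklearn.tree import DecisionTreeRegressor
|
|     rng = np.random.default_rng(0)
|     x = np.sort(rng.uniform(0, 2 * np.pi, 200))
|     y_observed = np.sin(x) + rng.normal(0, 0.3, size=x.shape)
|     x_grid = np.linspace(0, 2 * np.pi, 500).reshape(-1, 1)
|
|     preds = []
|     for _ in range(100):
|         idx = rng.integers(0, len(x), len(x))   # bootstrap resample
|         tree = DecisionTreeRegressor()          # unpruned, fits the noise
|         tree.fit(x[idx].reshape(-1, 1), y_observed[idx])
|         preds.append(tree.predict(x_grid))
|
|     truth = np.sin(x_grid.ravel())
|     print(np.abs(preds[0] - truth).mean())               # one spiky tree
|     print(np.abs(np.mean(preds, axis=0) - truth).mean()) # bagged average
|
| The second number is typically much smaller, and plotting the two
| curves makes the spike attenuation visible.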
| hammock wrote:
| Thanks that helps. The way I think about your example is it's
| like (not the same obv) taking a bunch of moving averages of
| different durations at different starting points, and throwing
| those into your regression model along with the actual data
| levocardia wrote:
| Good to see more research exploring the connection between trees,
| ensembles, and smoothing. Way back in Trevor Hastie's ESL book
| there's a section on how gradient boosting using "stumps" (trees
| with only one split) is equivalent to an additive spline model
| (GAM, technically) with a step function as a spline basis and
| adaptive knot placement.
|
| I've always thought there should be a deep connection between
| ReLU neural nets and regularized adaptive smoothers as well,
| since the ReLU function is itself a spline basis (a so-called
| truncated linear spline) and happens to span the same functional
| basis as B-splines of the same degree.
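|
| To make that last point concrete, here is a toy sketch of mine (not
| from ESL): a linear fit over ReLU(x - k) features at fixed knots is
| exactly a piecewise-linear (degree-1) spline with kinks at those
| knots.
|
|     import numpy as np
|
|     rng = np.random.default_rng(0)
|     x = np.sort(rng.uniform(0, 1, 300))
|     y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, size=x.shape)
|
|     knots = np.linspace(0, 1, 11)[1:-1]          # 9 interior knots
|     # Design matrix: [1, x, ReLU(x - k_1), ..., ReLU(x - k_9)]
|     basis = np.column_stack(
|         [np.ones_like(x), x] + [np.maximum(x - k, 0.0) for k in knots])
|     coef, *_ = np.linalg.lstsq(basis, y, rcond=None)
|     fitted = basis @ coef   # piecewise linear, kinks only at the knots
|
| A one-hidden-layer ReLU net fits the same kind of function, except
| that it also learns where the kinks go, which is the adaptive part.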
| tylerneylon wrote:
| Here's some context and a partial summary (youoy also has a nice
| summary) --
|
| Context:
|
| A random forest is an ML model that can be trained to predict an
| output value based on a list of input features: eg, predicting a
| house's value based on square footage, location, etc. This paper
| focuses on regression models, meaning the output value is a real
| number (or a vector thereof). Classical ML theory suggests that
| models with many learned parameters are more likely to overfit
| the training data, meaning that when you predict an output for a
| test (non-training) input, the predicted value is less likely to
| be correct because the model is not generalizing well (it does
| well on training data, but not on test data - aka, it has
| memorized, but not understood).
|
| Historically, a surprise is that random forests can have many
| parameters yet don't overfit. This paper explores the surprise.
|
| What the paper says:
|
| The perspective of the paper is to see random forests (and
| related models) as _smoothers_, which is a kind of model that
| essentially memorizes the training data and then makes
| predictions by combining training output values that are relevant
| to the prediction-time (new) input values. For example, k-nearest
| neighbors is a simple kind of smoother. A single decision tree
| counts as a smoother because each final/leaf node in the tree
| predicts a value based on combining training outputs that could
| possibly reach that node. The same can be said for forests.
|
| So the authors see a random forest as a way to use a subset of
| training data and a subset of (or set of weights on) training
| features, to provide an averaged output. While a single decision
| tree can overfit (become "spikey") because some leaf nodes can be
| based on single training examples, a forest gives a smoother
| prediction function since it is averaging across many trees, and
| often other trees won't be spikey for the same input (their leaf
| node may be based on many training points, not a single one).
|
| Finally, the authors refer to random forests as _adaptive
| smoothers_ to point out that random forests become even better at
| smoothing in locations in the input space that either have high
| variation (intuitively, that have a higher slope), or that are
| far from the training data. The word "adaptive" indicates that
| the predicted function changes behavior based on the nature of
| the data -- eg, with k-NN, an adaptive version might increase the
| value of k at some places in the input space.
|
| The way random forests act adaptively is that (a) the prediction
| function is naturally more dense (can change value more quickly)
| in _areas of high variability_ because those locations will have
| more leaf nodes, and (b) the prediction function is typically a
| combination of a wider variety of possible values _when the input
| is far from the training data_ because in that case the trees are
| likely to provide a variety of output values. These are both ways
| to avoid overfitting to training data and to generalize better to
| new inputs.
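|
| One way to see the "smoother" view directly (my own sketch, using
| sklearn's RandomForestRegressor with bootstrap=False so the
| reconstruction below is exact): the forest's prediction for a new
| point is a weighted average of the training targets, where the
| weight on each training point reflects how often it shares a leaf
| with the new point.
|
|     import numpy as np
|     from sklearn.datasets import make_regression
|     from sklearn.ensemble import RandomForestRegressor
|
|     X, y = make_regression(n_samples=200, n_features=5, noise=10.0,
|                            random_state=0)
|     X_new = X[:5]
|
|     forest = RandomForestRegressor(n_estimators=100, max_features=0.5,
|                                    bootstrap=False, random_state=0)
|     forest.fit(X, y)
|
|     train_leaves = forest.apply(X)       # (n_train, n_trees) leaf ids
|     new_leaves = forest.apply(X_new)     # (n_new, n_trees) leaf ids
|
|     weights = np.zeros((len(X_new), len(X)))
|     for t in range(forest.n_estimators):
|         same = new_leaves[:, t][:, None] == train_leaves[:, t][None, :]
|         weights += same / same.sum(axis=1, keepdims=True)
|     weights /= forest.n_estimators
|
|     # Each row of weights sums to 1, and the weighted average of the
|     # training targets reproduces the forest's own predictions.
|     print(np.allclose(weights @ y, forest.predict(X_new)))   # True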
|
| Disclaimer: I did not carefully read the paper; this is my quick
| understanding.
| abetusk wrote:
| I think this is specifically coming to terms with an insight
| that's taught to statisticians about a bias-variance tradeoff.
|
| From my understanding, in a statistical setting, pushing bias
| down tends to push variance up, and pushing variance down tends
| to push bias up. The example I saw was with K-means, where K = N
| leads
| to high variance (the predicted cluster is highly variable) but
| low bias (take an input point, you get that exact input point
| back), vs. K=1 low variance (there's only one cluster) but bad
| bias (input point is far away from the cluster
| center/representative point).
|
| I'm not sure I've characterized it well but there's a Twitter
| post from Alicia Curth that explains it [0] as well as a paper
| that goes into it [1].
|
| [0] https://x.com/AliciaCurth/status/1841817856142348529
|
| [1] https://arxiv.org/abs/2409.18842
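|
| A tiny sketch of the same tradeoff (my own, swapping in k-NN
| regression rather than K-means, since the two extremes are easy
| to check there): k=1 reproduces every training point exactly (low
| bias, high variance), while k=n predicts a single global mean
| (low variance, high bias).
|
|     import numpy as np
|     from sklearn.neighbors import KNeighborsRegressor
|
|     rng = np.random.default_rng(0)
|     x = rng.uniform(0, 2 * np.pi, 100).reshape(-1, 1)
|     y = np.sin(x).ravel() + rng.normal(0, 0.3, 100)
|
|     knn_1 = KNeighborsRegressor(n_neighbors=1).fit(x, y)
|     knn_n = KNeighborsRegressor(n_neighbors=len(x)).fit(x, y)
|
|     print(np.allclose(knn_1.predict(x), y))          # memorizes the noise
|     print(np.allclose(knn_n.predict(x), y.mean()))   # one global average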
| bonoboTP wrote:
| The same team wrote another interesting paper arguing that there
| is no "double descent" in linear regression, trees, or boosting,
| despite earlier claims to the contrary (that paper doesn't tackle
| double descent in deep learning, but remarks that the situation
| may be similar, with different notions of complexity being
| conflated).
| StableAlkyne wrote:
| Random forests are incredibly cool algorithms.
|
| The idea that you can take hundreds of bad models that overfit
| (the individual decision trees), add even more randomness by
| randomly picking training data and features*, and averaging them
| together - it's frankly amazing that this leads to consistently
| OK models. Often not the best but rarely the worst. There's a
| reason they're such a common baseline to compare against.
|
*Unless you're using sklearn, whose RandomForestRegressor is not a
| random forest out of the box: with the default max_features it
| considers every feature at each split, so you effectively get
| bagged trees. Why they kept the misleading class name is beyond
| me.
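|
| For anyone bitten by that: you can get per-split feature
| subsampling by setting max_features explicitly (a sketch of mine,
| assuming current sklearn defaults):
|
|     from sklearn.datasets import make_regression
|     from sklearn.ensemble import RandomForestRegressor
|
|     X, y = make_regression(n_samples=500, n_features=20, noise=5.0,
|                            random_state=0)
|
|     # Default max_features considers every feature at each split,
|     # i.e. plain bagging of fully grown trees.
|     bagged = RandomForestRegressor(n_estimators=200, random_state=0)
|
|     # Breiman-style random forest: a random subset of features is
|     # considered at every split.
|     forest = RandomForestRegressor(n_estimators=200, max_features="sqrt",
|                                    random_state=0)
|
|     bagged.fit(X, y)
|     forest.fit(X, y)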
| alexfromapex wrote:
| Quantized random sampling regression
___________________________________________________________________
(page generated 2024-10-19 23:00 UTC)