[HN Gopher] Trinary Decision Trees for missing value handling
___________________________________________________________________
Trinary Decision Trees for missing value handling
Author : PaulHoule
Score : 45 points
Date : 2023-09-12 12:49 UTC (10 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| aatd86 wrote:
| Either encode with nil or, if space is not a problem, use an
| additional field that stores a bitmap. That lets you use
| structures with pointers without making nil semantics fuzzy.
|
| Can even represent that bitmap as a vector and create a presence
| operator that is essentially a kind of intersection operation.
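|
| A minimal Python sketch of the presence-bitmap idea above. The
| names Column, both_present, and rows are hypothetical and not
| from any library; the bitmap is stored as a plain int, which is
| only one possible layout.

```python
from dataclasses import dataclass


@dataclass
class Column:
    """Values plus a presence bitmap; bit i set means values[i] is present."""
    values: list
    present: int  # bitmap stored as a Python int (assumed layout)

    @classmethod
    def from_optional(cls, items, fill=0.0):
        # Missing entries get a filler value; only the bitmap says they are missing.
        present = 0
        values = []
        for i, item in enumerate(items):
            if item is not None:
                present |= 1 << i
                values.append(item)
            else:
                values.append(fill)
        return cls(values, present)


def both_present(a, b):
    """Presence operator: rows where both columns hold a value (bitwise AND)."""
    return a.present & b.present


def rows(bitmap):
    """Indices of the set bits, lowest first."""
    out = []
    i = 0
    while bitmap:
        if bitmap & 1:
            out.append(i)
        bitmap >>= 1
        i += 1
    return out


age = Column.from_optional([31.0, None, 45.0, 27.0])
income = Column.from_optional([None, 52.0, 61.0, 38.0])
complete = rows(both_present(age, income))  # rows usable for both features
```

| Here the intersection keeps rows 2 and 3, the only ones where
| both columns have a value, and the stored values never need a
| sentinel to signal "missing".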
| TensorTinkerer wrote:
| Interesting concept to handle missing data using Trinary decision
| trees. At a high-level, it seems reminiscent of Multiple
| Imputation in randomForests which could address missingness.
| Though the Trinary tree takes a different approach by not
| presuming the missing values harbor any significant information
| about the response. It's intriguing that it shines in MCAR
| settings, but falls short with Informative Missingness.
|
| > "Notably, the Trinary tree outperforms its peers in MCAR
| settings, especially when data is only missing out-of-sample,
| while lacking behind in IM settings."
|
| This somewhat mirrors the behavior of early imputation
| strategies. One must ponder, however, how the Trinary tree would
| perform vis-a-vis older methods like CART's surrogate splits or
| C4.5's probabilistic splits for handling missing values. These
| older methods were crafted with an intuition somewhat similar to
| the Trinary tree.
|
| It's also great to see the amalgamation of Trinary tree with the
| Missing In Attributes approach into the TrinaryMIA tree. But the
| efficacy of this hybrid model isn't completely surprising. MIA
| has historically shown resilience in diverse missing data
| scenarios, and combining that with the Trinary's approach could
| harmonize their strengths.
|
| What would be really enticing is to see if the essence of the
| Trinary decision tree can be injected into boosting models like
| XGBoost or LightGBM. Since these models are notorious for their
| treatment of missing values, maybe there's some potential
| symbiosis there?
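|
| For reference, a rough Python sketch of the core MIA idea for a
| single numeric feature: at each candidate threshold, try sending
| the missing rows left and then right, and keep whichever gives
| the lower loss. This is an illustration only, not the paper's
| code; best_mia_split and sse are made-up names, squared error
| is an assumed loss, and the extra "missing vs. observed" split
| that full MIA also considers is left out.

```python
import math


def sse(ys):
    """Sum of squared errors around the mean; 0.0 for an empty list."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def best_mia_split(xs, ys):
    """Return (loss, threshold, missing_side); None in xs marks a missing value."""
    observed = [(x, y) for x, y in zip(xs, ys) if x is not None]
    missing_ys = [y for x, y in zip(xs, ys) if x is None]
    best = (math.inf, None, None)
    # Skip the largest observed value: nothing would be sent right.
    thresholds = sorted({x for x, _ in observed})[:-1]
    for t in thresholds:
        left = [y for x, y in observed if x <= t]
        right = [y for x, y in observed if x > t]
        for side in ("left", "right"):
            # Missing rows join one child; both options are scored.
            l = left + missing_ys if side == "left" else left
            r = right + missing_ys if side == "right" else right
            loss = sse(l) + sse(r)
            if loss < best[0]:
                best = (loss, t, side)
    return best


xs = [1.0, 2.0, None, 8.0, 9.0, None]
ys = [1.0, 1.0, 5.0, 5.0, 5.0, 5.0]
loss, threshold, missing_side = best_mia_split(xs, ys)
```

| In this toy data the missing rows look like the high-x rows, so
| the search sends them right. That is exactly the case where the
| missingness carries information about the response.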
| micro_cam wrote:
| I implemented something like this in a [pre xgboost boosting
| framework](https://github.com/ryanbressler/CloudForest) ~10
| years ago and it worked well.
|
| It isn't even that much of a speed hit using the classical
| sorting CART implementation. However, xgboost and lightgbm use
| histogram based approximate sorting which might be harder to
| adapt in a performant way. And certainly the code will be a lot
| messier.
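|
| One way to see why the sorted scan stays cheap, as a Python
| sketch. This is an assumed reading of three-way splitting, not
| CloudForest's or the paper's code: rows missing the feature go
| to a third branch, so their loss term does not depend on the
| threshold and is computed once. best_three_way_split and
| sse_from_sums are hypothetical names; squared error is assumed.

```python
def sse_from_sums(n, s, ss):
    """Squared error around the mean from count, sum, and sum of squares."""
    return ss - s * s / n if n else 0.0


def best_three_way_split(xs, ys):
    """Return (loss, threshold); rows with x None form a third branch."""
    observed = sorted((x, y) for x, y in zip(xs, ys) if x is not None)
    missing = [y for x, y in zip(xs, ys) if x is None]
    # Constant across thresholds, so it is computed a single time.
    missing_loss = sse_from_sums(
        len(missing), sum(missing), sum(y * y for y in missing)
    )
    n = len(observed)
    total_s = sum(y for _, y in observed)
    total_ss = sum(y * y for _, y in observed)
    best = (float("inf"), None)
    left_s = left_ss = 0.0
    for i in range(n - 1):
        x, y = observed[i]
        left_s += y
        left_ss += y * y
        if x == observed[i + 1][0]:
            continue  # cannot split between equal feature values
        loss = (
            sse_from_sums(i + 1, left_s, left_ss)
            + sse_from_sums(n - i - 1, total_s - left_s, total_ss - left_ss)
            + missing_loss
        )
        if loss < best[0]:
            best = (loss, x)
    return best


xs = [3.0, None, 1.0, 2.0, None, 4.0]
ys = [10.0, 7.0, 2.0, 2.0, 7.0, 10.0]
loss, threshold = best_three_way_split(xs, ys)
```

| The scan is still one sort plus one pass with running sums. A
| histogram-based learner would need the same bookkeeping per
| bin, which is where the extra mess would come from.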
| Macuyiko wrote:
| Came here to cite your work, I even mention "CloudForest" in
| my slides still as "an interesting implementation that is
| also capable of handling NANs in DTs in a slightly different
| way." Crazy this has already been 10 years.
| bitwize wrote:
| True, False, and File Not Found:
| https://thedailywtf.com/articles/What_Is_Truth_0x3f_
|
| Guess that developer was right all along...
| anthony_romeo wrote:
| Minor and pedantic: should this instead be "Ternary Decision
| Tree"?
| drBonkers wrote:
| Yes, but trinary honestly makes more sense in contemporary
| English. I'm not sure whether preserving Greco-Latin roots or
| selecting for usability is more important here.
| thomasmg wrote:
| Well, it seems "ternary" is more common than "trinary",
| according to
| https://www.reddit.com/r/Iota/comments/cepkv7/difference_bet...
|
| Update: there's also the "Ternary tree", the "Ternary search
| tree", the "Ternary heap", and the "Ternary numeral system".
| Most even have Wikipedia articles. There is no Wikipedia
| article for anything "Trinary" related to computer science.
| ynniv wrote:
| Seems related to "Attention Is Off By One"
| https://news.ycombinator.com/item?id=36851494
| JustFinishedBSG wrote:
| I am 99% certain this is not novel and I've seen it in old (in
| the CART times) papers....
| PaulHoule wrote:
| In the "symbolic AI" area it is so common for results to just
| be forgotten for 20-30 years and then come back.
|
| For instance Datalog was developed in 1986 but when I got
| interested in it in 2006 I found it was really obscure; by 2016
| there was some literature and a few good implementations and it
| is rolling on.
|
| There were those old "rules engines": many of the classic
| expert systems were developed _before_ the famous RETE
| algorithm and even then there were RETE implementations that
| were terribly slow because they didn't use hashtables! When
| that stuff was fashionable people thought 10,000 rules were a
| lot, today it is more like 10,000,000 rules are a lot and I'd
| credit that more to algorithms being better than the computers
| being more powerful.
|
| Similarly since the mid 2000's there has been a revolution in
| SAT and SMT solving which goes back to the very early (1960s)
| work on trying to solve problems w/ logic; it is just that now
| we succeed instead of fail. (It's a lot like the neural network
| revolution and used the same method of competition as seen in
| TREC and visual recognition.)
| riku_iki wrote:
| there is probably a difference between a paper that claims it
| resurrected some old concept and implemented it with better
| results and one that says "this paper introduces X", as in the
| case of this post.
| micro_cam wrote:
| I've got a ~10 year old implementation that does something
| similar calling it "three way splitting" here:
| https://github.com/ryanbressler/CloudForest
|
| And I got the idea from a lab mate's project, Timo Erkkila's
| RF-ACE, though neither of us thought it was a particularly
| novel idea.
| mathisfun123 wrote:
| Pardon me for being a currently pissed off PhD student dealing
| with a rejected paper but: who gives a fuck? Like honestly what
| is the harm of someone reinventing something, especially when
| the previous instance is long dead and buried/forgotten (like
| in this case). Oh it's a waste of resources you say? As if the
| constant demand for novelty isn't also.
|
| My paper got rejected because a reviewer dug up some vaguely
| similar thing from 3 years ago and claimed it is "well used and
| well supported" _even though the repo has had 3 commits in 3
| years_. Oh that's an extreme case you say? The presumptive
| necessity of "novelty" is what enables such extremes.
|
| Like the schizophrenia of modern society is hilarious: 15
| brands of potato chips is good (competition in the market is
| healthy) but one molecule in common (in science) and you're a
| failure.
| riku_iki wrote:
| > reviewer dug up
|
| is there any way to appeal such a decision? it is not healthy
| when a single person can block lots of hard work..
| mathisfun123 wrote:
| nope - welcome to peer review where everything's made up
| and the points don't matter.
|
| note: some venues have a rebuttal phase which 99% doesn't
| make a difference anyway but this one didn't.
|
| note2: if you're thinking i got rejected for some _other_
| reason: i got (weak accept, accept, reject) and the lone
| reject came from the wackadoodle. btw how does (weak
| accept+accept+reject) == reject? see my earlier comment
| about how the scores don't matter - i emailed the PC and
| they said they use the scores to resolve a failure to come
| to a consensus by the reviewers (which is 180deg the
| opposite of how a scoring system is used - scoring systems
| "objectively" measure and then humans break ties).
|
| note3: this isn't a podunk conference - it's a big name,
| prestigious conference.
| [deleted]
| robotresearcher wrote:
| Let's assume that the reviewer is correct and the key
| idea of the paper is not new.
|
| (edit: parent subsequently reports they mention the
| precedent work in their paper. This is a different
| scenario. I leave the rest of my comment which was about
| the scenario as originally described)
|
| What's the alternative? You want to present an idea that
| isn't novel?
|
| > how does (weak accept+accept+reject) == reject
|
| The editor or program chair that makes the decision knows
| from Reviewer 3 that the paper is not novel. If the
| paper's value is based on novelty, it has to be rejected.
|
| If the paper has some value other than novelty, edit it
| to include the key citation and very carefully explain
| the contribution of the paper, and submit somewhere else.
| If there is no such contribution, tough. Do a better
| literature survey next time.
|
| > it's a big name, prestigious conference
|
| Imagine how you would feel if you were the author of the
| obscure paper and someone who didn't bother to read it
| got your idea published in a prestigious conference.
|
| > The presumptive necessity of "novelty" is what enables
| such extremes.
|
| Did your paper claim novelty? That claim obviously would
| need to be removed. Without novelty, what is the
| contribution? Your paper needs to explain.
|
| If the world needs to know about the previous paper, so
| that the idea is recognized, write a review or case study
| that highlights it.
| mathisfun123 wrote:
| people will bend over backwards to accommodate/justify the
| stupid/biased/rotten practices in peer review.
|
| > Let's assume that the reviewer is correct and the key
| idea of the paper is not new.
|
| umm why are we assuming this? i mentioned the other
| project in my paper and pointed out the distinctions.
| even so:
|
| > You want to present an idea that isn't novel?
|
| yes. explain to me in very specific detail what the issue
| is with that in 2023 where page limits are an
| anachronism? don't have room at the conference venue
| itself - i'm fine recording the presentation and putting
| it up on youtube.
|
| > The editor or program chair that makes the decision
| knows from Reviewer 3 that the paper is not novel.
|
| reviewer 3 is demonstrably wrong/lying - my project is
| distinct _and_ gets more use/support (it has been merged
| into the upstream core ecosystem project).
|
| > If the paper has some value other than novelty, edit it
| to include the key citation and very carefully explain
| the contribution of the paper
|
| done and done but there's a crucial component you're
| eliding over - reviewer has to actually read the paper.
|
| > Do a better literature survey next time.
|
| lol
|
| > Imagine how you would feel if you were the author of
| the obscure paper and someone who didn't bother to read
| it got your idea published in a prestigious conference.
|
| there's no paper. it's a github repo.
|
| you see the absurdity grows and grows until one can't
| even believe that it's possible such a horrible
| miscarriage of academic justice/honesty was committed.
| but i have all the receipts as the kids say and the real
| miscarriage is it's worthless to complain because the
| entire practice is complete and utter bs.
| [deleted]
| robotresearcher wrote:
| > i mentioned the other project in my paper and pointed
| out the distinctions.
|
| You said the reviewer dug it up. You didn't say you cited
| it. That's a very different situation!
|
| I've edited my comment to reflect the new information.
| mathisfun123 wrote:
| i also didn't tell you what my favorite color is nor my
| wife's maiden name - maybe you shouldn't be such a
| presumptuous condescending <word>.
|
| again if you don't believe me, it doesn't matter. the
| paper exists and the reviews exist and the email exchange
| with the PC exists and i have shown all of these things
| to my own committee (and if i knew who you were and
| trusted you, i would show them to you as well).
|
| > I've edited my comment to reflect the new information.
|
| your edits further cement your position as presumptuous
| and obliging of a process that warrants/merits no such
| thing.
| robotresearcher wrote:
| I've believed everything you said, when you said it.
|
| The difference between
|
| "My paper got rejected because a reviewer dug up some
| vaguely similar thing from 3 years ago"
|
| and
|
| "I mentioned the other project in my paper and pointed
| out the distinctions."
|
| seems to me to be quite important. After hearing the
| first by itself, the second came as a surprise.
| mathisfun123 wrote:
| They dug it up as a _cause_ for rejection, like one digs
| up skeletons in someone's closet to discredit them.
|
| I'm pretty exhausted from dealing with this _and_ now
| explaining it to you but I have an update: I've just now
| gotten an email from the PC (after emailing them
| yesterday pointing out the flaws in reviewer 3's
| reasoning) saying I'm welcome to come present (and
| highlight the differences) in a "lightning talk" that
| won't be published in the proceedings. Looks like they're
| going with the "do it for the exposure" influencer model.
| I've declined the "opportunity".
| robotresearcher wrote:
| I marked the edit as such. Not a stealth edit.
___________________________________________________________________
(page generated 2023-09-12 23:02 UTC)