[HN Gopher] Trinary Decision Trees for missing value handling
       ___________________________________________________________________
        
       Trinary Decision Trees for missing value handling
        
       Author : PaulHoule
       Score  : 45 points
       Date   : 2023-09-12 12:49 UTC (10 hours ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | aatd86 wrote:
       | Either encode with nil or, if space is not a problem, use an
       | additional field that stores a bitmap. That allows using
       | structures with pointers without making nil semantics fuzzy.
       | 
       | Can even represent that bitmap as a vector and create a presence
       | operator that is essentially a kind of intersection operation.
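       | 
       | A minimal sketch of the bitmap idea, assuming a Python int is
       | used as the bitmap. The names Column, make_column, both_present
       | and set_bits are hypothetical, made up for illustration:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Column:
    values: List[float]  # payload; missing slots hold a filler value
    present: int         # bitmap: bit i is set when values[i] is present


def make_column(raw: List[Optional[float]]) -> Column:
    """Build a column plus its presence bitmap from raw values."""
    values, present = [], 0
    for i, v in enumerate(raw):
        if v is None:
            values.append(0.0)  # filler, never read as real data
        else:
            values.append(v)
            present |= 1 << i
    return Column(values, present)


def both_present(a: Column, b: Column) -> int:
    """'Presence operator': rows present in both columns (bitwise AND)."""
    return a.present & b.present


def set_bits(mask: int) -> List[int]:
    """Row indices whose bit is set in the mask."""
    out, i = [], 0
    while mask:
        if mask & 1:
            out.append(i)
        mask >>= 1
        i += 1
    return out
```

       | With a = make_column([1.0, None, 3.0]) and
       | b = make_column([None, 2.0, 4.0]), only row 2 is present in
       | both, so the intersection has a single bit set.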
        
       | TensorTinkerer wrote:
       | Interesting concept to handle missing data using Trinary decision
       | trees. At a high-level, it seems reminiscent of Multiple
       | Imputation in randomForests which could address missingness.
       | Though the Trinary tree takes a different approach by not
       | presuming the missing values harbor any significant information
       | about the response. It's intriguing that it shines in MCAR
       | settings, but falls short with Informative Missingness.
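       | 
       | A rough sketch of what routing through a three-way node could
       | look like at prediction time. This only illustrates the general
       | idea of a third branch for missing values. How the paper fits
       | that branch is not covered here, and TriLeaf, TriNode and
       | tri_predict are hypothetical names:

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class TriLeaf:
    value: float


@dataclass
class TriNode:
    feature: int       # index of the feature tested at this node
    threshold: float   # observed values <= threshold go left
    left: "TriTree"
    right: "TriTree"
    missing: "TriTree"  # third branch, taken when the value is missing


TriTree = Union[TriLeaf, TriNode]


def is_missing(x: Optional[float]) -> bool:
    return x is None or (isinstance(x, float) and math.isnan(x))


def tri_predict(tree: TriTree, row: List[Optional[float]]) -> float:
    """Walk the tree, taking the third branch on missing values."""
    while isinstance(tree, TriNode):
        x = row[tree.feature]
        if is_missing(x):
            tree = tree.missing
        elif x <= tree.threshold:
            tree = tree.left
        else:
            tree = tree.right
    return tree.value
```

       | For a node TriNode(0, 5.0, TriLeaf(1.0), TriLeaf(3.0),
       | TriLeaf(2.0)), a row with a missing first feature falls into
       | the third leaf instead of being forced left or right. The leaf
       | values are arbitrary illustration values.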
       | 
       | > "Notably, the Trinary tree outperforms its peers in MCAR
       | settings, especially when data is only missing out-of-sample,
       | while lacking behind in IM settings."
       | 
       | This somewhat mirrors the behavior of early imputation
       | strategies. One must ponder, however, how the Trinary tree would
       | perform vis-a-vis older methods like CART's surrogate splits or
       | C4.5's probabilistic splits for handling missing values. These
       | older methods were crafted with an intuition somewhat similar to
       | the Trinary tree.
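       | 
       | For comparison, a simplified sketch of the C4.5-style
       | probabilistic idea: when the tested value is missing, follow
       | both branches and weight them by the share of training rows
       | that went each way. C4.5 itself works with class
       | distributions; the numeric leaves here are a simplifying
       | assumption, and FracLeaf, FracSplit and frac_predict are
       | hypothetical names:

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class FracLeaf:
    value: float


@dataclass
class FracSplit:
    feature: int
    threshold: float
    left: "FracTree"
    right: "FracTree"
    p_left: float  # share of training rows (known value) that went left


FracTree = Union[FracLeaf, FracSplit]


def frac_predict(tree: FracTree, row: List[Optional[float]]) -> float:
    """Predict, blending both branches when the tested value is missing."""
    if isinstance(tree, FracLeaf):
        return tree.value
    x = row[tree.feature]
    if x is None:
        # Missing: weight each subtree by how often training data went there.
        return (tree.p_left * frac_predict(tree.left, row)
                + (1.0 - tree.p_left) * frac_predict(tree.right, row))
    branch = tree.left if x <= tree.threshold else tree.right
    return frac_predict(branch, row)
```

       | Unlike a dedicated third branch, nothing extra is stored per
       | node beyond the split proportion.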
       | 
       | It's also great to see the amalgamation of Trinary tree with the
       | Missing In Attributes approach into the TrinaryMIA tree. But the
       | efficacy of this hybrid model isn't completely surprising. MIA
       | has historically shown resilience in diverse missing data
       | scenarios, and combining that with the Trinary's approach could
       | harmonize their strengths.
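       | 
       | A small sketch of the core of Missing In Attributes for one
       | numeric feature, assuming squared error as the split
       | criterion: every threshold is tried twice, once with missing
       | rows sent left and once sent right. The full method also
       | considers splitting missing against non-missing, which is
       | omitted here. sse and best_mia_split are hypothetical names:

```python
from typing import List, Optional, Tuple


def sse(ys: List[float]) -> float:
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def best_mia_split(
    xs: List[Optional[float]], ys: List[float]
) -> Optional[Tuple[float, float, bool]]:
    """Return (score, threshold, missing_goes_left) with the lowest SSE."""
    best = None
    thresholds = sorted({x for x in xs if x is not None})
    for t in thresholds:
        for missing_left in (True, False):
            left, right = [], []
            for x, y in zip(xs, ys):
                if x is None:
                    goes_left = missing_left
                else:
                    goes_left = x <= t
                (left if goes_left else right).append(y)
            if not left or not right:
                continue  # not a real split
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, t, missing_left)
    return best
```

       | Because the direction of the missing rows is chosen by the
       | criterion, this variant can exploit missingness that carries
       | information about the response.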
       | 
       | What would be really enticing is to see if the essence of the
       | Trinary decision tree can be injected into boosting models like
       | XGBoost or LightGBM. Since these models are notorious for their
       | treatment of missing values, maybe there's some potential
       | symbiosis there?
        
         | micro_cam wrote:
         | I implemented something like this in a [pre xgboost boosting
         | framework](https://github.com/ryanbressler/CloudForest) ~10
         | years ago and it worked well.
         | 
         | It isn't even that much of a speed hit using the classical
         | sorting CART implementation. However, XGBoost and LightGBM use
         | histogram-based approximate sorting, which might be harder to
         | adapt in a performant way. And certainly the code will be a lot
         | messier.
        
           | Macuyiko wrote:
           | Came here to cite your work, I even mention "CloudForest" in
           | my slides still as "an interesting implementation that is
           | also capable of handling NANs in DTs in a slightly different
           | way." Crazy this has already been 10 years.
        
       | bitwize wrote:
       | True, False, and File Not Found:
       | https://thedailywtf.com/articles/What_Is_Truth_0x3f_
       | 
       | Guess that developer was right all along...
        
       | anthony_romeo wrote:
       | Minor and pedantic: should this instead be "Ternary Decision
       | Tree"?
        
         | drBonkers wrote:
         | Yes, but trinary honestly makes more sense in contemporary
         | English. I'm not sure whether preserving Greco-Latin roots or
         | selecting for usability is more important here.
        
           | thomasmg wrote:
           | Well, it seems "ternary" is more common than "trinary",
           | according to
           | https://www.reddit.com/r/Iota/comments/cepkv7/difference_bet...
           | 
           | Update: there's also the "Ternary tree", the "Ternary search
           | tree", the "Ternary heap", and the "Ternary numeral system".
           | Most even have Wikipedia articles. There is no Wikipedia
           | article for anything "Trinary" related to computer science.
        
       | ynniv wrote:
       | Seems related to "Attention Is Off By One"
       | https://news.ycombinator.com/item?id=36851494
        
       | JustFinishedBSG wrote:
       | I am 99% certain this is not novel and I've seen it in old (in
       | the CART times) papers....
        
         | PaulHoule wrote:
         | In the "symbolic AI" area it is so common for results to just
         | be forgotten for 20-30 years and then come back.
         | 
         | For instance Datalog was developed in 1986 but when I got
         | interested in it in 2006 I found it was really obscure; by 2016
         | there was some literature and a few good implementations and it
         | is rolling on.
         | 
         | There were those old "rules engines": many of the classic
         | expert systems were developed _before_ the famous RETE
         | algorithm and even then there were RETE implementations that
         | were terribly slow because they didn't use hashtables! When
         | that stuff was fashionable people thought 10,000 rules were a
         | lot, today it is more like 10,000,000 rules are a lot and I'd
         | credit that more to algorithms being better than the computers
         | being more powerful.
         | 
         | Similarly, since the mid-2000s there has been a revolution in
         | SAT and SMT solving which goes back to the very early (1960s)
         | work on trying to solve problems w/ logic; it is just that now
         | we succeed instead of fail. (It's a lot like the neural network
         | revolution and used the same method of competition as seen in
         | TREC and visual recognition.)
        
           | riku_iki wrote:
           | there is probably a difference between a paper claiming that
           | it resurrected some old concept and implemented it with
           | better results, and "this paper introduces X" as in the case
           | of this post.
        
         | micro_cam wrote:
         | I've got a ~10 year old implementation that does something
         | similar calling it "three way splitting" here:
         | https://github.com/ryanbressler/CloudForest
         | 
         | And i got the idea from a lab mate, Timo Erkkila's RF-ACE
         | project though neither of us thought it was a particularly
         | novel idea.
        
         | mathisfun123 wrote:
         | Pardon me for being a currently pissed off PhD student dealing
         | with a rejected paper but: who gives a fuck? Like honestly what
         | is the harm of someone reinventing something, especially when
         | the previous instance is long dead and buried/forgotten (like
         | in this case). Oh it's a waste of resources you say? As if the
         | constant demand for novelty isn't also.
         | 
         | My paper got rejected because a reviewer dug up some vaguely
         | similar thing from 3 years ago and claimed it is "well used and
         | well supported" _even though the repo has had 3 commits in 3
         | years_. Oh that's an extreme case you say? The presumptive
         | necessity of "novelty" is what enables such extremes.
         | 
         | Like the schizophrenia of modern society is hilarious: 15
         | brands of potato chips is good (competition in the market is
         | healthy) but one molecule in common (in science) and you're a
         | failure.
        
           | riku_iki wrote:
           | > reviewer dug up
           | 
           | is there any way to appeal such a decision? it is not healthy
           | when a single person can block lots of hard work..
        
             | mathisfun123 wrote:
             | nope - welcome to peer review where everything's made up
             | and the points don't matter.
             | 
             | note: some venues have a rebuttal phase, which 99% of the
             | time doesn't make a difference anyway, but this one didn't.
             | 
             | note2: if you're thinking i got rejected for some _other_
             | reason: i got (weak accept, accept, reject) and the lone
             | reject came from the wackadoodle. btw how does (weak
             | accept+accept+reject) == reject? see my earlier comment
             | about how the scores don't matter - i emailed the PC and
             | they said they use the scores to resolve a failure to come
             | to a consensus by the reviewers (which is 180deg the
             | opposite of how a scoring system is used - scoring systems
             | "objectively" measure and then humans break ties).
             | 
             | note3: this isn't a podunk conference - it's a big name,
             | prestigious conference.
        
               | [deleted]
        
               | robotresearcher wrote:
               | Let's assume that the reviewer is correct and the key
               | idea of the paper is not new.
               | 
               | (edit: parent subsequently reports they mention the
               | precedent work in their paper. This is a different
               | scenario. I leave the rest of my comment which was about
               | the scenario as originally described)
               | 
               | What's the alternative? You want to present an idea that
               | isn't novel?
               | 
               | > how does (weak accept+accept+reject) == reject
               | 
               | The editor or program chair that makes the decision knows
               | from Reviewer 3 that the paper is not novel. If the
               | paper's value is based on novelty, it has to be rejected.
               | 
               | If the paper has some value other than novelty, edit it
               | to include the key citation and very carefully explain
               | the contribution of the paper, and submit somewhere else.
               | If there is no such contribution, tough. Do a better
               | literature survey next time.
               | 
               | > it's a big name, prestigious conference
               | 
               | Imagine how you would feel if you were the author of the
               | obscure paper and someone who didn't bother to read it
               | got your idea published in a prestigious conference.
               | 
               | > The presumptive necessity of "novelty" is what enables
               | such extremes.
               | 
               | Did your paper claim novelty? That claim obviously would
               | need to be removed. Without novelty, what is the
               | contribution? Your paper needs to explain.
               | 
               | If the world needs to know about the previous paper, so
               | that the idea is recognized, write a review or case study
               | that highlights it.
        
               | mathisfun123 wrote:
               | people will bend over backwards to accommodate/justify the
               | stupid/biased/rotten practices in peer review.
               | 
               | > Let's assume that the reviewer is correct and the key
               | idea of the paper is not new.
               | 
               | umm why are we assuming this? i mentioned the other
               | project in my paper and pointed out the distinctions.
               | even so:
               | 
               | > You want to present an idea that isn't novel?
               | 
               | yes. explain to me in very specific detail what the issue
               | is with that in 2023 where page limits are an
               | anachronism? don't have room at the conference venue
               | itself - i'm fine recording the presentation and putting
               | it up on youtube.
               | 
               | > The editor or program chair that makes the decision
               | knows from Reviewer 3 that the paper is not novel.
               | 
               | reviewer 3 is demonstrably wrong/lying - my project is
               | distinct _and_ gets more use/support (it has been merged
               | into the upstream core ecosystem project).
               | 
               | > If the paper has some value other than novelty, edit it
               | to include the key citation and very carefully explain
               | the contribution of the paper
               | 
               | done and done but there's a crucial component you're
               | eliding over - reviewer has to actually read the paper.
               | 
               | > Do a better literature survey next time.
               | 
               | lol
               | 
               | > Imagine how you would feel if you were the author of
               | the obscure paper and someone who didn't bother to read
               | it got your idea published in a prestigious conference.
               | 
               | there's no paper. it's a github repo.
               | 
               | you see the absurdity grows and grows until one can't
               | even believe that it's possible such a horrible
               | miscarriage of academic justice/honesty was committed.
               | but i have all the receipts as the kids say and the real
               | miscarriage is it's worthless to complain because the
               | entire practice is complete and utter bs.
        
               | [deleted]
        
               | robotresearcher wrote:
               | > i mentioned the other project in my paper and pointed
               | out the distinctions.
               | 
               | You said the reviewer dug it up. You didn't say you cited
               | it. That's a very different situation!
               | 
               | I've edited my comment to reflect the new information.
        
               | mathisfun123 wrote:
               | i also didn't tell you what my favorite color is nor my
               | wife's maiden name - maybe you shouldn't be such a
               | presumptuous condescending <word>.
               | 
               | again if you don't believe me, it doesn't matter. the
               | paper exists and the reviews exist and the email exchange
               | with the PC exists and i have shown all of these things
               | to my own committee (and if i knew who you were and
               | trusted you, i would show them to you as well).
               | 
               | > I've edited my comment to reflect the new information.
               | 
               | your edits further cement your position as presumptuous
               | and obliging of a process that warrants/merits no such
               | thing.
        
               | robotresearcher wrote:
               | I've believed everything you said, when you said it.
               | 
               | The difference between
               | 
               | "My paper got rejected because a reviewer dug up some
               | vaguely similar thing from 3 years ago"
               | 
               | and
               | 
               | "I mentioned the other project in my paper and pointed
               | out the distinctions."
               | 
               | seems to me to be quite important. After hearing the
               | first by itself, the second came as a surprise.
        
               | mathisfun123 wrote:
               | They dug it up as a _cause_ for rejection, like one digs
               | up skeletons in someone's closet to discredit them.
               | 
               | I'm pretty exhausted from dealing with this _and_ now
               | explaining it to you but I have an update: I've just now
               | gotten an email from the PC (after emailing them
               | yesterday pointing out the flaws in reviewer 3's
               | reasoning) saying I'm welcome to come present (and
               | highlight the differences) in a "lightning talk" that
               | won't be published in the proceedings. Looks like they're
               | going with the "do it for the exposure" influencer model.
               | I've declined the "opportunity".
        
               | robotresearcher wrote:
               | I marked the edit as such. Not a stealth edit.
        
       ___________________________________________________________________
       (page generated 2023-09-12 23:02 UTC)