[HN Gopher] Computer Based Spellchecking Techniques
___________________________________________________________________
Computer Based Spellchecking Techniques
Author : pcr910303
Score : 14 points
  Date   : 2022-07-24 12:27 UTC (1 day ago)
(HTM) web link (web.archive.org)
(TXT) w3m dump (web.archive.org)
| homodeus wrote:
  | In 2022, "state of the art" means throwing a deep net at it. It
  | will likely pick up on all of these findings (and better ones,
  | incomprehensible to us) by itself, given the right architecture
  | and enough data, but I can't help but feel a bit saddened by
  | this - seeing the ingenuity and mastery of all these cited names
  | be obscured and superseded so easily, in a way.
|
  | I love advancement in the field and what machine learning will
  | enable us to do, but I don't know what to make of this. One
  | argument is that the ingenuity now lies with the engineers who
  | design the machine learning models, but it is still depressing
  | to me, for some reason. Never knew I would feel like this; am I
  | the only one?
|
  | P.S.: I'm commenting purely on this topic, which is an ideal big
  | data case - of course, we still have a long way to go with
  | machine learning, one where human minds will especially have to
  | shine.
| marcodiego wrote:
| > The hashing function described above is too simple to do the
| job properly - dcd, hdb and various other non-words would all
| hash to 223 and be accepted - but it's possible to devise more
| complicated hashing functions so that hardly any non-words will
| be accepted. You may use more than one hashing function; you
| could derive, say, six numbers from the same word and check them
| all in the bit map (or in six separate bit maps), accepting the
| word only if all six bits were set.
|
| Just described how a Bloom filter works.
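  |
  | A minimal sketch of that idea in Python - the bit-map size, the
  | choice of six hashes, and the salted-MD5 construction below are
  | illustrative assumptions, not details from the article:
  |
  |     import hashlib
  |
  |     NUM_BITS = 1 << 20   # size of the bit map (illustrative)
  |     NUM_HASHES = 6       # six derived numbers, as in the quote
  |     bits = bytearray(NUM_BITS // 8)
  |
  |     def positions(word):
  |         # Derive several indices from one word by hashing it
  |         # with a different salt for each hash function.
  |         for i in range(NUM_HASHES):
  |             h = hashlib.md5(f"{i}:{word}".encode()).digest()
  |             yield int.from_bytes(h[:8], "big") % NUM_BITS
  |
  |     def add(word):
  |         for p in positions(word):
  |             bits[p // 8] |= 1 << (p % 8)
  |
  |     def probably_a_word(word):
  |         # All bits set -> accept (perhaps a false positive);
  |         # any bit clear -> definitely not in the dictionary.
  |         return all(bits[p // 8] & (1 << (p % 8))
  |                    for p in positions(word))
  |
  |     for w in ("the", "quick", "brown", "fox"):
  |         add(w)
  |     print(probably_a_word("quick"))  # True
  |     print(probably_a_word("dcd"))    # almost surely False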
| jwstarr wrote:
| A more quantitative approach can be found in a pair of papers
  | from John C. Nesbit, who analyzed ten algorithms in 1985/86
| (https://archive.org/details/sim_journal-of-computer-based-in...
| ; https://archive.org/details/sim_journal-of-computer-based-
| in...). Generalized edit distance performed best, but also took
| the most time. The PLATO algorithm, which used a feature vector-
| esque approach, came in third in quality and was also efficient.
| Phonetic approaches came in third. Since the charts are hard to
  | read and summarize, I converted the results into F1 scores
| (https://ztoz.blog/posts/nesbit/).
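  |
  | For context, "generalized edit distance" is Levenshtein distance
  | with per-operation costs. A minimal Python sketch - the flat
  | costs below stand in for the weighted cost tables the papers
  | actually evaluate:
  |
  |     def edit_distance(a, b, ins=1.0, dele=1.0, sub=1.0):
  |         # Dynamic programming over prefixes of a and b.
  |         m, n = len(a), len(b)
  |         d = [[0.0] * (n + 1) for _ in range(m + 1)]
  |         for i in range(1, m + 1):
  |             d[i][0] = i * dele
  |         for j in range(1, n + 1):
  |             d[0][j] = j * ins
  |         for i in range(1, m + 1):
  |             for j in range(1, n + 1):
  |                 cost = 0.0 if a[i-1] == b[j-1] else sub
  |                 d[i][j] = min(d[i-1][j] + dele,    # deletion
  |                               d[i][j-1] + ins,     # insertion
  |                               d[i-1][j-1] + cost)  # sub/match
  |         return d[m][n]
  |
  |     # Rank candidate corrections for a misspelling:
  |     words = ["separate", "desperate", "disparate"]
  |     print(sorted(words,
  |                  key=lambda w: edit_distance("seperate", w)))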
___________________________________________________________________
(page generated 2022-07-25 23:01 UTC)