[HN Gopher] Prolog and Natural-Language Analysis (1987) [pdf]
       ___________________________________________________________________
        
       Prolog and Natural-Language Analysis (1987) [pdf]
        
       Author : Tomte
       Score  : 65 points
       Date   : 2024-05-13 11:41 UTC (2 days ago)
        
 (HTM) web link (www.mtome.com)
 (TXT) w3m dump (www.mtome.com)
        
       | mcswell wrote:
       | Around 1985, while I was working at the Artificial Intelligence
       | Center of (the now defunct) Boeing Computer Services, I evaluated
       | Fernando Pereira's NLP code written in Prolog for his
       | dissertation (he was one of the authors of the referenced 1987
       | article). My recollection is that his parser was very slow, and
       | difficult to extend (adding rules to account for other English
       | grammatical structures). Another fellow working at the AIC at the
       | time had written a parser in LISP, and I ended up writing the
       | English grammar for his parser.
       | 
        | That's not to say that LISP was faster than Prolog in
        | general; just that this particular program was slow.
       | 
        | Nowadays, of course, nobody writes parsers or grammars by
        | hand like that. Which makes me sad, because it was a lot of
        | fun :).
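For readers who never wrote grammars by hand: Prolog NLP work of that era typically used Definite Clause Grammar rules such as `s --> np, vp.` Here is a rough Python sketch of the same idea, with an invented toy grammar (not Pereira's actual code), where each rule consumes a prefix of the token list and returns the remainder, mirroring Prolog's difference-list threading:

```python
# Toy grammar in the spirit of Prolog DCG rules:
#   s --> np, vp.    np --> det, n.    vp --> v, np.
# Each function takes a token list and returns the unconsumed
# remainder on success, or None on failure.

def det(toks): return toks[1:] if toks[:1] == ["the"] else None
def n(toks):   return toks[1:] if toks[:1] in (["cat"], ["mouse"]) else None
def v(toks):   return toks[1:] if toks[:1] == ["chases"] else None

def np(toks):
    rest = det(toks)
    return n(rest) if rest is not None else None

def vp(toks):
    rest = v(toks)
    return np(rest) if rest is not None else None

def s(toks):
    rest = np(toks)
    return vp(rest) if rest is not None else None

# A full parse consumes every token, leaving the empty list.
print(s("the cat chases the mouse".split()))  # []
```

Extending such a grammar by hand (relative clauses, agreement, movement) is exactly the work the commenter describes, and it is easy to see why it was both fun and slow going.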
        
         | mcswell wrote:
         | I should have added, Pereira was (and is) a lot smarter than I
         | am. He went on to do great things in computational linguistics,
         | whereas I went on to do...smaller things.
        
       | verdverm wrote:
        | If you like this kind of stuff, CUE(lang) is heavily
        | influenced by Prolog and pre-90s NLP. Its creator, Marcel,
        | worked on typed feature structures for efficiently
        | representing grammar rules, to support the way NLP was
        | approached at the time.
       | 
       | The CUE evaluator is a really interesting codebase for anyone
       | interested in algos
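Typed feature structures are built around unification: two partial descriptions merge if they do not conflict, which is also the core operation of CUE's evaluator. A toy, untyped sketch (dict-based, with invented feature names; real TFS systems add types and reentrancy):

```python
# Rough sketch of feature-structure unification. Two structures
# unify by recursively merging; a clash of atomic values fails.

def unify(a, b):
    if a == b:
        return a
    if isinstance(a, dict) and isinstance(b, dict):
        out = dict(a)
        for k, v in b.items():
            if k in out:
                merged = unify(out[k], v)
                if merged is None:
                    return None  # clash somewhere below: fail
                out[k] = merged
            else:
                out[k] = v
        return out
    return None  # differing atomic values clash

x = {"agr": {"num": "sg"}, "cat": "np"}
y = {"agr": {"num": "sg", "per": 3}}
print(unify(x, y))  # {'agr': {'num': 'sg', 'per': 3}, 'cat': 'np'}
```

The same merge-or-fail behavior is what lets both unification grammars and CUE treat constraints and data uniformly.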
        
       | JimmyRuska wrote:
        | Pretty amusing that the old AI revolution was purely
        | logic/reasoning/inference based. People knew that, to be a
        | believable AI, a system needed some level of believable
        | reasoning and logic capability, but nobody wanted to
        | decompose a business problem into disjunctive logic
        | statements, and any added rule can have implications across
        | the whole universe of other logic, making the system hard to
        | predict and maintain.
       | 
       | LLMs brought this new revolution where it's not immediately
       | obvious you're chatting with a machine, but, just like most
       | humans, they still severely lack the ability to decompose
       | unstructured data into logic statements and prove anything out.
        | It would be amazing if they could write some Datalog or
        | Prolog to approximate a more complex neural-network-based
        | understanding of some problem, since logic-based systems are
        | more explainable.
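To make the "decompose into logic statements" point concrete, here is a minimal sketch of the forward-chaining fact derivation a Datalog engine performs, with invented toy facts and rules (and it also shows why each added rule ripples through the whole fact base):

```python
# Facts are tuples; rules are applied until a fixpoint is reached.
facts = {("parent", "ann", "bob"), ("parent", "bob", "cal")}

def step(facts):
    new = set(facts)
    # Rule: ancestor(X, Y) :- parent(X, Y).
    for rel, x, y in facts:
        if rel == "parent":
            new.add(("ancestor", x, y))
    # Rule: ancestor(X, Z) :- ancestor(X, Y), parent(Y, Z).
    for rel1, x, y in facts:
        for rel2, y2, z in facts:
            if rel1 == "ancestor" and rel2 == "parent" and y == y2:
                new.add(("ancestor", x, z))
    return new

# Iterate to fixpoint: stop when a pass derives nothing new.
while True:
    nxt = step(facts)
    if nxt == facts:
        break
    facts = nxt

print(("ancestor", "ann", "cal") in facts)  # True
```

Every derived fact is traceable back to the rules and base facts that produced it, which is the explainability the comment is after.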
        
         | zcw100 wrote:
         | That's what Stardog is doing https://www.stardog.com/
        
         | LunaSea wrote:
         | One of the reasons for why word vectors, sentence embeddings
         | and LLMs won (for now) is that text found on the web
         | especially, does not necessarily follow strict grammar and
         | lexical rules.
         | 
         | Sentences that are incorrect but still understandable.
         | 
         | If you then include leet speak, acronyms, short form writing
         | (SMS / Tweets), it quickly becomes unmanageable.
        
           | puzzledobserver wrote:
           | I am not a linguist, but I don't think that many linguists
           | would agree with your assessment that dialects, leet speak,
           | short form writing, slang, creoles, or vernaculars are
           | necessarily ungrammatical.
           | 
           | From what I understand, the modern understanding is that
           | these point to the failure of grammar as a prescriptive
           | exercise ("This is how thou shalt speak"). Human speech is
           | too complex for simple grammar rules to fully capture its
           | variety. Strict grammar and lexical rules were always
           | fantasies of the grammar teacher anyway.
           | 
           | See, for example, the following article on double negatives
           | and African American Vernacular English:
           | https://daily.jstor.org/black-english-matters/.
        
           | agumonkey wrote:
           | I wonder if people approach NLP as a sea of semes rather than
           | a semi-rigid grammatical structures to then be affected with
           | meaning. (probably but I'm not monitoring these field)
        
       | srush wrote:
       | This book is great. Really mind warping at first read. Fernando
       | Pereira has had an incredible influence across NLP for his whole
       | career. Here is an offhand list of papers to check out.
       | 
       | * Conditional random fields: Probabilistic models for segmenting
       | and labeling sequence data (2001) - Central paper of structured
       | supervised learning in the 2000s era
       | 
       | * Weighted finite-state transducers in speech recognition (2002)
       | - This work and OpenFST are so clean
       | 
       | * Non-projective dependency parsing using spanning tree
       | algorithms (2005) - Influential work connecting graph algorithms
       | to syntax. Less relevant now, but still such a nice paper.
       | 
       | * Distributional clustering of English words (1994) - Proto word
       | embeddings.
       | 
       | * The Unreasonable Effectiveness of Data (2009) - More high-
       | level, but certainly explains the last 15 years
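The distributional idea behind the 1994 paper (words that occur in similar contexts get similar representations) can be sketched in a few lines, using an invented toy corpus:

```python
from collections import Counter

# Represent each word by counts of the words appearing next to it.
corpus = "the cat sat on the mat the dog sat on the rug".split()

def context_vector(word, window=1):
    vec = Counter()
    for i, w in enumerate(corpus):
        if w == word:
            for j in range(max(0, i - window),
                           min(len(corpus), i + window + 1)):
                if j != i:
                    vec[corpus[j]] += 1
    return vec

# "cat" and "dog" share contexts ("the", "sat"), so their
# count vectors overlap; Counter & Counter keeps the shared part.
print(sorted(context_vector("cat") & context_vector("dog")))  # ['sat', 'the']
```

Clustering or factoring such co-occurrence vectors is a few steps away from modern embeddings, which is why the paper reads as "proto word embeddings."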
        
       | rhelz wrote:
       | I bought a copy of this book used. There was a stamp in the
       | frontispiece, saying that it was from the library of Bell Labs.
       | 
       | I nearly cried---this is how a great institution crumbles---this
       | is how great libraries are destroyed.
       | 
        | Future generations are really going to scratch their heads,
        | wondering why we disbanded the institution that brought us
        | the transistor and Unix, and instead funneled billions of
        | dollars into research on how to get us to click on buttons
        | and doomscroll.
        
         | linguae wrote:
          | During Bell Labs' heyday, it was the beneficiary of AT&T's
          | nationwide monopoly, which was subject to various
          | restrictions by the US government as part of a series of
          | legal cases. When a company holds a monopoly, whether a
          | natural monopoly (like AT&T) or one grounded in patent
          | rights (Xerox in its heyday), it can lavishly fund labs
          | that give researchers large amounts of freedom. Such
          | resources and freedom resulted in many groundbreaking
          | discoveries and inventions.
         | 
         | However, AT&T was broken up in 1984 as the result of yet
         | another lawsuit involving AT&T's monopoly. Bell Labs still
         | remained, but it no longer had the same amount of resources.
         | Thus, the lab's unfettered research culture gradually gave way
         | to shorter-term research that showed promise of more immediate
          | business impact. A similar thing happened to Xerox PARC
          | when the federal government forced Xerox to license its
          | xerography patents in the mid-1970s; this, combined with
          | the end of a five-year agreement under which Xerox's
          | executives had promised not to meddle in PARC's operations,
          | led to increased pressure on the researchers (ironically,
          | Xerox still famously failed to take full advantage of the
          | research PARC produced, but that's another story).
         | 
          | Combine this with a business culture, emerging in the
          | 1990s, that disdains long-term, unfettered research and
          | emphasizes short-term work promising immediate business
          | impact, and the result has been the transformation of
          | industrial research. There are some labs, like Microsoft
          | Research, that
         | still provide their researchers a great deal of freedom, but
         | such labs are rare these days. It's amazing that well-resourced
         | companies like Apple don't have labs like Bell Labs and Xerox
         | PARC, but if businesses are beholden to quarterly results, why
         | would they invest in long-term, risky research projects?
         | 
         | This leaves government and academia. Unfortunately government,
         | too, is often subject to ROI demands from politicians (which is
         | nothing new; check out how the Mansfield Amendment changed ARPA
         | into DARPA), and academia is subject to "publish or perish"
         | demands.
         | 
         | The running theme is that unfettered research with a proper
         | amount of resources can result in world-changing discoveries
         | and inventions. However, funding such research requires a large
         | amount of resources as well as patience, since research takes
         | time and results don't always come in neat quarterly or even
          | annual periods. Our business culture lacks this type of
          | patience, and many businesses lack the resources to
          | maintain labs at the level of Bell Labs or Xerox PARC. Even
          | academia and government lack this kind of patience.
         | 
          | The question is how we can encourage unfettered research
          | in a
         | world that is unwilling to fund it. I've been thinking of ideas
         | for quite some time, but I haven't fully fleshed them out yet.
        
       ___________________________________________________________________
       (page generated 2024-05-15 23:01 UTC)