[HN Gopher] Prolog and Natural-Language Analysis (1987) [pdf]
___________________________________________________________________
Prolog and Natural-Language Analysis (1987) [pdf]
Author : Tomte
Score : 65 points
Date : 2024-05-13 11:41 UTC (2 days ago)
(HTM) web link (www.mtome.com)
(TXT) w3m dump (www.mtome.com)
| mcswell wrote:
| Around 1985, while I was working at the Artificial Intelligence
| Center of (the now defunct) Boeing Computer Services, I evaluated
| Fernando Pereira's NLP code written in Prolog for his
| dissertation (he was one of the authors of the referenced 1987
| article). My recollection is that his parser was very slow, and
| difficult to extend (adding rules to account for other English
| grammatical structures). Another fellow working at the AIC at the
| time had written a parser in LISP, and I ended up writing the
| English grammar for his parser.
|
 | That's not to say that LISP was faster than Prolog in general;
 | just that this particular program was slow.
|
 | Nowadays, of course, nobody writes parsers or grammars by hand
 | like that. Which makes me sad, because it was a lot of fun :).
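(A sketch of what "writing a grammar by hand" looks like: a toy context-free grammar driven by a recursive-descent parser, here in Python rather than the book's Prolog DCGs. The grammar and lexicon are invented for illustration, not Pereira's actual code.)

```python
# Toy hand-written grammar: nonterminals expand to productions,
# preterminals are looked up in a lexicon.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["Det", "N"], ["Name"]],
    "VP": [["V", "NP"], ["V"]],
}
LEXICON = {
    "Det":  {"the", "a"},
    "N":    {"dog", "cat"},
    "Name": {"john"},
    "V":    {"sees", "sleeps"},
}

def parse(symbol, words, pos):
    """Return all (tree, next_pos) analyses of `symbol` starting at `pos`."""
    results = []
    if symbol in LEXICON:
        if pos < len(words) and words[pos] in LEXICON[symbol]:
            results.append(((symbol, words[pos]), pos + 1))
        return results
    for production in GRAMMAR[symbol]:
        partials = [((), pos)]
        for child in production:
            partials = [
                (trees + (t,), p2)
                for trees, p in partials
                for t, p2 in parse(child, words, p)
            ]
        results.extend(((symbol,) + trees, p) for trees, p in partials)
    return results

def parses(sentence):
    """All complete parse trees for a whitespace-tokenized sentence."""
    words = sentence.split()
    return [t for t, p in parse("S", words, 0) if p == len(words)]
```

In the book, the same grammar is a handful of DCG clauses like `s --> np, vp.`, which Prolog turns directly into a parser.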
| mcswell wrote:
| I should have added, Pereira was (and is) a lot smarter than I
| am. He went on to do great things in computational linguistics,
| whereas I went on to do...smaller things.
| verdverm wrote:
 | If you like this kind of stuff, CUE(lang) is highly influenced
 | by Prolog and pre-1990s NLP. The creator, Marcel, worked on a
 | typed feature structure for optimally representing grammar rules
 | to support the way NLP was approached at the time.
 |
 | The CUE evaluator is a really interesting codebase for anyone
 | interested in algorithms.
| JimmyRuska wrote:
 | Pretty amusing that the old AI revolution was purely
 | logic/reasoning/inference based. People knew that, to be
 | believable, an AI system needed some level of believable
 | reasoning and logic capabilities, but nobody wanted to decompose
 | a business problem into disjunctive logic statements, and any
 | additional rule can have implications across the whole universe
 | of other logic, making the system hard to predict and maintain.
|
 | LLMs brought this new revolution where it's not immediately
 | obvious you're chatting with a machine, but, just like most
 | humans, they still severely lack the ability to decompose
 | unstructured data into logic statements and prove anything out.
 | It would be amazing if they could write some Datalog or Prolog
 | to approximate a more complex neural-network-based understanding
 | of some problem, since logic-based systems are more explainable.
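(To make "write some datalog" concrete, here is a minimal forward-chaining sketch in Python. The `parent`/`ancestor` rules are a made-up toy example, not any production system; real Datalog engines add stratification, indexing, and semi-naive evaluation. The explainability point is visible in the structure: every derived fact traces back to named rules and base facts.)

```python
# Atoms are tuples; variables are strings starting with "?".
# A rule is (head, [body atoms]).
FACTS = {("parent", "ann", "bob"), ("parent", "bob", "cara")}
RULES = [
    (("ancestor", "?x", "?y"), [("parent", "?x", "?y")]),
    (("ancestor", "?x", "?z"),
     [("parent", "?x", "?y"), ("ancestor", "?y", "?z")]),
]

def unify(atom, fact, env):
    """Extend binding environment `env` to match `atom` against `fact`."""
    if len(atom) != len(fact) or atom[0] != fact[0]:
        return None
    env = dict(env)
    for a, f in zip(atom[1:], fact[1:]):
        if a.startswith("?"):
            if env.get(a, f) != f:
                return None
            env[a] = f
        elif a != f:
            return None
    return env

def substitute(atom, env):
    """Instantiate an atom's variables from the environment."""
    return tuple(env.get(t, t) for t in atom)

def forward_chain(facts, rules):
    """Naive bottom-up evaluation: apply rules until a fixed point."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            envs = [{}]
            for atom in body:
                envs = [e2 for e in envs for f in facts
                        if (e2 := unify(atom, f, e)) is not None]
            for env in envs:
                new = substitute(head, env)
                if new not in facts:
                    facts.add(new)
                    changed = True
    return facts
```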
| zcw100 wrote:
| That's what Stardog is doing https://www.stardog.com/
| LunaSea wrote:
 | One of the reasons why word vectors, sentence embeddings, and
 | LLMs won (for now) is that text, especially text found on the
 | web, does not necessarily follow strict grammatical and lexical
 | rules.
 |
 | Sentences can be incorrect but still understandable.
 |
 | If you then include leet speak, acronyms, and short-form writing
 | (SMS / tweets), hand-written rules quickly become unmanageable.
| puzzledobserver wrote:
| I am not a linguist, but I don't think that many linguists
| would agree with your assessment that dialects, leet speak,
| short form writing, slang, creoles, or vernaculars are
| necessarily ungrammatical.
|
| From what I understand, the modern understanding is that
| these point to the failure of grammar as a prescriptive
| exercise ("This is how thou shalt speak"). Human speech is
| too complex for simple grammar rules to fully capture its
| variety. Strict grammar and lexical rules were always
| fantasies of the grammar teacher anyway.
|
| See, for example, the following article on double negatives
| and African American Vernacular English:
| https://daily.jstor.org/black-english-matters/.
| agumonkey wrote:
 | I wonder if people now approach NLP as a sea of semes rather
 | than as semi-rigid grammatical structures onto which meaning is
 | then layered. (Probably, but I'm not monitoring the field.)
| srush wrote:
 | This book is great. Really mind-warping on first read. Fernando
 | Pereira has had an incredible influence across NLP for his whole
 | career. Here is an offhand list of papers to check out.
|
| * Conditional random fields: Probabilistic models for segmenting
| and labeling sequence data (2001) - Central paper of structured
| supervised learning in the 2000s era
|
| * Weighted finite-state transducers in speech recognition (2002)
| - This work and OpenFST are so clean
|
| * Non-projective dependency parsing using spanning tree
| algorithms (2005) - Influential work connecting graph algorithms
| to syntax. Less relevant now, but still such a nice paper.
|
| * Distributional clustering of English words (1994) - Proto word
| embeddings.
|
| * The Unreasonable Effectiveness of Data (2009) - More high-
| level, but certainly explains the last 15 years
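(To make the "proto word embeddings" point concrete: in the distributional approach, words that occur in similar contexts end up with similar co-occurrence count vectors. A toy sketch in Python, with an invented corpus and an arbitrary window size; the 1994 paper clusters over much richer statistics.)

```python
from collections import Counter
import math

corpus = ("the cat sat on the mat . the dog sat on the rug . "
          "the cat ate . the dog ate .").split()

def cooccurrence_vector(word, window=2):
    """Count context words within `window` tokens of each occurrence."""
    counts = Counter()
    for i, w in enumerate(corpus):
        if w == word:
            lo, hi = max(0, i - window), i + window + 1
            counts.update(t for t in corpus[lo:hi] if t != word)
    return counts

def cosine(u, v):
    """Cosine similarity of two sparse count vectors (Counters)."""
    keys = set(u) | set(v)
    dot = sum(u[k] * v[k] for k in keys)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv)

cat, dog, mat = map(cooccurrence_vector, ["cat", "dog", "mat"])
# "cat" and "dog" share contexts (sat, on, ate), so their vectors
# are closer to each other than either is to "mat"'s.
```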
| rhelz wrote:
| I bought a copy of this book used. There was a stamp in the
| frontispiece, saying that it was from the library of Bell Labs.
|
| I nearly cried---this is how a great institution crumbles---this
| is how great libraries are destroyed.
|
 | Future generations are really going to be scratching their
 | heads, wondering why we disbanded the institution that brought
 | us the transistor and Unix, and instead funneled billions of
 | dollars into research on how to get us to click buttons and
 | doomscroll.
| linguae wrote:
| During Bell Labs' heyday, it was the beneficiary of AT&T's
| nationwide monopoly. AT&T was a monopoly subject to various
| restrictions by the US government as part of a series of legal
 | cases. When a company holds a monopoly, whether a natural
 | monopoly (like AT&T's) or one built on patent rights (like
 | Xerox's in its heyday), it can lavishly fund labs that give its
 | researchers large amounts of freedom. Such resources and freedom
 | resulted in many groundbreaking discoveries and inventions.
|
| However, AT&T was broken up in 1984 as the result of yet
| another lawsuit involving AT&T's monopoly. Bell Labs still
| remained, but it no longer had the same amount of resources.
| Thus, the lab's unfettered research culture gradually gave way
| to shorter-term research that showed promise of more immediate
 | business impact. A similar thing happened to Xerox PARC when the
 | federal government forced Xerox to license its xerography
 | patents in the mid-1970s; this, combined with the end of a
 | five-year agreement under which Xerox's executives promised not
 | to meddle in the operations of Xerox PARC, led to increased
 | pressure on the researchers (ironically, Xerox infamously didn't
 | take full advantage of the research PARC produced, but that's
 | another story).
|
 | Combine this with a business culture that emerged in the 1990s,
 | one that disdains long-term, unfettered research and emphasizes
 | short-term work promising immediate business impact, and you get
 | the transformation of industrial research. There are some labs
 | like Microsoft Research that
| still provide their researchers a great deal of freedom, but
| such labs are rare these days. It's amazing that well-resourced
| companies like Apple don't have labs like Bell Labs and Xerox
| PARC, but if businesses are beholden to quarterly results, why
| would they invest in long-term, risky research projects?
|
 | This leaves government and academia. Unfortunately, government,
 | too, is often subject to ROI demands from politicians (which is
| nothing new; check out how the Mansfield Amendment changed ARPA
| into DARPA), and academia is subject to "publish or perish"
| demands.
|
| The running theme is that unfettered research with a proper
| amount of resources can result in world-changing discoveries
| and inventions. However, funding such research requires a large
| amount of resources as well as patience, since research takes
| time and results don't always come in neat quarterly or even
| annual periods. Our business culture lacks this type of
| patience, and many businesses lack the resources to devote to
 | maintaining labs at the level of Bell Labs or Xerox PARC. Even
 | academia and government lack this type of patience.
|
 | The question is how we can encourage unfettered research in a
 | world that is unwilling to fund it. I've been thinking about
 | ideas for quite some time, but I haven't fully fleshed them out
 | yet.
___________________________________________________________________
(page generated 2024-05-15 23:01 UTC)