[HN Gopher] Typed Japanese
       ___________________________________________________________________
        
       Typed Japanese
        
       Author : Philpax
       Score  : 67 points
       Date   : 2025-03-29 13:36 UTC (9 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | IshKebab wrote:
       | Amazing. Is Japanese really that strict in its grammar?
        
         | kazinator wrote:
         | No.
        
         | retrac wrote:
         | I'm not sure what you mean by strict. If by "regular and
         | consistent" then, yes. There are a handful of irregular verbs.
         | Everything else is completely regular with suffixes attached to
         | the verb root.
         | 
         | Japanese is not unusual (on this particular point). That is
         | probably the most common structure for a language. Swahili
         | verbs are entirely regular and can have a long sequence of
         | prefixes which follow a strict order. kula - to eat.
         | usingalikula - (you) would not have eaten. halutayakula - (we)
         | will not eat (those noun class 6 objects)
        
           | oceanhaiyang wrote:
           | I think what they mean is can you form sentences that are not
           | textbook and still make sense. In that case as long as you
           | understand the particles and grammar you can make highly
           | backwards sentences still work.
        
             | GolDDranks wrote:
             | Every human language works this way...? This is why we have
             | systems called "grammars" in the first place: they capture
             | regularity in the structure of a language.
             | 
             | Human languages are, in a sense, "infinitary", so that they
             | aren't simply a set of fixed phrases. However there are,
             | depending on the language, also patterns they don't allow.
             | (And some patterns that _no_ human language allows.)
             | 
             | This logically necessitates that languages have regularities.
             | We capture these regularities as "generative grammars".
        
           | thaumasiotes wrote:
           | > That is probably the most common structure for a language.
           | Swahili verbs are entirely regular and can have a long
           | sequence of prefixes which follow a strict order.
           | 
           | This seems a little confused. If a language's verbs are
           | _entirely_ regular, the normal point of view would be that
           | that language doesn't inflect verbs at all.
           | 
           | For example, an English verb can include up to four prefixes,
           | which occur in an order that never changes and which do not
           | depend on the verb. In this sense, that part of the grammar
           | is "fully regular". (Note that all of these prefixes are
           | verbs, and with the exception of the first set, they inflect
           | like verbs. The first set inflect only for past tense.)
           | 
           | [Those prefixes, in order, indicate: modality (there are many
           | verbs that can occupy this space, but they're all mutually
           | exclusive with each other), perfect aspect (exactly one verb
           | can occur here), continuous aspect (again, exactly one verb
           | can occur here), passive voice (two verbs, mutually
           | exclusive). Any of the sixteen theoretical combinations of
           | these is allowed.]
           | 
           | But that's all periphrasis. English verbal _inflection_ is a
           | different phenomenon: each verb has five forms (except _be_ ,
           | which is special and has more): plain ( _take_ ), past (
           | _took_ ), third-person present singular ( _takes_ ), active
           | participle ( _taking_ ), and passive participle ( _taken_ ).
           | This is where a verb might be irregular or not. A regular
           | verb's past form adds the suffix /d/ to the plain form, the
           | active participle adds /ING/, the third-person present
           | singular form adds /z/, and the passive participle is
           | identical with the past form.
           | 
           | All verbs are regular in their active participle and only
           | _be_ is irregular in the third-person present singular.+ But
           | verbs might have arbitrary past forms ( _go_ / _went_ ) and
           | while the passive participles are never arbitrary, there is a
           | set of irregular forms following a common pattern that
           | descends from an earlier stage of the language. ( _taken_ /
           | _been_ / _gone_ / _known_ / _gotten_ / ...).
           | 
           | There is a double standard over what counts as irregularity
           | in verbs. By the standard we use for Latin, English has a
           | total of one irregular verb. A Latin verb may have a
           | perfective stem that is arbitrarily different from its
           | imperfective stem (compare _ferre_ with _tulisse_ ), but this
           | "doesn't count" - we say that it's necessary to memorize four
           | forms of any Latin verb, and with English verbs having only
           | five forms and two of them being always regular, it's a real
           | stretch to find verbs that require you to memorize more than
           | four forms.
           | 
           | + edit: this was wrong; _have_ is also irregular in the
           | present third-person singular.
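
The fixed ordering of the four periphrastic slots described above (modality, perfect, continuous, passive) can be sketched in TypeScript. This is only an illustration under stated assumptions: the names `Verb`, `Options`, `verbGroup` and `allCombinations` are hypothetical, the passive slot uses only _be_ (the comment mentions two candidate verbs), and tense is ignored, so without a modal the first word is left in its plain form.

```typescript
// The three forms this sketch needs. The comment's "passive participle"
// ("taken") is also the form that follows perfect "have".
type Form = "plain" | "activeParticiple" | "passiveParticiple";

interface Verb {
  plain: string;
  activeParticiple: string;
  passiveParticiple: string;
}

const HAVE: Verb = { plain: "have", activeParticiple: "having", passiveParticiple: "had" };
const BE: Verb = { plain: "be", activeParticiple: "being", passiveParticiple: "been" };
const TAKE: Verb = { plain: "take", activeParticiple: "taking", passiveParticiple: "taken" };

interface Options {
  modal?: string | undefined; // e.g. "might"; modals are mutually exclusive
  perfect?: boolean;          // have + passive participle
  continuous?: boolean;       // be + active participle
  passive?: boolean;          // be + passive participle
}

// Builds the verb group. Each auxiliary dictates the form of the next verb.
function verbGroup(main: Verb, o: Options): string {
  const words: string[] = [];
  let form: Form = "plain"; // form required of the next verb in the chain
  if (o.modal !== undefined) {
    words.push(o.modal); // a modal is followed by a plain form
  }
  if (o.perfect) {
    words.push(HAVE[form]);
    form = "passiveParticiple";
  }
  if (o.continuous) {
    words.push(BE[form]);
    form = "activeParticiple";
  }
  if (o.passive) {
    words.push(BE[form]);
    form = "passiveParticiple";
  }
  words.push(main[form]);
  return words.join(" ");
}

// Enumerates the sixteen combinations of the four optional slots.
function allCombinations(main: Verb, modal: string): string[] {
  const out: string[] = [];
  for (const m of [undefined, modal]) {
    for (const perfect of [false, true]) {
      for (const continuous of [false, true]) {
        for (const passive of [false, true]) {
          out.push(verbGroup(main, { modal: m, perfect, continuous, passive }));
        }
      }
    }
  }
  return out;
}

const fullChain = verbGroup(TAKE, { modal: "might", perfect: true, continuous: true, passive: true });
// fullChain is "might have been being taken"
```

The slot order is hard-coded, which is the sense in which this part of the grammar is "fully regular"; irregularity only shows up in the lookup tables for the individual verb forms.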
        
         | niederman wrote:
         | No, this supports only a highly regular subset of Japanese
         | grammar. There are plenty of irregular phrases it doesn't
         | cover, as in most languages.
        
         | zerof1l wrote:
         | The textbook variety that is usually taught in the first couple
         | of years: yes, quite strict, and it can be reduced to logical
         | patterns. But every-day spoken informal Japanese... no.
         | 
         | And then there are a bunch of nuances that don't follow logic.
         | You just need to learn about them. For example, word "you" is
         | always considered informal, a bit intimate and a bit rude in
         | spoken Japanese, though it can be used in formal written
         | Japanese.
        
           | charcircuit wrote:
           | >But every-day spoken informal Japanese... no.
           | 
           | Even informal Japanese comes down to basic patterns for
           | grammar.
           | 
           | >For example, word ...
           | 
           | Word choice is not grammar. Knowing how to use words
           | correctly is the hard part.
        
           | cynicalkane wrote:
           | Well, one nice property of Japanese is the formal language is
           | perfectly workable as a language. You'll sound a little
           | stilted, but that's rather normal for foreigners.
           | 
           | If you spoke English while only sticking to formal grammar
           | rules, you'd be unable to communicate like a normal person at
           | all.
           | 
           | I think the comment on the word 'you' sounding informal is a
           | misunderstanding. People use a common formal noun for 'you',
           | anata, frequently; it's just that the nature of Japanese
           | emphasizes social relationships and makes it easy to drop
           | references to people, so using such formal references can be
           | stilted in the wrong context. But that's a matter of word
           | choice, not grammar.
        
         | sparkie wrote:
         | Not strict, but there is underlying structure.
         | 
         | It's not really practical, but it is interesting to compare it
         | structurally to programming languages.
         | 
         | A verb (or verbal adjective) basically forms the "root" of a
         | clause, and typically appears at the end of it. If we relate
         | that to the concept of a function, it would be like writing
         | them in postfix form, with whatever comes before it acting as
         | the arguments to it.
         | 
         |     (args)func
         |     (WORDS)VERB
         | 
         | Other parts of the clause are suffixed with a spoken
         | "particle", which denotes its purpose in the sentence. These
         | kind of resemble optional/named parameters/keyword arguments,
         | where we can specify them out of canonical order - but the
         | particle behaves like the named parameter to specify which it
         | is, and they too appear in postfix position.
         | 
         |     (arg1:subject, arg2:object)VERB
         | 
         |     ;eg
         |     (NOUN:ga, NOUN:wo)VERB
         |     (NOUN:wo, NOUN:ga)VERB
         | 
         | Both examples mean the same, but the former would be more
         | typical, and the latter less common.
         | 
         | A verb clause which appears before a noun modifies the noun,
         | which bears similarity to a subexpression.
         | 
         |     (NOUN:wa, ((NOUN:ni)VERB NOUN):wo)VERB
         | 
         | Some particles modify a sentence and come after the verb - for
         | example, to make it a question it's followed by a particle like
         | `ka`, `no`, `ne` or `na`. These might resemble keywords -
         | again appearing in a postfix position.
         | 
         |     (NOUN:ga)VERB ka
         | 
         | Compared with a typical programming language, Japanese is
         | structured the opposite way - from right to left.
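
The "particles as keyword arguments" analogy above can be sketched in TypeScript. This is a toy illustration and not how the Typed Japanese project works: `Particle`, `Clause`, `isParticle` and `parseClause` are hypothetical names, input is assumed to be romanized and space-separated, and only the three particles named in the comment are handled. The example sentence (roughly "the cat eats fish") is supplied here for illustration and does not come from the thread.

```typescript
// Case particles mentioned in the comment: ga (subject), wo (object), ni.
type Particle = "ga" | "wo" | "ni";

interface Clause {
  verb: string;                            // the "function", always last
  args: Partial<Record<Particle, string>>; // "keyword arguments", keyed by particle
}

function isParticle(token: string): token is Particle {
  return token === "ga" || token === "wo" || token === "ni";
}

// Parses a clause shaped like "NOUN ga NOUN wo VERB".
function parseClause(text: string): Clause {
  const tokens = text.trim().split(/\s+/);
  const verb = tokens.pop(); // the verb comes last in the clause
  if (verb === undefined || verb === "") throw new Error("empty clause");

  const args: Partial<Record<Particle, string>> = {};
  for (let i = 0; i < tokens.length; i += 2) {
    const noun = tokens[i];
    const particle = tokens[i + 1];
    if (noun === undefined || particle === undefined || !isParticle(particle)) {
      throw new Error("expected NOUN followed by a particle");
    }
    if (args[particle] !== undefined) throw new Error(`duplicate particle: ${particle}`);
    args[particle] = noun; // the particle, not the position, names the role
  }
  return { verb, args };
}

// Both orders bind the same nouns to the same roles.
const typical = parseClause("neko ga sakana wo taberu");
const scrambled = parseClause("sakana wo neko ga taberu");
```

Because roles are looked up by particle, `typical` and `scrambled` end up with the same `verb` and the same `ga` and `wo` entries, which mirrors the claim that both orderings mean the same thing.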
        
           | pjc50 wrote:
           | It's more like a stack language like FORTH.
        
           | thaumasiotes wrote:
           | > A verb (or verbal adjective) basically forms the "root" of
           | a clause, and typically appears at the end of it.
           | 
           | > Compared with a typical programming language, Japanese is
           | structured the opposite way - from right to left.
           | 
           | How consistent is that across different types of phrases?
           | 
           | English is generally described as being strongly right-
           | headed. But that's really a statement about English noun
           | phrases. Prepositional phrases are left-headed. Determiner
           | phrases, if that's your thing, are left-headed. Verbs are
           | normally positioned between their subject and object, with
           | other modifiers generally after the object.
           | 
           | I had a Chinese tutor who complained that everything in
           | Chinese was the other way around compared to English, but as
           | far as I can tell the odds are about 50-50 that any given
           | structure will match or reverse. Mandarin sentences are SVO
           | just like English. Mandarin prepositions go after their nouns
           | instead of before. Mandarin adjectives go before their nouns.
           | Mandarin noun compounds work the same way as English noun
           | compounds. (牙刷 "tooth brush" is a brush, not a tooth.)
           | Mandarin verbs have indirect objects before the verb instead
           | of after. Mandarin discourse particles occur at the end of
           | the sentence instead of the beginning.+ Mandarin relative
           | clauses work just like English noun phrases, which is funny
           | because English relative clauses have their own bespoke
           | ordering. (English: _the man who has five sisters_ ;
           | Mandarin: _has five sisters who [the] man_.)
           | 
           | The big lesson I drew from this is "it doesn't make a lot of
           | sense to describe languages overall as having a particular
           | orientation". Is Japanese more consistent, or is orientation
           | more sensitive to the specific structures you're using?
           | 
           | + Note that "beginning" and "end" aren't the only options. In
           | classical Latin and Greek, particles like this always appear
           | as the second word in a sentence.
        
             | sparkie wrote:
             | Japanese is consistent about the verb (or adjective,
             | adverb) coming last in the clause/sentence (besides certain
             | sentence-ending particles). There are no "particles"
             | necessary on verbs because they're always in the right
             | position.
             | 
             | The subject/object/means/destination/etc are flexible and
             | can appear in any order - but typically subject appears
             | before object - `NOUN ga NOUN wo VERB`, `NOUN ga NOUN ni
             | VERB` (where `ga` denotes the subject), whereas `NOUN wo
             | NOUN ga VERB` would be atypical, but not incorrect.
             | 
             | When a verb (or adjective/adverb) appears before a noun, as
             | in `VERB NOUN`, it modifies the noun - acting like an
             | adjective - `VERB NOUN` is like "NOUN that/which VERBS".
             | All adjectives work this way - ADJ NOUN, because verbs and
             | adjectives are not really disjoint word classes in
             | Japanese. Even the noun adjectives ("na-adjectives")
             | function this way - they appear as `NOUN na NOUN`, but `na`
             | is secretly a verb disguised as a particle. Examples:
             | 
             |     sora ga aoi             (The sky is blue)
             |     aoi sora                (Blue sky)    [i-adjective]
             |     sora ga kirei desu      (The sky is pretty)
             |     kirei na sora           (Pretty sky)  [na-adjective]
             |     sora ga haiiro ni naru  (The sky turns grey)
             |     haiiro ni naru sora     (Sky which turns gray) [verb clause as adjective]
             | 
             | Spoken Japanese is very context sensitive, so things can be
             | omitted if they're already known to the listener - this can
             | include the verb, and any of the particles.
             | 
             | Verbs can appear out of order - for example at the start of
             | a sentence - but the meaning is understood based on tone or
             | pause in the speech - basically, if what were being spoken
             | were to be written it would be with a comma after the verb.
             | `VERB, NOUN`
             | 
             | Example:
             | 
             |     watashi wa hara ga hetta  (I'm hungry).
             |     hara ga hetta, watashi    (Hungry, I am).
             |     hara ga heru watashi      (Me who is hungry) [no pause or comma]
        
             | canjobear wrote:
             | English is mixed head-initial and head-final, for example
             | objects go after verbs (head-initial) but adjectives go
             | before nouns (head-final). Japanese is strictly head-final.
        
       | lewisjoe wrote:
       | Can this be used to build a grammar checker for japanese
       | language?
        
         | charcircuit wrote:
         | No, it's missing basic things, accepts ungrammatical sentences,
         | and is fundamentally flawed by being based off nihongokyouiku
         | grammar.
        
         | sparkie wrote:
         | No. Japanese is very context sensitive, and like any natural
         | language, has ambiguities. Japanese is loaded with _dajare_
         | (puns).
         | 
         | Grammar checking basically needs AI - you need to train some
         | model to understand common phrases and sentence structure.
         | Before LLMs there was software like MeCab[1] which did this,
         | and gave good results, but modern LLMs are much more capable.
         | 
         | [1]:https://taku910.github.io/mecab/
        
         | koito17 wrote:
         | Even if one could verify grammatical correctness, there are
         | many ways to produce unnatural Japanese phrases.
         | 
         | To give an easy example: 9つ (9 things) is natural, but 10つ
         | sounds extremely strange. However, 10個 sounds fine. When the
         | number is large enough, it's also common to not use 助数詞
         | (counter words) at all.
         | 
         | Sometimes, grammatical mistakes are _natural_ Japanese. For
         | instance, there is a concept of ら抜き言葉 (words with ら
         | dropped), where people will say e.g. 寝れない ("I can't
         | sleep") instead of 寝られない. This is an error in
         | conjugation, yet it's natural language and applies to a few
         | other words, too.
         | 
         | Validating both grammar and word choice is still insufficient
         | to judge naturality of a Japanese phrase. A common "mistake"
         | made by many Japanese is writing 「違和感を感じる」. The
         | verb is redundant because of the 「感」 in 「違和感」. The
         | "correct" word to use in this case is 覚える. In practice,
         | however, either choice of word is understandable and considered
         | correct (except to those with the trivia of 「違和感は覚える
         | もの!」)
         | 
         | Sometimes, redundancy makes phrases considered incorrect (see
         | 二重敬語 for an example). In other cases, nobody will
         | debate the correctness of the phrase.
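
The ra-dropping pattern mentioned above can be written as a two-branch rule. The following TypeScript sketch works on romanized dictionary forms and assumes the caller only passes ichidan verbs such as "neru" (to sleep); the function name `negativePotential` and its `dropRa` flag are hypothetical, and the sketch makes no attempt to detect godan verbs that happen to end in "ru".

```typescript
// Negative potential ("cannot X") of an ichidan verb in romanized dictionary form.
// Assumption: the input really is an ichidan verb; this is not checked.
function negativePotential(dictionaryForm: string, dropRa: boolean = false): string {
  if (dictionaryForm.length < 3 || dictionaryForm.slice(-2) !== "ru") {
    throw new Error("expected a dictionary form ending in 'ru'");
  }
  const stem = dictionaryForm.slice(0, -2); // "neru" -> "ne"
  // Textbook conjugation: stem + "rarenai". Colloquial ra-dropped form: stem + "renai".
  return stem + (dropRa ? "renai" : "rarenai");
}

const textbookForm = negativePotential("neru");         // "nerarenai"
const colloquialForm = negativePotential("neru", true); // "nerenai"
```

A checker that only knows the first branch would flag the second form as a conjugation error even though it is natural speech, which is the point the comment makes.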
        
         | GolDDranks wrote:
         | No. For what people mean when they say "grammar checker", it
         | doesn't suffice to check whether a sentence is (formally
         | speaking) ungrammatical or not. You'd expect it to also check
         | word choice, orthography etc. Those aren't part of the syntactic
         | structure. This means that it would allow many very flawed
         | sentences.
         | 
         | Besides, the grammar this project uses is not likely to reflect
         | accurately the actual grammar of modern spoken or written
         | Japanese, and it's likely not even nearly complete; that
         | would mean it would also have quite a lot of false positive
         | "ungrammaticals".
         | 
         | Something _like_ this can certainly be used as a part of a
         | grammar checker. But in that case, you shouldn't implement it
         | in TypeScript's type system in the first place.
        
       | enricozb wrote:
       | Somewhat related: the Lambek calculus[0], which is kind of a
       | general type system for written languages. Words are either
       | functions or concrete types, and functions can take their
       | arguments on the left or right.
        
       | guerrilla wrote:
       | This reminds me of categorial grammars. Has anyone here ever
       | looked into them? I loved them. They're like simply typed lambda
       | calculi but instead of a single arrow type there are two, one for
       | the left and right side of a symbol. So you can do something
       | like:
       | 
       |     and : Phrase \ Conjunction / Phrase
       |     do : Subject \ Verb / Object
       | 
       | Sorry if I have the directionality wrong but `and` would take a
       | phrase on the left, another on the right and return a
       | conjunction. `do` would take a Subject and an Object and return
       | a verb. In STLC it would look like this:
       | 
       |     and : Phrase -> Phrase -> Conjunction
       |     do : Subject -> Object -> Verb
       | 
       | These are bad examples just for illustration, consult real
       | linguists.
       | 
       | Last time I looked at it they were adding polymorphism and a
       | couple of people were starting to think about dependent types. It
       | was mostly linguists interested in it but it was hardcore math
       | and CS. Can't remember the names involved. Carl H. I forget what
       | the H stands for. Hmm. Some famous type theorists had written
       | about them too.
       | 
       | Anyway, seemed relevant.
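
The two directional arrows described above can be sketched as a tiny checker in TypeScript. This is a minimal illustration that assumes Lambek-style notation, where `A\B` looks for an `A` on its left and `B/A` looks for an `A` on its right (other conventions order the slashes differently). All names (`Category`, `combine`, `derives`) and the three-word lexicon are hypothetical, and only forward and backward application are implemented, with none of the extra combinators of combinatory categorial grammar.

```typescript
// A category is atomic, or a function seeking one argument on a given side.
type Category =
  | { kind: "atom"; name: string }
  | { kind: "fn"; dir: "left" | "right"; arg: Category; result: Category };

const atom = (name: string): Category => ({ kind: "atom", name });
// result/arg: wants `arg` on its right.
const fwd = (result: Category, arg: Category): Category => ({ kind: "fn", dir: "right", arg, result });
// arg\result: wants `arg` on its left.
const bwd = (arg: Category, result: Category): Category => ({ kind: "fn", dir: "left", arg, result });

function sameCategory(a: Category, b: Category): boolean {
  if (a.kind === "atom" || b.kind === "atom") {
    return a.kind === "atom" && b.kind === "atom" && a.name === b.name;
  }
  return a.dir === b.dir && sameCategory(a.arg, b.arg) && sameCategory(a.result, b.result);
}

// Combines two adjacent categories, or returns null if neither rule applies.
function combine(left: Category, right: Category): Category | null {
  // Forward application: X/Y followed by Y gives X.
  if (left.kind === "fn" && left.dir === "right" && sameCategory(left.arg, right)) return left.result;
  // Backward application: Y followed by Y\X gives X.
  if (right.kind === "fn" && right.dir === "left" && sameCategory(right.arg, left)) return right.result;
  return null;
}

// Tries every adjacent pair recursively; fine for toy inputs only.
function derives(cats: Category[], goal: Category): boolean {
  const first = cats[0];
  if (first === undefined) return false;
  if (cats.length === 1) return sameCategory(first, goal);
  for (let i = 0; i + 1 < cats.length; i++) {
    const l = cats[i];
    const r = cats[i + 1];
    if (l === undefined || r === undefined) continue;
    const merged = combine(l, r);
    if (merged === null) continue;
    const next = cats.slice(0, i).concat([merged], cats.slice(i + 2));
    if (derives(next, goal)) return true;
  }
  return false;
}

// Hypothetical lexicon: "Alice" and "Bob" are NP, "sees" is (NP\S)/NP.
const NP = atom("NP");
const S = atom("S");
const sees = fwd(bwd(NP, S), NP);

const grammatical = derives([NP, sees, NP], S);   // "Alice sees Bob"
const ungrammatical = derives([NP, NP, sees], S); // "Alice Bob sees"
```

With this lexicon `grammatical` is `true` and `ungrammatical` is `false`: the verb first consumes an `NP` on its right, and the resulting `NP\S` then consumes the subject on its left.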
        
         | canjobear wrote:
         | Lots of work on this kind of thing.
         | 
         | A good starting point is
         | https://en.wikipedia.org/wiki/Combinatory_categorial_grammar
        
       ___________________________________________________________________
       (page generated 2025-03-29 23:00 UTC)