[HN Gopher] Typed Japanese
___________________________________________________________________
Typed Japanese
Author : Philpax
Score : 67 points
Date : 2025-03-29 13:36 UTC (9 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| IshKebab wrote:
| Amazing. Is Japanese really that strict in its grammar?
| kazinator wrote:
| No.
| retrac wrote:
| I'm not sure what you mean by strict. If by "regular and
| consistent" then, yes. There are a handful of irregular verbs.
| Everything else is completely regular with suffixes attached to
| the verb root.
|
| Japanese is not unusual (on this particular point). That is
| probably the most common structure for a language. Swahili
| verbs are entirely regular and can have a long sequence of
| prefixes which follow a strict order. kula - to eat.
| usingalikula - (you) would not have eaten. hatutayakula - (we)
| will not eat (those noun class 6 objects)
| oceanhaiyang wrote:
| I think what they mean is can you form sentences that are not
| textbook and still make sense. In that case as long as you
| understand the particles and grammar you can make highly
| backwards sentences still work.
| GolDDranks wrote:
| Every human language works this way...? This is why we have
| systems called "grammars" in the first place: they capture
| regularity in the structure of a language.
|
| Human languages are, in a sense, "infinitary", so that they
| aren't simply a set of fixed phrases. However there are,
| depending on the language, also patterns they don't allow.
| (And some patterns that _no_ human language allows.)
|
| This logically necessitates that languages have regularities.
| We capture these regularities as "generative grammars".
| thaumasiotes wrote:
| > That is probably the most common structure for a language.
| Swahili verbs are entirely regular and can have a long
| sequence of prefixes which follow a strict order.
|
| This seems a little confused. If a language's verbs are
| _entirely_ regular, the normal point of view would be that
| that language doesn't inflect verbs at all.
|
| For example, an English verb can include up to four prefixes,
| which occur in an order that never changes and which do not
| depend on the verb. In this sense, that part of the grammar
| is "fully regular". (Note that all of these prefixes are
| verbs, and with the exception of the first set, they inflect
| like verbs. The first set inflect only for past tense.)
|
| [Those prefixes, in order, indicate: modality (there are many
| verbs that can occupy this space, but they're all mutually
| exclusive with each other), perfect aspect (exactly one verb
| can occur here), continuous aspect (again, exactly one verb
| can occur here), passive voice (two verbs, mutually
| exclusive). Any of the sixteen theoretical combinations of
| these is allowed.]
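|
| A minimal sketch of the four optional slots described above,
| assuming "will" as the modal, "be" as the only passive auxiliary,
| and "take" as the lexical verb (all illustrative choices, not from
| the comment). It builds the non-finite group, so the first word is
| left in its plain form rather than inflected for tense.
|
```python
from itertools import product

# Forms of the lexical verb and of the two auxiliaries used here.
# "active"/"passive" follow the comment's names for the two participles.
FORMS = {
    "take": {"plain": "take", "active": "taking", "passive": "taken"},
    "have": {"plain": "have", "active": "having", "passive": "had"},
    "be": {"plain": "be", "active": "being", "passive": "been"},
}


def verb_group(modal, perfect, continuous, passive, verb="take"):
    """Each filled slot decides which form the next word must take."""
    words = []
    form = "plain"  # form required of the next word in the chain
    if modal:
        words.append("will")  # a modal is followed by a plain form
    if perfect:
        words.append(FORMS["have"][form])
        form = "passive"  # "have" is followed by the passive participle
    if continuous:
        words.append(FORMS["be"][form])
        form = "active"  # continuous "be" is followed by the active participle
    if passive:
        words.append(FORMS["be"][form])
        form = "passive"  # passive "be" is followed by the passive participle
    words.append(FORMS[verb][form])
    return " ".join(words)


# All sixteen on/off combinations of the four slots.
ALL_GROUPS = [verb_group(*flags) for flags in product([False, True], repeat=4)]
```
|
| With every slot filled this yields "will have been being taken";
| with none filled it yields just "take".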
|
| But that's all periphrasis. English verbal _inflection_ is a
| different phenomenon: each verb has five forms (except _be_ ,
| which is special and has more): plain ( _take_ ), past (
| _took_ ), third-person present singular ( _takes_ ), active
| participle ( _taking_ ), and passive participle ( _taken_ ).
| This is where a verb might be irregular or not. A regular
| verb's past form adds the suffix /d/ to the plain form, the
| active participle adds /ING/, the third-person present
| singular form adds /z/, and the passive participle is
| identical with the past form.
|
| All verbs are regular in their active participle and only
| _be_ is irregular in the third-person present singular.+ But
| verbs might have arbitrary past forms ( _go_ / _went_ ) and
| while the passive participles are never arbitrary, there is a
| set of irregular forms following a common pattern that
| descends from an earlier stage of the language. ( _taken_ /
| _been_ / _gone_ / _known_ / _gotten_ / ...).
|
| There is a double standard over what counts as irregularity
| in verbs. By the standard we use for Latin, English has a
| total of one irregular verb. A Latin verb may have a
| perfective stem that is arbitrarily different from its
| imperfective stem (compare _ferre_ with _tulisse_ ), but this
| "doesn't count" - we say that it's necessary to memorize four
| forms of any Latin verb, and with English verbs having only
| five forms and two of them being always regular, it's a real
| stretch to find verbs that require you to memorize more than
| four forms.
|
| + edit: this was wrong; _have_ is also irregular in the
| present third-person singular.
| niederman wrote:
| No, this supports only a highly regular subset of Japanese
| grammar. There are plenty of irregular phrases it doesn't
| cover, as in most languages.
| zerof1l wrote:
| The textbook variety that is usually taught in the first couple
| of years: yes, quite strict, and it can be reduced to logical
| patterns. But everyday spoken informal Japanese... no.
|
| And then there are a bunch of nuances that don't follow logic.
| You just need to learn about them. For example, the word "you"
| is always considered informal, a bit intimate and a bit rude in
| spoken Japanese, yet it can be used in formal written Japanese.
| charcircuit wrote:
| >But every-day spoken informal Japanese... no.
|
| Even informal Japanese comes down to basic patterns for
| grammar.
|
| >For example, word ...
|
| Word choice is not grammar. Knowing how to use words
| correctly is the hard part.
| cynicalkane wrote:
| Well, one nice property of Japanese is the formal language is
| perfectly workable as a language. You'll sound a little
| stilted, but that's rather normal for foreigners.
|
| If you spoke English while only sticking to formal grammar
| rules, you'd be unable to communicate like a normal person at
| all.
|
| I think the comment on the word 'you' sounding informal is a
| misunderstanding. People use a common formal noun for 'you',
| anata, frequently; it's just that the nature of Japanese
| emphasizes social relationships and makes it easy to drop
| references to people, so using such formal references can be
| stilted in the wrong context. But that's a matter of word
| choice, not grammar.
| sparkie wrote:
| Not strict, but there is underlying structure.
|
| It's not really practical, but it is interesting to compare it
| structurally to programming languages.
|
| A verb (or verbal adjective) basically forms the "root" of a
| clause, and typically appears at the end of it. If we relate
| that to the concept of a function, it would be like writing
| it in postfix form, with whatever comes before it acting as
| the arguments to it.
|     (args)func
|     (WORDS)VERB
|
| Other parts of the clause are suffixed with a spoken
| "particle", which denotes its purpose in the sentence. These
| kind of resemble optional/named parameters/keyword arguments,
| where we can specify them out of canonical order - but the
| particle behaves like the named parameter to specify which it
| is, and they too appear in postfix position.
|     (arg1:subject, arg2:object)VERB
|     ; e.g.
|     (NOUN:ga, NOUN:wo)VERB
|     (NOUN:wo, NOUN:ga)VERB
|
| Both examples mean the same, but the former would be more
| typical, and the latter less common.
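|
| The keyword-argument analogy can be sketched roughly as follows.
| The particle-to-role table and the example sentence ("neko ga
| sakana wo taberu", "the cat eats the fish") are illustrative
| assumptions, and real particles carry far more nuance than this.
|
```python
# Hypothetical particle-to-role table, for illustration only.
ROLES = {"ga": "subject", "wo": "object", "ni": "target"}


def parse_clause(text):
    """Read 'NOUN particle ... VERB' and return (verb, {role: noun})."""
    *pairs, verb = text.split()
    if len(pairs) % 2:
        raise ValueError("expected NOUN/particle pairs followed by a verb")
    args = {}
    # Each noun is followed by the particle that names its role,
    # much like a keyword argument written as value:name.
    for noun, particle in zip(pairs[0::2], pairs[1::2]):
        args[ROLES[particle]] = noun
    return verb, args


typical = parse_clause("neko ga sakana wo taberu")
scrambled = parse_clause("sakana wo neko ga taberu")
```
|
| Both calls return the same verb and the same role mapping, which
| is the sense in which the argument order does not matter here.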
|
| A verb clause which appears before a noun modifies the noun,
| which bears similarity to a subexpression.
| (NOUN:wa, ((NOUN:ni)VERB NOUN):wo)VERB
|
| Some particles modify a sentence and come after the verb - for
| example, to make it a question it's followed by a particle like
| `ka`, `no`, `ne` or `na`. These might resemble keywords -
| again appearing in a postfix position.
| (NOUN:ga)VERB ka
|
| Compared with a typical programming language, Japanese is
| structured the opposite way - from right to left.
| pjc50 wrote:
| It's more like a stack language like FORTH.
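|
| A rough sketch of that stack reading, under toy assumptions: the
| tiny lexicon below and the rule that a verb only consumes
| arguments whose particles it accepts are simplifications made
| up for this example, not a claim about how Japanese is parsed.
|
```python
# Hypothetical toy lexicon: each verb lists the particles it accepts.
VERBS = {"iru": {"ga", "ni"}, "miru": {"wa", "ga", "wo"}}
PARTICLES = {"wa", "ga", "wo", "ni"}


def evaluate(tokens):
    """Evaluate a token list left to right on a stack, FORTH-style."""
    stack = []
    for tok in tokens:
        if tok in PARTICLES:
            # A particle tags whatever is on top of the stack.
            stack.append((tok, stack.pop()))
        elif tok in VERBS:
            # A verb pops the tagged arguments it accepts and pushes a clause.
            args = {}
            while (stack and isinstance(stack[-1], tuple)
                   and stack[-1][0] in VERBS[tok]):
                particle, value = stack.pop()
                args[particle] = value
            stack.append({"verb": tok, "args": args})
        elif stack and isinstance(stack[-1], dict) and "verb" in stack[-1]:
            # A noun right after a finished clause is modified by that clause.
            stack.append({"noun": tok, "modifier": stack.pop()})
        else:
            stack.append(tok)  # plain noun
    return stack


result = evaluate("watashi wa niwa ni iru neko wo miru".split())
```
|
| For "watashi wa niwa ni iru neko wo miru" ("I see the cat that is
| in the garden") the stack ends with a single "miru" clause whose
| "wo" argument is the noun "neko" modified by the "iru" clause.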
| thaumasiotes wrote:
| > A verb (or verbal adjective) basically forms the "root" of
| a clause, and typically appears at the end of it.
|
| > Compared with a typical programming language, Japanese is
| structured the opposite way - from right to left.
|
| How consistent is that across different types of phrases?
|
| English is generally described as being strongly right-
| headed. But that's really a statement about English noun
| phrases. Prepositional phrases are left-headed. Determiner
| phrases, if that's your thing, are left-headed. Verbs are
| normally positioned between their subject and object, with
| other modifiers generally after the object.
|
| I had a Chinese tutor who complained that everything in
| Chinese was the other way around compared to English, but as
| far as I can tell the odds are about 50-50 that any given
| structure will match or reverse. Mandarin sentences are SVO
| just like English. Mandarin prepositions go after their nouns
| instead of before. Mandarin adjectives go before their nouns.
| Mandarin noun compounds work the same way as English noun
| compounds. (牙刷 yáshuā "tooth brush" is a brush, not a tooth.)
| Mandarin verbs have indirect objects before the verb instead
| of after. Mandarin discourse particles occur at the end of
| the sentence instead of the beginning.+ Mandarin relative
| clauses work just like English noun phrases, which is funny
| because English relative clauses have their own bespoke
| ordering. (English: _the man who has five sisters_ ;
| Mandarin: _has five sisters who [the] man_.)
|
| The big lesson I drew from this is "it doesn't make a lot of
| sense to describe languages overall as having a particular
| orientation". Is Japanese more consistent, or is orientation
| more sensitive to the specific structures you're using?
|
| + Note that "beginning" and "end" aren't the only options. In
| classical Latin and Greek, particles like this always appear
| as the second word in a sentence.
| sparkie wrote:
| Japanese is consistent about the verb (or adjective,
| adverb) coming last in the clause/sentence (besides certain
| sentence-ending particles). There are no "particles"
| necessary on verbs because they're always in the right
| position.
|
| The subject/object/means/destination/etc are flexible and
| can appear in any order - but typically subject appears
| before object - `NOUN ga NOUN wo VERB`, `NOUN ga NOUN ni
| VERB` (where `ga` denotes the subject), whereas `NOUN wo
| NOUN ga VERB` would be atypical, but not incorrect.
|
| When a verb (or adjective/adverb) appears before a noun, as
| in `VERB NOUN`, it modifies the noun - acting like an
| adjective - `VERB NOUN` is like "NOUN that/which VERBS".
| All adjectives work this way - ADJ NOUN, because verbs and
| adjectives are not really disjoint word classes in
| Japanese. Even the noun adjectives ("na-adjectives")
| function this way - they appear as `NOUN na NOUN`, but `na`
| is secretly a verb disguised as a particle. Examples:
|     sora ga aoi             (The sky is blue)
|     aoi sora                (Blue sky) [i-adjective]
|     sora ga kirei desu      (The sky is pretty)
|     kirei na sora           (Pretty sky) [na-adjective]
|     sora ga haiiro ni naru  (The sky turns grey)
|     haiiro ni naru sora     (Sky which turns grey) [verb clause
|                             as adjective]
|
| Spoken Japanese is very context sensitive, so things can be
| omitted if they're already known to the listener - this can
| include the verb, and any of the particles.
|
| Verbs can appear out of order - for example at the start of
| a sentence - but the meaning is understood based on tone or
| pause in the speech - basically, if what were being spoken
| were to be written it would be with a comma after the verb.
| `VERB, NOUN`
|
| Example:
|     watashi wa hara ga hetta  (I'm hungry).
|     hara ga hetta, watashi    (Hungry, I am).
|     hara ga heru watashi      (Me who is hungry) [no pause or
|                               comma]
| canjobear wrote:
| English is mixed head-initial and head-final, for example
| objects go after verbs (head-initial) but adjectives go
| before nouns (head-final). Japanese is strictly head-final.
| lewisjoe wrote:
| Can this be used to build a grammar checker for japanese
| language?
| charcircuit wrote:
| No, it's missing basic things, accepts ungrammatical sentences,
| and is fundamentally flawed by being based off nihongokyouiku
| grammar.
| sparkie wrote:
| No. Japanese is very context sensitive, and like any natural
| language, has ambiguities. Japanese is loaded with _dajare_
| (puns).
|
| Grammar checking basically needs AI - you need to train some
| model to understand common phrases and sentence structure.
| Before LLMs there was software like MeCab[1] which did this,
| and gave good results, but modern LLMs are much more capable.
|
| [1]:https://taku910.github.io/mecab/
| koito17 wrote:
| Even if one could verify grammatical correctness, there are
| many ways to produce unnatural Japanese phrases.
|
| To give an easy example: 9つ (9 things) is natural, but 10つ
| sounds extremely strange. However, 10個 sounds fine. When the
| number is large enough, it's also common to not use 助数詞
| (counter words) at all.
|
| Sometimes, grammatical mistakes are _natural_ Japanese. For
| instance, there is a concept of ら抜き言葉 (words with ra
| dropped), where people will say e.g. 寝れない ("I can't
| sleep") instead of 寝られない. This is an error in
| conjugation, yet it's natural language and applies to a few
| other words, too.
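|
| As a small sketch of that pattern in romaji, assuming the verb
| is an ichidan verb given in dictionary form (the function name
| and the romaji handling are made up for illustration; not every
| verb ending in -ru is ichidan, so the caller has to know):
|
```python
def potential_negative(dictionary_form, drop_ra=False):
    """Negative potential of an ichidan verb in romaji, e.g. 'neru'."""
    if not dictionary_form.endswith("ru"):
        raise ValueError("this sketch only handles verbs ending in -ru")
    stem = dictionary_form[:-2]
    # Prescriptive form: stem + rarenai; ra-dropped form: stem + renai.
    return stem + ("renai" if drop_ra else "rarenai")


standard = potential_negative("neru")
colloquial = potential_negative("neru", drop_ra=True)
```
|
| Here `standard` is "nerarenai" and `colloquial` is "nerenai",
| the textbook and ra-dropped ways of saying "can't sleep".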
|
| Validating both grammar and word choice is still insufficient
| to judge the naturalness of a Japanese phrase. A common
| "mistake" made by many Japanese is writing 「違和感を感じる」.
| The verb is redundant because of the 「感」 in 「違和感」. The
| "correct" word to use in this case is 覚える. In practice,
| however, either choice of word is understandable and considered
| correct (except to those with the trivia of
| 「違和感は覚えるもの!」)
|
| Sometimes, redundancy makes phrases considered incorrect (see
| 二重敬語 (double honorifics) for an example). In other cases, nobody will
| debate the correctness of the phrase.
| GolDDranks wrote:
| No, because for what people mean when they say "grammar
| checker", it doesn't suffice to check whether a sentence is
| (formally speaking) grammatical or not. You'd expect it to also
| check word choice, orthography etc. Those aren't part of the syntax
| structure. This means that it would allow many very flawed
| sentences.
|
| Besides, the grammar this project uses is not likely to reflect
| accurately the actual grammar of modern spoken or written
| Japanese, and it's likely not to be even nearly complete; that
| would mean it would also have quite a lot of false-positive
| "ungrammaticals".
|
| Something _like_ this can certainly be used as a part of a
| grammar checker. But in that case, you shouldn't implement it
| in TypeScript's type system in the first place.
| enricozb wrote:
| Somewhat related: the Lambek calculus[0], which is kind of a
| general type system for written languages. Words are either
| functions or concrete types, and functions can take their
| arguments on the left or right.
| guerrilla wrote:
| This reminds me of categorial grammars. Has anyone here ever
| looked into them? I loved them. They're like simply typed lambda
| calculi but instead of a single arrow type there are two, one for
| the left and right side of a symbol. So you can do something
| like:
|     and : Phrase \ Conjunction / Phrase
|     do  : Subject \ Verb / Object
|
| Sorry if I have the directionality wrong but `and` would take a
| phrase on the left, another on the right and return a
| conjunction. `do` would take a Subject and an Object and return
| a verb. In STLC it would look like this:
|     and : Phrase -> Phrase -> Conjunction
|     do  : Subject -> Object -> Verb
|
| These are bad examples just for illustrations, consult real
| linguists.
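|
| A minimal sketch of the two directional arrows, assuming a
| tuple encoding of categories invented for this example (slash
| notation differs between authors) and a greedy reducer rather
| than a real parser. It reuses the `and` example from above.
|
```python
# A category is an atom (a string) or a tuple (slash, result, argument):
#   ("/", X, Y)   looks for a Y on its right and yields X
#   ("\\", X, Y)  looks for a Y on its left and yields X
def combine(left, right):
    """Apply forward or backward application to two adjacent categories."""
    if isinstance(left, tuple) and left[0] == "/" and left[2] == right:
        return left[1]
    if isinstance(right, tuple) and right[0] == "\\" and right[2] == left:
        return right[1]
    return None


def reduce_all(cats):
    """Greedily combine adjacent categories until nothing more applies."""
    cats = list(cats)
    changed = True
    while changed:
        changed = False
        for i in range(len(cats) - 1):
            result = combine(cats[i], cats[i + 1])
            if result is not None:
                cats[i:i + 2] = [result]
                changed = True
                break
    return cats


# `and` wants a Phrase on its right, then a Phrase on its left.
AND = ("/", ("\\", "Conjunction", "Phrase"), "Phrase")
reduced = reduce_all(["Phrase", AND, "Phrase"])
```
|
| Here `reduced` ends up as the single category "Conjunction". A
| sequence that does not type-check is simply left unreduced.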
|
| Last time I looked at it they were adding polymorphism and a
| couple of people were starting to think about dependent types. It
| was mostly linguists interested in it but it was hardcore math
| and CS. Can't remember the names involved. Carl H. I forget what
| the H stands for. Hmm. Some famous type theorists had written
| about them too.
|
| Anyway, seemed relevant.
| canjobear wrote:
| Lots of work on this kind of thing.
|
| A good starting point is
| https://en.wikipedia.org/wiki/Combinatory_categorial_grammar
___________________________________________________________________
(page generated 2025-03-29 23:00 UTC)