[HN Gopher] Reading Akkadian cuneiform using natural language pr...
       ___________________________________________________________________
        
       Reading Akkadian cuneiform using natural language processing (2020)
        
       Author : Bluestein
       Score  : 86 points
       Date   : 2024-08-12 22:52 UTC (3 days ago)
        
 (HTM) web link (journals.plos.org)
 (TXT) w3m dump (journals.plos.org)
        
       | gumby wrote:
       | Dupe: https://news.ycombinator.com/item?id=41229597
        
       | Bluestein wrote:
       | Find this application exciting. Using natural language processing
       | to crack Akkadian cuneiform is like equipping a historian with a
       | high-speed translation engine. What once took scholars
       | painstaking years of decoding complex logograms and syntax now
       | gets a digital boost, with NLP stepping in and saying, "Let's
       | breeze through these ancient texts like it's a weekend crossword
       | puzzle."
        
       | sungam wrote:
       | I'm sure that all of the ancient scripts will be fed into an LLM
       | or equivalent eventually - it will be fascinating to see if this
       | will give new insights into the evolution of language and culture
       | - perhaps even allowing understanding of undeciphered scripts
       | such as Linear A.
        
         | Bluestein wrote:
         | Understanding Linear A would be revolutionary.-
        
           | IncreasePosts wrote:
           | It would certainly be newsworthy, but I'm not sure about
           | revolutionary. We only have essentially a few pages' worth of
           | Linear A, and who knows what's written with it.
           | 
           | Undoubtedly there are tablets and scripts that have been
           | sitting unread in an archive somewhere, in a deciphered
           | language, that we just haven't gotten around to analyzing
           | yet.
        
             | mithametacs wrote:
             | And even if we understood it, it's still a tiny amount of
             | text.
             | 
             | My educated guess is it's going to be something like:
             | Zartkir is an asshat
        
               | pjmlp wrote:
               | Given some of the stuff in Pompeii, most likely.
        
               | Bluestein wrote:
               | It'd be funny if it turned out to be an epic style
               | rickroll
               | 
               | "For many moons and many suns have passed, as time,
               | relentless, flows, Since first their hearts, like twin
               | flames, danced in love's eternal fire.
               | 
               | Yet, the hero, of noble heart and steadfast will, did
               | speak, "My heart yearns for a bond unbroken, a pledge
               | that none may sever. No fleeting fancy, but a troth
               | eternal, one that the gods themselves would envy. For
               | such a vow, none but I could offer, no other man could
               | dare."
               | 
               | With eyes like the stars in the firmament, he gazed into
               | the soul of his beloved, And spake thus, with the voice
               | of thunder, yet with the gentleness of the west wind:
               | "Hear me, O cherished one, for I must reveal the depths
               | of my heart. Never shall I forsake thee, never shall I
               | let thee fall, I shall not stray into the shadows nor
               | abandon thee in the wilderness, Never shall I bring
               | sorrow to thine eyes, nor bid thee farewell. Never shall
               | my lips craft falsehoods that wound like the sharpest
               | spear."
               | 
               | Through the ages, their bond had grown strong as the oak,
               | Though the maiden's heart did ache with a silent pain,
               | For fear held her tongue captive, her love hidden in the
               | shadows. But the hero, wise as the elders, knew the truth
               | of their shared plight, For both had danced the dance of
               | fate, and knew well the game they played.
               | 
               | Then again, he spake, his voice a balm to the troubled
               | heart: "Do not be blind to what lies before us, for the
               | path is clear, And if thou wouldst ask of my heart's
               | burden, know this: I shall not waver, nor falter in my
               | love for thee. Never shall I forsake thee, never shall I
               | let thee fall, I shall not stray into the shadows nor
               | abandon thee in the wilderness, Never shall I bring
               | sorrow to thine eyes, nor bid thee farewell. Never shall
               | my lips craft falsehoods that wound like the sharpest
               | spear."
               | 
               | Through the years they had known each other, their souls
               | intertwined, Her heart had borne the weight of unspoken
               | love, Yet now, the veil lifted, they stood as one, United
               | in purpose, ready to face the trials of love's enduring
               | quest. And thus, the hero pledged once more, with a heart
               | pure and true, "Never shall I forsake thee, never shall I
               | let thee fall, I shall not stray into the shadows nor
               | abandon thee in the wilderness, Never shall I bring
               | sorrow to thine eyes, nor bid thee farewell. Never shall
               | my lips craft falsehoods that wound like the sharpest
               | spear."
               | 
               | So spake the hero, and the heavens themselves bore
               | witness, As love, unyielding, forged a bond that time
               | could not erode. And thus they walked, hand in hand, into
               | the dawn of a new age, Their hearts as one, forever bound
               | by the sacred oath."
        
             | Bluestein wrote:
             | > language, that we just haven't gotten around to analyzing
             | yet.
             | 
             | Indeed. Thanks for putting things in their right, measured,
             | context.-
             | 
             | I guess this is what I was getting at: How, if deciphered,
             | the _process_ itself would be extraordinary in its
             | widespread application in other cases.-
        
         | bhhaskin wrote:
         | I know someone who is working on a general LLM to bring back
         | dead languages. They have had great success using old biblical
         | manuscripts and the LLM picks up on the syntax and grammar.
        
         | jcranmer wrote:
         | Linear A is the undeciphered script with the largest
         | repertoire, about 8000 characters in total, and the largest
         | single fragment is like 300 characters long. It's like trying
         | to understand English given random clippings of signs and maybe
         | a whole sentence from a book that in total amount to 5 or 6
         | pages of text. And we have a head-start--we make the educated
         | guess of Linear A's phonetic values by comparing to Linear B,
         | which was derived from it. The situation for the other
         | undeciphered scripts is even worse.
         | 
         | It's hard to see how LLMs can help here because the problem
         | with understanding these scripts is that we just lack enough
         | data to draw any conclusions, and LLMs are famously
         | data-hungry.
        
           | theendisney wrote:
           | Different languages have similar words; people make similar
           | expressions and do similar things. I think everything needed
           | to solve the puzzle is there; it's just impossibly hard.
           | 
           | But who knows, there might be embarrassingly obvious things we
           | just didn't notice.
           | 
           | At least we will get funny hallucinations of the quality of
           | Hindu Rongorongo. lol
           | 
           | Cuneiform does strike me as potentially the most awesome
           | application of AI. Very exciting. (Second only to talking
           | with animals.)
        
             | adastra22 wrote:
             | The spoken language corresponding to Linear A is probably
             | unrelated to the spoken language (Greek) represented by
             | Linear B. If Linear A represents a spoken language--and
             | this is by no means certain--then it is the native language
             | of the Minoans, which has long since been lost. So
             | unfortunately we don't have much to go on.
        
           | ggm wrote:
           | https://www.classics.cam.ac.uk/research/projects/mycep/decip.
           | ..
           | 
           | Fascinating, sad story of Linear B. Ventris died early. The
           | book from 1958 is well worth reading
           | 
           | https://ia600401.us.archive.org/34/items/ChadwickJohnTheDeci.
           | ..
        
           | DonaldFisk wrote:
           | Rongorongo has roughly double that number of glyphs. The
           | descendant (Rapa Nui) of the language it encodes is still
           | spoken, and there's a whole family of related languages. But
           | we haven't made much headway in deciphering it. Linear A has
           | one advantage though: we have some understanding of how the
           | script works, and can (or think we can) pronounce parts of
           | the text. There's also Etruscan, which is partially
           | deciphered (about 250 words are understood with any
           | certainty), but it has no surviving relatives and only a
           | couple of bilingual texts, one very short and the other not a
           | literal translation of the known language. So all we have to
           | go on is textual and archaeological context.
        
       | not2b wrote:
       | Akkademia is a great name.
        
         | mdp2021 wrote:
         | It seems like a lousy name - a monstrosity - confusing
         | unrelated terms (Greek Academos and the city of Agad, of
         | unclear origin).
        
           | flobosg wrote:
           | > _Television? The word is half Greek and half Latin. No good
           | will come of this device._
           | 
           | --C. P. Scott
        
             | xanderlewis wrote:
             | 'Homosexuality' is similarly an abomination: first part's
             | Greek; second part's Latin.
        
               | mdp2021 wrote:
               | Condemned through heterogeny. Very remarkable circle.
        
             | Terr_ wrote:
             | > half Greek and half Latin
             | 
             | For your amusement: What a physics textbook might look like
             | without borrowing words from French, Greek, or Latin. [0]
             | Such as isotopes that are unstable:
             | 
             | > Most samesteads of every firststuff are unabiding. Their
             | kernels break up, each at its own speed. This speed is
             | written as the half-life, which is how long it takes half
             | of any deal of the samestead thus to shift itself. The
             | doing is known as lightrotting. It may happen fast or
             | slowly, and in any of sundry ways, offhanging on the makeup
             | of the kernel. A kernel may spit out two firstbits with two
             | neitherbits, that is, a sunstuff kernel, thus leaping two
             | steads back in the roundaround board and four weights back
             | in heaviness. It may give off a bernstonebit from a
             | neitherbit, which thereby becomes a firstbit and thrusts
             | the uncleft one stead up in the board while keeping the
             | same weight. It may give off a forwardbit, which is a mote
             | with the same weight as a bernstonebit but a forward
             | lading, and thereby spring one stead down in the board
             | while keeping the same weight. Often, too, a mote is given
             | off with neither lading nor heaviness, called the
             | weeneitherbit. In much lightrotting, a mote of light with
             | most short wavelength comes out as well.
             | 
             | [0] https://en.wikipedia.org/wiki/Uncleftish_Beholding
        
           | not2b wrote:
           | We are speaking a language that is a similar monstrosity,
           | borrowing both words and structure from so many other
           | languages, pronouncing the vowels differently from everyone
           | else because of the Great Vowel Shift, never having a
           | spelling reform so you have to know where a word came from to
           | know how to spell it.
           | 
           | So I am cool with weird hybrid words that mash together
           | concepts from multiple languages to make puns. I speak the
           | wrong language to be a purist.
        
             | mdp2021 wrote:
             | No, there are better ways and worse ways to use more
             | distant languages for the development of this one: this
             | Language is very far from being something that could be
             | reduced to cheap puns.
             | 
             | <<Hybrid words that mash together concepts from multiple
             | languages>> - which is very fine - are not necessarily
             | cheap puns (which the superficial similarity of sounds
             | easily engenders).
             | 
             | But anyway, that a coined word suffer from transplant
             | rejection is "not the end of the world" - provided it
             | remains a proper name for a specific item, a brand.
        
       | lumberaisle wrote:
       | This is potentially a tool for Luddites to carry out the
       | operation of a digital printing-press.
        
       | sinuhe69 wrote:
       | I was confused at first. They wrote, "In this paper we present a
       | new method for automatic transliteration and segmentation of
       | Unicode cuneiform glyphs using Natural Language Processing (NLP)
       | techniques". So my reaction was, "huh, why did they need to do
       | any segmentation and transliteration with Unicode?". Turns out,
       | they were doing segmentation and then transliteration from the
       | image of cuneiform tablets to Unicode (cuneiform)
       | glyphs/characters.
       | 
       | Basically an OCR for cuneiform :)
        
         | singularity2001 wrote:
         | This is not what they did. They took Unicode and separated it
         | into words. Which seems silly since the online archives are
         | already separated, but I guess it's useful for new texts where
         | the word boundaries are not clear.
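
To make the word-segmentation idea above concrete, here is a minimal sketch of
greedy longest-match segmentation over a sequence of signs. It is a simple
dictionary baseline for illustration only, not the method used in the paper.
The function name `segment`, the `TOY_LEXICON` entries, and the sign names are
all hypothetical placeholders.

```python
from typing import List, Set, Tuple

Word = Tuple[str, ...]


def segment(signs: List[str], lexicon: Set[Word], max_len: int = 4) -> List[Word]:
    """Split a sign sequence into words by greedy longest match.

    At each position, try the longest candidate first and accept it if it is
    in the lexicon. A single sign is always accepted as a fallback, so unknown
    signs pass through as one-sign words.
    """
    words: List[Word] = []
    i = 0
    while i < len(signs):
        for n in range(min(max_len, len(signs) - i), 0, -1):
            candidate = tuple(signs[i:i + n])
            if n == 1 or candidate in lexicon:
                words.append(candidate)
                i += n
                break
    return words


# Hypothetical toy lexicon of sign sequences (placeholder sign names).
TOY_LEXICON: Set[Word] = {("a", "na"), ("szar", "ri"), ("be", "li", "ia")}

signs = ["a", "na", "szar", "ri", "be", "li", "ia"]
result = segment(signs, TOY_LEXICON)
print(["-".join(word) for word in result])
```

Greedy matching cannot recover from a wrong early choice, which is one reason
statistical or neural sequence models are usually preferred when word
boundaries are genuinely ambiguous.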
        
       | tonetegeatinst wrote:
       | I can just imagine somewhere there is a guy trying to twist this
       | into a neovim plugin so our fellow neovimmers can use cuneiform
       | for programming and note taking.
        
       | ranger_danger wrote:
       | So what does the tablet actually say?
        
         | gavmor wrote:
         | From one figure:
         | 
         | > These are lines 31-34 of the second column of Sennacherib's
         | clay prism, probably from Nineveh, now in the Israel Museum
         | (IMJ 71.072.0249). The text records eight campaigns of the
         | Assyrian King, including the siege of Jerusalem which is well
         | known from the Book of Kings. The line reads: 'On my return
         | march, I received a heavy tribute from the distant Medes, of
         | whose land none of the kings, my ancestors, had heard mention.'
         | (translation adapted from A.K. Grayson and J. Novotny's edition
         | available on ORACC, Q003497).
         | 
         | Here's a full translation (without the OP's technique):
         | https://www.kchanson.com/ANCDOCS/meso/sennprism1.html
        
       | janandonly wrote:
       | How did I not know about this 2020 paper before? I wrote a
       | blogpost about using ChatGPT for translating Akkadian text in
       | 2023 but was unaware of this research.
       | 
       | At the time HN was stoked about that post too.
       | 
       | Link: https://www.janromme.com/2023/05/ChaptGPT-transaltion-of-
       | Akk...
        
       ___________________________________________________________________
       (page generated 2024-08-15 23:02 UTC)