[HN Gopher] Reading Akkadian cuneiform using natural language pr...
___________________________________________________________________
Reading Akkadian cuneiform using natural language processing (2020)
Author : Bluestein
Score : 86 points
Date : 2024-08-12 22:52 UTC (3 days ago)
(HTM) web link (journals.plos.org)
(TXT) w3m dump (journals.plos.org)
| gumby wrote:
| Dupe: https://news.ycombinator.com/item?id=41229597
| Bluestein wrote:
| Find this application exciting. Using natural language processing
| to crack Akkadian cuneiform is like equipping a historian with a
| high-speed translation engine. What once took scholars
| painstaking years of decoding complex logograms and syntax now
| gets a digital boost, with NLP stepping in and saying, "Let's
| breeze through these ancient texts like it's a weekend crossword
| puzzle."
| sungam wrote:
| I'm sure that all of the ancient scripts will be fed into an LLM
| or equivalent eventually - it will be fascinating to see if this
| will give new insights into the evolution of language and culture
| - perhaps even allowing understanding of undeciphered scripts
| such as Linear A
| Bluestein wrote:
| Understanding Linear A would be revolutionary.-
| IncreasePosts wrote:
| It would certainly be newsworthy, but I'm not sure about
| revolutionary. We only have essentially a few pages' worth of
| Linear A, and who knows what's written with it.
|
| Undoubtedly there are tablets and scripts that have been
| sitting unread in an archive somewhere, in a deciphered
| language, that we just haven't gotten around to analyzing
| yet.
| mithametacs wrote:
| And even if we understood it, it's still a tiny amount of
| text.
|
| My educated guess is it's going to be something like:
| Zartkir is an asshat
| pjmlp wrote:
| Given some of the stuff in Pompeii, most likely.
| Bluestein wrote:
| It'd be funny if it turned out to be an epic style
| rickroll
|
| "For many moons and many suns have passed, as time,
| relentless, flows, Since first their hearts, like twin
| flames, danced in love's eternal fire.
|
| Yet, the hero, of noble heart and steadfast will, did
| speak, "My heart yearns for a bond unbroken, a pledge
| that none may sever. No fleeting fancy, but a troth
| eternal, one that the gods themselves would envy. For
| such a vow, none but I could offer, no other man could
| dare."
|
| With eyes like the stars in the firmament, he gazed into
| the soul of his beloved, And spake thus, with the voice
| of thunder, yet with the gentleness of the west wind:
| "Hear me, O cherished one, for I must reveal the depths
| of my heart. Never shall I forsake thee, never shall I
| let thee fall, I shall not stray into the shadows nor
| abandon thee in the wilderness, Never shall I bring
| sorrow to thine eyes, nor bid thee farewell. Never shall
| my lips craft falsehoods that wound like the sharpest
| spear."
|
| Through the ages, their bond had grown strong as the oak,
| Though the maiden's heart did ache with a silent pain,
| For fear held her tongue captive, her love hidden in the
| shadows. But the hero, wise as the elders, knew the truth
| of their shared plight, For both had danced the dance of
| fate, and knew well the game they played.
|
| Then again, he spake, his voice a balm to the troubled
| heart: "Do not be blind to what lies before us, for the
| path is clear, And if thou wouldst ask of my heart's
| burden, know this: I shall not waver, nor falter in my
| love for thee. Never shall I forsake thee, never shall I
| let thee fall, I shall not stray into the shadows nor
| abandon thee in the wilderness, Never shall I bring
| sorrow to thine eyes, nor bid thee farewell. Never shall
| my lips craft falsehoods that wound like the sharpest
| spear."
|
| Through the years they had known each other, their souls
| intertwined, Her heart had borne the weight of unspoken
| love, Yet now, the veil lifted, they stood as one, United
| in purpose, ready to face the trials of love's enduring
| quest. And thus, the hero pledged once more, with a heart
| pure and true, "Never shall I forsake thee, never shall I
| let thee fall, I shall not stray into the shadows nor
| abandon thee in the wilderness, Never shall I bring
| sorrow to thine eyes, nor bid thee farewell. Never shall
| my lips craft falsehoods that wound like the sharpest
| spear."
|
| So spake the hero, and the heavens themselves bore
| witness, As love, unyielding, forged a bond that time
| could not erode. And thus they walked, hand in hand, into
| the dawn of a new age, Their hearts as one, forever bound
| by the sacred oath."
| Bluestein wrote:
| > language, that we just haven't gotten around to analyzing
| yet.
|
| Indeed. Thanks for putting things in their right, measured,
| context.-
|
| I guess this is what I was getting at: How, if deciphered,
| the _process_ itself would be extraordinary in its
| widespread application in other cases.-
| bhhaskin wrote:
| I know someone who is working on a general LLM to bring back
| dead languages. They have had great success using old biblical
| manuscripts and the LLM picks up on the syntax and grammar.
| jcranmer wrote:
| Linear A is the undeciphered script with the largest
| repertoire, about 8000 characters in total, and the largest
| single fragment is like 300 characters long. It's like trying
| to understand English given random clippings of signs and maybe
| a whole sentence from a book that in total amount to 5 or 6
| pages of text. And we have a head-start--we make the educated
| guess of Linear A's phonetic values by comparing to Linear B,
| which was derived from it. The situation for the other
| undeciphered scripts is even worse.
|
| It's hard to see how LLMs can help here because the problem
| with understanding these scripts is that we just lack enough
| data to draw any conclusions. And LLMs are famously
| data-hungry.
| theendisney wrote:
| Different languages have similar words, people make similar
| expressions and do similar things. I think everything to
| solve the puzzle is there; it's just impossibly hard.
|
| But who knows, there might be embarrassingly obvious things we
| just didn't notice.
|
| At least we will get funny hallucinations of the quality of
| Hindu Rongorongo. lol
|
| Cuneiform does strike me as potentially the most awesome
| application of AI. Very exciting. (Second only to talking
| with animals.)
| adastra22 wrote:
| The spoken language corresponding to Linear A is probably
| unrelated to the spoken language (Greek) represented by
| Linear B. If Linear A represents a spoken language--and
| this is by no means certain--then it is the native language
| of the Minoans, which has long since been lost. So
| unfortunately we don't have much to go on.
| ggm wrote:
| https://www.classics.cam.ac.uk/research/projects/mycep/decip.
| ..
|
| Fascinating, sad story of Linear B. Ventris died early. The
| book from 1958 is well worth reading
|
| https://ia600401.us.archive.org/34/items/ChadwickJohnTheDeci.
| ..
| DonaldFisk wrote:
| Rongorongo has roughly double that number of glyphs. The
| descendant (Rapa Nui) of the language it encodes is still
| spoken, and there's a whole family of related languages. But
| we haven't made much headway in deciphering it. Linear A has
| one advantage though: we have some understanding of how the
| script works, and can (or think we can) pronounce parts of
| the text. There's also Etruscan, which is partially
| deciphered (about 250 words are understood with any
| certainty), but it has no surviving relatives and only a
| couple of bilingual texts, one very short and the other not a
| literal translation of the known language. So all we have to
| go on is textual and archaeological context.
| not2b wrote:
| Akkademia is a great name.
| mdp2021 wrote:
| It seems like a lousy name, a monstrosity, confusing
| unrelated terms (Greek Academos and the city of Agad, of
| unclear origin).
| flobosg wrote:
| > _Television? The word is half Greek and half Latin. No good
| will come of this device._
|
| --C. P. Scott
| xanderlewis wrote:
| 'Homosexuality' is similarly an abomination: first part's
| Greek; second part's Latin.
| mdp2021 wrote:
| Condemned through heterogeny. Very remarkable circle.
| Terr_ wrote:
| > half Greek and half Latin
|
| For your amusement: What a physics textbook might look like
| without borrowing words from French, Greek, or Latin. [0]
| For example, on isotopes that are unstable:
|
| > Most samesteads of every firststuff are unabiding. Their
| kernels break up, each at its own speed. This speed is
| written as the half-life, which is how long it takes half
| of any deal of the samestead thus to shift itself. The
| doing is known as lightrotting. It may happen fast or
| slowly, and in any of sundry ways, offhanging on the makeup
| of the kernel. A kernel may spit out two firstbits with two
| neitherbits, that is, a sunstuff kernel, thus leaping two
| steads back in the roundaround board and four weights back
| in heaviness. It may give off a bernstonebit from a
| neitherbit, which thereby becomes a firstbit and thrusts
| the uncleft one stead up in the board while keeping the
| same weight. It may give off a forwardbit, which is a mote
| with the same weight as a bernstonebit but a forward
| lading, and thereby spring one stead down in the board
| while keeping the same weight. Often, too, a mote is given
| off with neither lading nor heaviness, called the
| weeneitherbit. In much lightrotting, a mote of light with
| most short wavelength comes out as well.
|
| [0] https://en.wikipedia.org/wiki/Uncleftish_Beholding
| not2b wrote:
| We are speaking a language that is a similar monstrosity,
| borrowing both words and structure from so many other
| languages, pronouncing the vowels differently from everyone
| else because of the Great Vowel Shift, never having a
| spelling reform so you have to know where a word came from to
| know how to spell it.
|
| So I am cool with weird hybrid words that mash together
| concepts from multiple languages to make puns. I speak the
| wrong language to be a purist.
| mdp2021 wrote:
| No, there are better ways and worse ways to use more
| distant languages for the development of this one: this
| Language is very far from being something that could be
| reduced to cheap puns.
|
| <<Hybrid words that mash together concepts from multiple
| languages>> - which is very fine - are not necessarily
| cheap puns (which the superficial similarity of sounds
| easily engenders).
|
| But anyway, that a coined word suffer from transplant
| rejection is "not the end of the world" - provided it
| remains a proper name for a specific item, a brand.
| lumberaisle wrote:
| This is potentially a tool for Luddites to carry out the
| operation of a digital printing-press.
| sinuhe69 wrote:
| I was confused at first. They wrote, "In this paper we present a
| new method for automatic transliteration and segmentation of
| Unicode cuneiform glyphs using Natural Language Processing (NLP)
| techniques". So my reaction was, "huh, why did they need to do
| any segmentation and transliteration with Unicode?". Turns out,
| they were doing segmentation and then transliteration from the
| image of cuneiform tablets to Unicode (cuneiform)
| glyphs/characters.
|
| Basically an OCR for cuneiform :)
| singularity2001 wrote:
| This is not what they did. They took Unicode and separated it
| into words. That seems silly, since the online archives are
| already separated, but I guess it's useful for new texts where
| the word boundaries are not clear.
| tonetegeatinst wrote:
| I can just imagine somewhere there is a guy trying to twist this
| into a neovim plugin so our fellow neovimmers can use cuneiform
| for programming and note taking.
| ranger_danger wrote:
| So what does the tablet actually say?
| gavmor wrote:
| From one figure:
|
| > These are lines 31-34 of the second column of Sennacherib's
| clay prism, probably from Nineveh, now in the Israel Museum
| (IMJ 71.072.0249). The text records eight campaigns of the
| Assyrian King, including the siege of Jerusalem which is well
| known from the Book of Kings. The line reads: 'On my return
| march, I received a heavy tribute from the distant Medes, of
| whose land none of the kings, my ancestors, had heard mention.'
| (translation adapted from A.K. Grayson and J. Novotny's edition
| available on ORACC, Q003497).
|
| Here's a full translation (without the OP's technique):
| https://www.kchanson.com/ANCDOCS/meso/sennprism1.html
| janandonly wrote:
| How did I not know about this 2020 paper before? I wrote a
| blog post about using ChatGPT for translating Akkadian text in
| 2023 but was unaware of this research.
|
| At the time HN was stoked about that post too.
|
| Link: https://www.janromme.com/2023/05/ChaptGPT-transaltion-of-
| Akk...
___________________________________________________________________
(page generated 2024-08-15 23:02 UTC)