[HN Gopher] How AI is unlocking ancient texts
___________________________________________________________________
How AI is unlocking ancient texts
Author : Marceltan
Score : 195 points
Date : 2024-12-30 13:35 UTC (3 days ago)
(HTM) web link (www.nature.com)
(TXT) w3m dump (www.nature.com)
| aaronbrethorst wrote:
| (2024)
| Tagbert wrote:
| :-)
| datavirtue wrote:
| Nothing to see here. LLMs and AI suck and aren't really good at
| anything. /s
|
| The world is about to change much faster than any of us have ever
| witnessed to this point. What a life.
| muglug wrote:
| There's a big difference between LLMs and this application of
| CNNs and RNNs.
|
| Very few people on HN are claiming there's no value to neural
| networks -- CNNs have been heralded here for well over a
| decade.
| mcphage wrote:
| There are definitely things they're good at. And there are
| definitely things they're bad at, worse than nothing at
| all. The problem is how often they're being used in the latter
| case, and how rarely in the former.
| zeofig wrote:
| Build a strawman, knock him down, and plant the glorious flag
| of hyperbole on his strawwy corpse.
| mlepath wrote:
| This is a great application of various domains of ML. It
| reminds me of the Vesuvius Challenge. This kind of thing is
| accessible to beginners too, since the data by definition are
| pretty limited.
| jhanschoo wrote:
| Perhaps you missed it while skimming, but indeed, the Vesuvius
| Challenge is a primary topic of discussion in the article :)
| adriand wrote:
| I find this incredibly exciting. There could be some truly
| remarkable works whose contents are about to be revealed, and we
| don't really know what we might find. Histories of the ancient
| (more ancient) world. Accounts of contact with cultures and
| civilizations that are currently lost to history. Scientific and
| mathematical discoveries. And what I often find to be the most
| moving: stories of daily life that illuminate what regular people
| thought and felt and experienced thousands of years ago.
| Applejinx wrote:
| Which becomes a real gotcha when it turns out to be
| hallucinated 'content', misleading people into following
| their assumptions about what regular people thought and felt
| and experienced thousands of years ago.
|
| What we call AI does have superhuman powers but they are not
| powers of insight, they are powers of generalization. AI is
| more capable than a human is of homogenizing experience down to
| what a current snapshot of 'human thought' would be, because
| it's by definition PEOPLE rather than 'person'. The effort to
| invoke a specific perspective from it (that seems ubiquitous)
| sees AI at its worst. The idea that you could use it to
| correctly extract a specific perspective from the long dead is
| wildly, wildly misguided.
| Sparkyte wrote:
| Can't wait to read ancient smut from the time.
| sapphicsnail wrote:
| I wouldn't call it smut, but there are five surviving Greek
| novels and some Roman elegiac poetry that's a little horny. We
| know there used to be a lot of crazier stuff, but it mostly
| doesn't survive.
| mmooss wrote:
| There isn't much about accuracy:
|
| _" Ithaca restored artificially produced gaps in ancient texts
| with 62% accuracy, compared with 25% for human experts. But
| experts aided by Ithaca's suggestions had the best results of
| all, filling gaps with an accuracy of 72%. Ithaca also identified
| the geographical origins of inscriptions with 71% accuracy, and
| dated them to within 30 years of accepted estimates."_
|
| and
|
| _" [Using] an RNN to restore missing text from a series of 1,100
| Mycenaean tablets ... written in a script called Linear B in the
| second millennium BC. In tests with artificially produced gaps,
| the model's top ten predictions included the correct answer 72%
| of the time, and in real-world cases it often matched the
| suggestions of human specialists."_
|
| Obviously 62%, 72%, 72% in ten tries, etc. is not sufficient by
| itself. How do scholars use these tools? Without some external
| source to verify the truth, you can't know if the software output
| is accurate. And if you have some reliable external source, you
| don't need the software.
|
| Obviously, they've thought of that, and it's worth experimenting
| with these powerful tools. But I wonder how they've solved that
| problem.
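|
| For what it's worth, that "top ten predictions" figure is the
| standard top-k accuracy metric over artificially masked gaps.
| A minimal sketch of how such a number is computed -
| hypothetical Python, not from the paper:
|
|     def top_k_accuracy(ranked_guesses, ground_truth, k=10):
|         # each model output: candidate restorations for one
|         # gap, ranked from most to least probable
|         hits = sum(truth in guesses[:k]
|                    for guesses, truth
|                    in zip(ranked_guesses, ground_truth))
|         return hits / len(ground_truth)
|
|     # toy example; both answers land in the top 10
|     preds = [["a-to-ro-qo", "a-ke-ro"], ["ti-ri-po"]]
|     truth = ["a-ke-ro", "ti-ri-po"]
|     print(top_k_accuracy(preds, truth))  # 1.0
|
| So 72% means the correct restoration appeared somewhere in the
| model's ten best guesses, not that its single top guess was
| right 72% of the time.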
| sapphicsnail wrote:
| > Obviously 62%, 72%, 72% in ten tries, etc. is not sufficient
| by itself. How do scholars use these tools? Without some
| external source to verify the truth, you can't know if the
| software output is accurate. And if you have some reliable
| external source, you don't need the software.
|
| Without an extant text to compare, everything would be a guess.
| Maybe this would be helpful if you're trying to get a quick and
| dirty translation of a bunch of papyri or inscriptions? Until
| we have an AI that's able to adequately explain its reasoning,
| I can't see this replacing philologists with domain-specific
| expertise who are able to walk you through the choices they
| made.
| EA-3167 wrote:
| I wonder if maybe the goal is to provide the actual scholars
| with options, approaches or translations they hadn't thought
| of yet. In essence just what you said, structured guessing,
| but if you can have a well-trained bot guess within specific
| bounds countless times and output the patterns in the
| guesses, maybe it would be enough. Not, "My AI translated
| this ancient fragment of text," but "My AI sent us in a
| direction we hadn't previously had the time or inclination to
| explore, which turned out to be fruitful."
| mmooss wrote:
| I agree, but let's remember that the software repeats
| patterns, it doesn't so much innovate new ones. If you get
| too dependent on it, theoretically you might not break as
| much new ground, find new paradigms, discover the long-
| mistaken assumption in prior scholarship (that the software
| is repeating), etc.
| Validark wrote:
| Interesting point in theory but I'd love to get to the
| point where our problem is that we solved all the
| problems we already know how to solve.
| Zancarius wrote:
| Human proclivities tend toward repetition as well,
| partially as a memory/mnemonic device, so I don't see
| this as disadvantageous. For example, there's a minor
| opinion in biblical scholarship that John 21 was a later
| scribal addition, because the end of John 20 seems to
| mark the end of the book itself. However, John's
| tendencies to use specific verbiage and structure
| provide a much stronger argument that the book was
| written by the same author--including chapter 21--
| suggesting that the last chapter is an epilogue.
|
| Care needs to be taken, of course, but ancient works
| often followed certain patterns or linguistic choices
| that could be used to identify authorship. As long as
| this is viewed as one tool of many, there's unlikely to be
| much harm unless scholars lean too heavily on the opinions
| of AI analysis (which is the real risk, IMO).
| manquer wrote:
| If the texts are truly missing, then accuracy is subjective?
| i.e., human opinion versus AI generation.
| ip26 wrote:
| _artificially produced gaps in ancient texts_
|
| Someone deleted part of a known text.
|
| This does require that the AI hasn't been trained on the test
| text previously.
| rtkwe wrote:
| They do mention in the article that the missing-data test was
| done on "new" data that the models were not trained on, so at
| least some of the results aren't just regurgitation, it seems.
| mmooss wrote:
| > If the texts are truly missing , then accuracy is
| subjective ?
|
| Then accuracy might be unknown but it's not subjective.
| BeefWellington wrote:
| One way to test this kind of efficacy is to compare it to a
| known sample with a missing piece, e.g.: create an artifact
| with known text, destroy it in similar fashion, compare what
| this model suggests as outputs with the real known text.
|
| The "known" sample would need to be handled and controlled
| for by an independent trusted party, obviously, and therein
| lies the problem: It will be hard to properly configure an
| experiment and _believe it_ if any of the parties have any
| kind of vested interest in the success of the project.
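|
| A minimal sketch of that protocol - hypothetical Python; the
| masking scheme and names are my assumptions, not anything from
| the article:
|
|     import random
|
|     # The trusted party masks a span of a known text and is
|     # the only one who keeps the hidden answer.
|     def make_gap(text, gap_len=10, rng=random.Random(0)):
|         start = rng.randrange(len(text) - gap_len)
|         hidden = text[start:start + gap_len]
|         masked = (text[:start] + "#" * gap_len
|                   + text[start + gap_len:])
|         return masked, hidden
|
|     masked, answer = make_gap("inhonoremdivivespasianiaugusti")
|     # the model only ever sees `masked`; its restorations are
|     # graded against `answer` by the trusted party, not by
|     # anyone with a stake in the result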
| rnd0 wrote:
| Thank you, and also I'd like to know how they'd even evaluate
| the results to begin with...
|
| I hope to GOD they're holding on to the originals so they can
| go back and redo this in 20-30 years when tools have improved.
| zozbot234 wrote:
| The really nice thing about this is that the AI can now acquire
| these newly-decoded texts as part of its training set, and begin
| learning at a geometric rate.
| mistrial9 wrote:
| errors in => errors out
| WhereIsTheTruth wrote:
| Don't forget to spice it up with some bias!
|
| https://x.com/i/grok/share/uMwJwGkl2XVUep0N4ZPV1QUx6
| rzzzt wrote:
| But do I want to see ancient programming advice written in
| Linear B?
| zeofig wrote:
| Why not just feed it random data? It's so smart that it will
| figure out which parts are random, so eventually you will
| generate some good data randomly, and it will feed on it, and
| become exponentially smarter exponentially fast.
| Validark wrote:
| This is actually hilarious and I'm sad you are getting
| downvoted for it.
| nitwit005 wrote:
| With our current methods, feeding even fairly small amounts
| of a model's output back in as training data leads to
| declining performance.
|
| Just think of it abstractly. The AI will be trained on the
| errors the previous generation made. As long as it keeps making
| new errors each generation, they will tend to multiply.
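|
| A toy illustration of that accumulation - hypothetical Python,
| a one-dimensional stand-in rather than a claim about any real
| model:
|
|     import random, statistics
|
|     # generation 0: "real" data; every later generation is
|     # fit to samples drawn from the previous generation's fit
|     data = [random.gauss(0.0, 1.0) for _ in range(200)]
|     for gen in range(10):
|         mu = statistics.mean(data)
|         sigma = statistics.stdev(data)
|         data = [random.gauss(mu, sigma) for _ in range(200)]
|         print(gen, round(mu, 3), round(sigma, 3))
|
| Each generation's estimation error is baked into the next, so
| (mu, sigma) random-walks away from the true (0, 1) instead of
| staying put.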
| red75prime wrote:
| Degradation of autoregressive models being fed their own
| unfiltered output is pretty obvious: it's basically noise
| being injected into the ground truth probability
| distribution.
|
| But. "Our current methods" include reinforcement learning. So
| long as there's a signal indicating better solutions,
| performance tends to improve.
| taffronaut wrote:
| From TFA: "decoding rare and lost languages of which hardly any
| traces survive". Assuming that's not hype, let's see it have a go
| at Rongorongo[1] then.
|
| [1] https://en.m.wikipedia.org/wiki/Rongorongo
| nick238 wrote:
| And Linear A [1]. To be fair, any model would require data
| about the context of where the texts were found unless the
| corpus is _massive_.
|
| https://en.wikipedia.org/wiki/Linear_A
| shellfishgene wrote:
| It's mentioned in the article that they hope the model for
| Linear B can also help with Linear A.
| yzydserd wrote:
| Klaatu barada nikto
| userbinator wrote:
| The full title is "How AI is unlocking ancient texts -- _and
| could rewrite history_ ", and that second part is especially
| fitting, although unfortunately not mentioned in the article
| itself, which is full of rather horrifying stories about using AI
| to "fill in" missing data, which is clearly not true data
| recovery in any meaningful sense.
|
| I am aware of how advanced algorithms such as those used for
| flash memory today can "recover" data from imperfect probability
| distributions naturally created by NAND flash operation, but
| there seems to be a huge gap between those, which are based on
| well-understood information-theoretic principles, and the AI
| techniques described here.
| mistrial9 wrote:
| related -- A Chinese woman, known by the pseudonym Zhemao, was
| found to have been rewriting and falsifying Russian history on
| Wikipedia for over a decade. She created over 200 detailed
| articles on the Chinese Wikipedia, which included fictitious
| states, battles, and aristocrats. The hoax was exposed when a
| Chinese novelist, Yifan, researching for a book, stumbled upon
| an article about the Kashin silver mine. Yifan noticed that the
| article contained extensive details that were not supported by
| other language versions of Wikipedia, including the Russian
| one.
|
| Zhemao, posing as a scholar, claimed to have a Ph.D. in world
| history from Moscow State University and was the daughter of a
| Chinese diplomat based in Russia. She used machine translation
| to understand Russian-language sources and filled in gaps with
| her own imagination. Her articles were well-crafted and
| included detailed references, making them appear credible.
| However, many of the sources cited were either fake or did not
| exist.
|
| The articles were eventually investigated by Wikipedia editors
| who found that Zhemao had used multiple "puppet accounts" to
| lend credibility to her edits. Following the investigation,
| Zhemao was banned from Chinese Wikipedia, and her edits were
| deleted.
| fuzztester wrote:
| >How AI is
|
| I almost read it as "How Ali is" due to speed reading and the
| font in the original article. :)
|
| And now I wonder how AI would do on that same test :)
|
| Chat, GPT!
| cormorant wrote:
| Does anyone have a subscription or can otherwise read past the
| heading "A flood of information"? (I can see ~2500 words but
| there is apparently more.)
| msephton wrote:
| The easiest way is to prepend the URL with archive.is, as
| follows:
| https://archive.is/https://www.nature.com/articles/d41586-02...
| Validark wrote:
| It's cut off on archive.is too. Can Springer Nature not
| afford to let us all read the full article, or what? Do
| they really need $70 for a single page of information again?
| teleforce wrote:
| I hope we can decipher the Indus script using AI [1].
|
| It's well overdue: statistical profiling suggests it's a valid
| linguistic script, used as the writing system of the ancient
| Harappan language, the likely precursor of modern Dravidian
| languages.
|
| [1] Indus script:
|
| https://en.wikipedia.org/wiki/Indus_script
| dr_dshiv wrote:
| Claude is all too willing to provide interpretations. Why not
| give it a go and see if you can't crack it yourself? Hypothesis
| generation is needed!
| Oarch wrote:
| What if the deciphered content is the ancient equivalent of Vogon
| poetry? Do we stop?
| Octoth0rpe wrote:
| No, but the translation process would transfer from academia to
| the military industrial complex.
| sans_souse wrote:
| This concerns me. How do we assess the AI's interpretation when
| it comes to what we ourselves can't see? Have we not learned
| that AI desperately wants to supply answers, to the point that
| it prioritizes answers over accuracy? We already lose enough in
| translation, and manage to twist even those words we can
| discern - I'd really prefer we not start filling the gaps with
| lies formed from regurgitated data pools, which is most likely
| where it sources whatever fabricated fluff it ends up using to
| fill in said gaps.
| watt wrote:
| Why wouldn't you prefer _something_ over _nothing_? I assume AI
| steps in on problems that people haven't been able to begin to
| solve in decades.
| xenospn wrote:
| That _something_ could be worse than nothing.
| Majestic121 wrote:
| It's much better to have _nothing_ than the wrong
| _something_, since with a wrong _something_ you build
| assumptions on wrong premises. Much better to accept that we
| don't know (hopefully temporarily), so that people can keep
| looking into it instead of falsely believing the problem is
| solved.
| davidclark wrote:
| Absolutely prefer nothing here.
| throw4847285 wrote:
| I bet Heinrich Schliemann would have loved AI.
| palmfacehn wrote:
| What is 'accuracy' when examined at depth?
|
| With the benefit of greater knowledge and context we are able
| to critique some of the answers provided by today's LLMs. With
| the benefit of hindsight we are able to see where past
| academics and thought leaders went wrong. This isn't the same
| as confirming that our own position is a zenith of
| understanding. It would be more reasonable to assume it is a
| false summit.
|
| Could we not also say that academics have a priority to
| "publish or perish"? When we use the benefits of hindsight to
| examine debunked theories, could we not also say that they were
| too eager to supply answers?
|
| I agree about models filling the gaps with whatever is most
| probable. That's what they are designed to do. My quibble is
| that humans often synthesize the least objectionable answers
| based on group-think, institutional norms and pure laziness.
| d357r0y3r wrote:
| The AI interpretation can be folded into a multidisciplinary
| approach. We wouldn't merely take AI's word for it. Does this
| interpretation make sense given what historians and
| anthropologists have learned, etc.?
| indymike wrote:
| > This concerns me. How do we assess the AI's interpretation
| when it comes to what we ourselves can't see?
|
| Sometimes a clue or nudge can trigger a cascade of discovery.
| Even if that clue is wrong, it causes people to look at
| something they maybe never would have. In any case, so long as
| we're reasonably skeptical, this is really no different from a
| very human way of working... "have you tried ...fill in wild
| idea..."
|
| > I'd really prefer we not start filling the gaps with lies
| formed of regurgitated data pools
|
| A lie requires an intent to deceive, and that is beyond the
| capability of modern AI. In many cases a lie can reveal
| adjacent truth - and I suspect that is what is happening here.
| Regardless, finding truth in history is really hard, because
| many times the record is filled with actual lies intended to
| make the victor or ruler look better.
| dismalaf wrote:
| > Have we not learned that AI desparately wants to supply
| answers to the point it prioritizes answers over accuracy?
|
| Have you ever met an archaeologist?
| throw4847285 wrote:
| Yeah, I know a number of archaeologists. Among academics,
| they are some of the most conservative when it comes to
| drawing sweeping conclusions from their research. A thesis
| defense is a brutal exercise in being accused of crimes
| against parsimony by your mentors and peers.
| Electricniko wrote:
| I like to think that today's clickbait data pools are perfect
| for translating ancient texts. The software will see modern
| headlines like "Politician roasts the opposition for spending
| cuts" and come up with translations like "Emperor A roasted his
| enemies" and it will still be correct.
| InsOp wrote:
| Is there any news on the Voynich manuscript?
___________________________________________________________________
(page generated 2025-01-02 23:02 UTC)