[HN Gopher] An amateur historian has discovered a long-lost shor...
       ___________________________________________________________________
        
       An amateur historian has discovered a long-lost short story by Bram
       Stoker
        
       Author : lermontov
       Score  : 198 points
       Date   : 2024-10-21 16:14 UTC (6 hours ago)
        
 (HTM) web link (www.bbc.com)
 (TXT) w3m dump (www.bbc.com)
        
       | javajosh wrote:
       | Does the name "Bram Stoker" not carry any weight?
        
         | slothtrop wrote:
         | Insofar as he's associated with "that Dracula story and movie",
         | yes.
        
         | adrianmonk wrote:
         | Yes, but "Dracula author" carries more, and headlines aim to
         | reach as many people as possible.
        
         | WCSTombs wrote:
         | For some reason his name is in the page's <head> but not in the
         | article's title.
        
         | dang wrote:
         | It does here!
        
         | chachacharge wrote:
         | Pro search tip: it's Stoker, not Stroker.
        
           | slothtrop wrote:
           | porn parody potential there
        
       | mock-possum wrote:
       | All this, and yet no link to read it?
        
         | alanbernstein wrote:
         | It's 134 years old but hasn't been published as a book yet, so
         | surely it requires 100 years of copyright protection starting
         | today!
        
         | gwbas1c wrote:
         | https://news.ycombinator.com/item?id=41905844
        
         | unit149 wrote:
         | Used to be that writers were paid by the word and novels were
         | serialized.
        
       | politelemon wrote:
       | You can read it here:
       | https://catalogue.nli.ie/Record/vtls000924296
       | 
       | Go full screen and go to page 2 it starts at about the middle.
        
       | boilerupnc wrote:
       | Evolving Wikipedia Entry on the Story "Gibbet Hill" [0]. Plot
       | Summary described on the page.
       | 
       | [0] https://en.wikipedia.org/wiki/Gibbet_Hill_(short_story)
        
       | Mistletoe wrote:
       | I'm concerned things like this will just be gone forever in the
       | digital era. Paper and film are great storage mediums. I know
       | this was on a screen but would it have still existed if it wasn't
       | on paper first?
        
         | stavros wrote:
         | Hard disks are great storage mediums when we don't purposely
         | set fire to them to preserve the profits of large corporations.
         | The Internet Archive is perfectly capable of preserving things,
         | unless copyright holders manage to shut them down for short-
         | term profit.
        
           | echelon wrote:
           | IA shouldn't try to wage war against copyright. They should
           | leave that to other entities.
           | 
           | IA should be an archivist organization first and foremost and
           | abandon the idea of making books, movies, and music publicly
           | available. That's just painting a target on their back and
           | risking their goal of preserving a snapshot of our time.
           | 
           | The Wayback Machine is great, though, and they should keep
           | doing that.
        
           | bongodongobob wrote:
           | What are you referring to here? Hopefully not the secure
           | destruction of hard disks.
        
             | stavros wrote:
             | The law's preference for 120 years of copyright instead of
             | the preservation of culture. IA should be state-funded.
        
               | bongodongobob wrote:
               | How does copyright relate to burning hard drives?
        
         | freedomben wrote:
         | Agreed, and I think it's important to note that paper doesn't
         | have any sort of DRM encumbrance on it. I seriously think that
         | at some point in the next few decades, the "pirates" who right
         | now are hated and prosecuted vigorously by all the
         | "rightsholders" may turn out to be venerable heroes for having
         | preserved the creations.
         | 
         | Imagine if we had found Bram Stoker's work, and it was also
         | encrypted mumbo jumbo that is now useless to us. We'll likely
         | never know what we lost.
        
       | nuz wrote:
       | Seems like a non-pessimistic idea of something LLMs could help us
       | out with: mass analysis of old texts for new finds like this. If
       | this one exists, surely there are many more just a mass analysis
       | away.
        
         | steve_adams_86 wrote:
         | I accidentally got Zed to parse way more code than I intended
         | last night and it cost close to $2 on the Anthropic API. All I
         | can think is how incredibly expensive it would be to feed an
         | LLM text in hopes of making those connections. I don't think
         | you're wrong, though. This is the territory where their ability
         | to find patterns can feel pretty magical. It would cost many,
         | many, many $2 though
        
           | pcthrowaway wrote:
           | This is a pretty good case for just using a local model. Even
           | if it's 50% worse than Anthropic or whatever the gap is now
           | between open models and proprietary state of the art, it's
           | still likely 'good enough' to categorize a story in an old
           | newspaper as missing from an author's known bibliography.
        
             | steve_adams_86 wrote:
             | Good point. I use llama3.1 for a lot of small tasks and
             | rarely feel like I need to use Claude instead. It's fine.
             | I'm even running the model a (big) step down from 70b,
             | because I've only got 32GB of RAM. It's a solid model that
             | probably costs me next to nothing to run.
        
           | diggan wrote:
           | > I accidentally got Zed to parse way more code than I
           | intended last night and it cost close to $2 on the Anthropic
           | API
           | 
           | Is that one API call or some out of control process slinging
           | 100s of requests?
           | 
           | Must have been a ton of data, as their most expensive model
           | (Opus) seems to cost $15 per million input tokens. I guess if you
           | just set it to use an entire project as the input, you'll hit
           | 1m input tokens quickly.
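
The arithmetic in the comment above can be sketched as follows. This is a rough estimate only: the $15 per million input tokens figure is the one quoted in the thread, not a checked price list, and the function names are made up for illustration.

```python
# Back-of-the-envelope input-token cost. The default price is the
# "$15 per million input tokens" figure quoted in the thread; treat it
# as an assumption, not a verified price.
PRICE_PER_MILLION_INPUT_TOKENS_USD = 15.0


def input_cost_usd(input_tokens: int,
                   price_per_million: float = PRICE_PER_MILLION_INPUT_TOKENS_USD) -> float:
    """Cost in USD of sending `input_tokens` tokens as input."""
    return input_tokens / 1_000_000 * price_per_million


def tokens_for_budget(budget_usd: float,
                      price_per_million: float = PRICE_PER_MILLION_INPUT_TOKENS_USD) -> int:
    """Number of input tokens a budget buys, rounded down."""
    return int(budget_usd / price_per_million * 1_000_000)


if __name__ == "__main__":
    print(input_cost_usd(1_000_000))   # one million input tokens
    print(input_cost_usd(200_000))     # one full 200k context window
    print(tokens_for_budget(2.0))      # tokens that roughly $2 would buy
```

At that quoted price, $2 corresponds to roughly 133,000 input tokens, so a single large request or a handful of smaller ones could plausibly account for it.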
        
             | steve_adams_86 wrote:
             | Come to think of it, I'm not sure how Zed performs LLM
             | requests with the inline assistant.
             | 
             | I wasn't working in an enormous file, but I meant to
             | highlight a block and accidentally highlighted the entire
             | file and asked it to do something that made no sense in
             | that context. It did its best to do something with the
             | situation and eventually ran out of steam, haha. It's
             | possible that multiple requests needed to be made, or I was
             | around the 200k context window.
             | 
             | Previous to this I'm fairly sure most of my requests cost
             | fractions of pennies. My credit takes ages to decrease by
             | any meaningful amount. Except for last night. It's
             | normally an extremely cost-effective tool for me.
        
         | hyperbrainer wrote:
         | The problem with copyright is going to be a big hurdle though.
        
           | diggan wrote:
           | Why? Old texts would be out of copyright, and even if they
           | weren't, as long as you're not publishing the source material
           | or anything containing the source material (or anything that
           | can verbatim output the source), it seems you'd be in the
           | clear.
        
             | hyperbrainer wrote:
             | You are right! I forgot about this completely.
        
           | ebiester wrote:
           | If we go to the era of public domain, there is no worry about
           | copyright.
        
       | busyant wrote:
       | It's funny (ironic?), but when I read "an amateur {insert
       | occupation} has"
       | 
       | I mentally replace "an amateur" with "a talented and passionate"
       | 
       | For me, amateur just doesn't mean the insult that it meant when I
       | was a youngster.
        
         | rahimnathwani wrote:
         | The word 'amateur' originates from the Latin word for 'lover'.
        
           | zanellato19 wrote:
           | Thank you! I've been using this word in Portuguese (amador)
           | and it's so _so_ clear in that language, even so, I hadn't
           | realized. Amar -> Amador (the one who loves it). Quite
           | clearly.
        
           | bombcar wrote:
           | Exactly, and "professional" means they do it for money.
        
             | otherme123 wrote:
             | The point is that "amateur" means literally "lover" in
             | Latin. While "professional" means "for money" today, in
             | Latin it meant "to profess a vow to do it with high
             | standards".
             | 
             | For example, you can be a professional, but do things "pro
             | bono" (for free or for public good) or "pro lucro" (for
             | money).
        
               | retrac wrote:
               | "Vocation" has undergone a similar shift; originally it
               | meant a calling, or a summons.
        
               | RandomThoughts3 wrote:
               | It still does.
        
               | thrwaway1337 wrote:
               | Just don't go looking for the etymology of "vanilla"
        
               | Archelaos wrote:
               | "Doing something to a high standard" is still the main
               | meaning of the word "professionell" in German. So someone
               | can make something "unprofessionell" for money or
               | "professionell" without payment.
               | 
               | Another word of classical origin with a striking
               | difference is the meaning of the word "pathetisch" in
               | German, which means "(exaggeratedly) passionate", which
               | corresponds more or less to the meaning of the Ancient
               | Greek word "pathetikos".
        
           | echelon wrote:
           | But amateur has taken on a negative connotation in the common
           | vernacular.
           | 
           | "Amateurish" or "amateurishly" feel damning, like assertions
           | about a certain absence of quality or attention to detail.
           | 
           | Describing someone as a "total amateur" feels a bit like
           | calling them a hack.
           | 
           | This needs a separate word or concept.
        
             | adamc wrote:
             | We could try reclaiming the word.
        
             | RandomThoughts3 wrote:
             | Dilettante already exists to mean someone who doesn't do
             | something with seriousness and amateur doesn't carry the
             | same connotation as amateurish anyway so you don't really
             | need a new word.
        
             | idiotlogical wrote:
             | The term 'nerd' needs to complete its rehabilitation like
             | 'geek' has over the last 20 years. It's the most concise term
             | I can think of when describing someone who is enthusiastic,
             | focused, and knowledgeable on a subject. I think it's a
             | badge of honor
        
               | PsylentKnight wrote:
               | There's "aficionado", though that feels a little
               | pretentious
        
         | cortesoft wrote:
         | I have never thought of it as an insult, just meaning they
         | don't do it for money.
        
         | kazinator wrote:
         | Yeah but it's often intended as an insult. Especially as the
         | adjective _amateurish_, or phrases like _the work of an
         | amateur_.
         | 
         | _Amateur historian_ could never be an insult, because it's
         | actually better to have a real career in something substantial,
         | and do the history stuff on the side as a hobby.
        
         | qingcharles wrote:
         | For me, amateur generally just translates as "not paid for his
         | services."
        
       | intalentive wrote:
       | "Honey, come look! I've found some information all the world's
       | top historians missed."
        
         | bell-cot wrote:
         | "missed" might be taken to imply that one or more of them had
         | ever bothered to look.
        
           | SketchySeaBeast wrote:
           | Well, even if people were looking, this sort of thing is a
           | lot of right place and right time.
        
             | bell-cot wrote:
             | Try skimming the Wikipedia articles on some major authors
             | of that era, to get a sense for how much short (or
             | serialized) fiction & poetry was routinely published in
             | newspapers and magazines back then.
             | 
             | Without some specific clues, a real historian would not be
             | looking for Bram Stoker stories in an 1890 issue of the
             | Daily Express Dublin Edition. He'd be skimming through the
             | archives of many of the newspapers & magazines published in
             | an era and geographic region, cataloging authors & stories
             | & poems. "Success" would be just compiling a well-done
             | catalog. 15 minutes of fame in the popular press could
             | equally well result from finding some unknown early work by
             | James Joyce, or Winston Churchill, or George Bernard Shaw,
             | or Oscar Wilde, or Yeats, or ...
        
         | jonhohle wrote:
         | I've found that it's not uncommon for an interested individual
         | to find details that have not been documented or "found" by
         | others. I collect video games and have found variants of
         | popular games that have been otherwise undocumented on any list
         | or archive that I was aware of. I've found audio recordings
         | from the 90s that seemingly have no recorded history on the
         | internet.
         | 
         | These aren't things historians have had hundreds of years to
         | document, but several thousand or more people have been in this
         | space long before me, looking at it more intently than I ever
         | could, and I still come across things from time to time that
         | weren't known to exist.
         | 
         | Likewise, in the past month I've spent an unfortunate amount of
         | time reading laws and board bylaws and it doesn't take long to
         | find long-forgotten rules that are being actively violated.
         | Even outside of code, documentation is hard.
        
           | cxr wrote:
           | Tyler Cowen recently interviewed a historian (Alan Taylor),
           | and they approached this subject near the end of the episode
           | --how much the job of a historian still involves browsing
           | undigitized material sitting on a shelf in a cold room
           | somewhere. Around 3215 seconds* in:
           | 
           | > _And then there's also a kind of notion that everything is
           | there online when in point of fact lots of information about
           | the past still only exists in archives_
           | 
           | <https://conversationswithtyler.com/episodes/alan-taylor/>
           | 
           | * of the audio version, that is; at that timestamp in the
           | YouTube video, they're discussing the question "How will
           | large language models change historical research"--
           | interviewee's response: he doesn't know
        
           | bredren wrote:
           | This happens often when going down the rabbit hole on a niche
           | project. For example, repair and restoration of Persian rugs.
           | 
           | There are many details to the craft that are hinted at in
           | a variety of formats (YouTube videos, blog entries, etc.) but
           | the clear truths are not clearly stated anywhere. These are
           | stored in the minds and practices of artisans.
        
       | nu11ptr wrote:
       | How would copyright law apply here? Would this fall into the
       | public domain immediately? I read that Irish law is that it would
       | be "70 years from date first made available to the public". Since
       | published in a newspaper, I would assume this would be public
       | domain now. Correct?
        
         | cortesoft wrote:
         | Yes, it's public domain
        
         | zozbot234 wrote:
         | If this was an unpublished manuscript, rights of first
         | publication would apply and it might be covered by a kind of
         | copyright that would vary depending on the country. Since this
         | was "rediscovered" after first being unambiguously published
         | back in the 1890s, it's pretty clearly in the public domain.
         | 
         | OP got incredibly lucky though that the author's name was
         | included in the original publication - things like this (i.e.
         | contributions to newspapers or magazines) were often published
         | under obscure pseudonyms, initials, puzzling hints like "By the
         | author of Such-and-such" or no author indication at all.
        
         | papercrane wrote:
         | I _think_ UK copyright law would matter here, since at the time
         | the story was published (1890) Ireland was part of the UK
         | (Ireland gained independence in 1921).
         | 
         | If UK copyright applied, then the story would have entered
         | public domain in 1932. The term of copyright for published
         | works at the time was 7 years after the author's death, or 42
         | years, whichever was longer.
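
The term rule described in the comment above can be checked with a few lines. This is a minimal sketch of that rule as the commenter states it, not legal advice; the function name is made up, and Stoker's death year (1912) is supplied here because the comment does not state it.

```python
def public_domain_year(published: int, author_died: int) -> int:
    """Year the term ends under the rule as described in the thread:
    the longer of 42 years from publication or 7 years after the
    author's death."""
    return max(published + 42, author_died + 7)


if __name__ == "__main__":
    # 1890 is the publication year given in the thread.
    # 1912 is Bram Stoker's death year, added here as an outside fact.
    print(public_domain_year(published=1890, author_died=1912))  # 1932
```

For this story the 42-year branch is the longer one (1890 + 42 = 1932, versus 1912 + 7 = 1919), which matches the 1932 figure in the comment.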
        
         | Rebelgecko wrote:
         | Funnily enough there was a reddit post from around the time the
         | manuscript was discovered (but before it was announced) asking
         | a similar question
        
       | mmastrac wrote:
       | I started a quick transcription here -- not enough time to
       | complete more than half the first column, but some scans and very
       | rough OCR are here if anyone is interested in contributing:
       | 
       | https://github.com/mmastrac/gibbet-hill
       | 
       | Top and bottom halves of the page in the repo here:
       | 
       | https://github.com/mmastrac/gibbet-hill/blob/main/scan-1.png
       | https://github.com/mmastrac/gibbet-hill/blob/main/scan-2.png
       | 
       | EDIT: If you have access to a multi-modal LLM, the rough
       | transcription + the column scan and the instruction to "OCR this
       | text, keep linebreaks" gives a _very good_ result.
       | 
       | EDIT 2: Rough draft, needs some proofreading and corrections:
       | 
       | https://github.com/mmastrac/gibbet-hill/blob/main/story.md
        
         | simonw wrote:
         | I tried extracting the content using Google Gemini 1.5 Pro 002
         | using https://aistudio.google.com/ - the first page (scan-2)
         | worked fantastically well, the second page not so much. Here's
         | what I got so far:
         | https://gist.github.com/simonw/ba87f507ef5c11d3335959c055533...
        
           | mmastrac wrote:
           | I cropped the columns out into six files -- it might have an
           | easier time with these:
           | 
           | https://github.com/mmastrac/gibbet-hill/blob/main/col-1-a.pn...
        
             | reaperducer wrote:
             | ...and my wife's Halloween present has been printed.
             | 
             | Tip: Load the pngs into Preview, hit "Auto Levels," and
             | crank up "Sharpness" on each one. Looks pretty good!
        
         | quuxplusone wrote:
         | Seems like you don't need an LLM, you just need a human who (1)
         | likes reading Stoker and (2) touch-types. :) I'd volunteer, if
         | I didn't think I'd be duplicating effort at this point.
         | 
         | (I've transcribed various things over the years, including
         | Sonia Greene's _Alcestis_ [1] and Holtzman & Kershenblatt's
         | "Castlequest" source code [2], so I know it doesn't take much
         | except quick fingers and sufficient motivation. :))
         | 
         | [1] https://quuxplusone.github.io/blog/2022/10/22/alcestis/
         | 
         | [2] https://quuxplusone.github.io/blog/2021/03/09/castlequest/
         | 
         | EDIT: ...and as I was writing that, you seem to have finished
         | your transcription. :)
        
           | mmastrac wrote:
           | I finished a very rough, tesseract + LLM transcription, but
           | it absolutely needs editing passes.
           | 
           | I've done transcription in the past myself (did two books for
           | standard ebooks with some from-scratch transcription and lots
           | of editing) and I know the pain. I've always found it easier
           | to fix up OCR than type the whole thing by hand because I've
           | found my error rate of eyeball transcription to be higher.
           | 
           | If you want to tackle the proofing passes, I'm happy to add
           | you to the repo :)
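
One possible way to focus a proofing pass like the one mentioned above, assuming two independent transcriptions of the same column are available (for example an OCR pass and an LLM pass), is to diff them word by word and only eyeball the places where they disagree. This is a sketch using Python's standard `difflib`, not the workflow anyone in the thread actually used; the function name and sample strings are invented.

```python
import difflib


def disagreements(text_a: str, text_b: str) -> list[tuple[str, str]]:
    """Return (words_from_a, words_from_b) pairs wherever the two
    transcriptions differ, comparing whitespace-separated words."""
    words_a, words_b = text_a.split(), text_b.split()
    matcher = difflib.SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    diffs = []
    for tag, a0, a1, b0, b1 in matcher.get_opcodes():
        if tag != "equal":
            diffs.append((" ".join(words_a[a0:a1]), " ".join(words_b[b0:b1])))
    return diffs


if __name__ == "__main__":
    # Made-up sample lines, not text from the story.
    ocr_pass = "The old rnan walked slowly up the hill"
    llm_pass = "The old man walked slowly up the hill"
    for a, b in disagreements(ocr_pass, llm_pass):
        print(f"check against the scan: {a!r} vs {b!r}")
```

Spots where both passes agree can still be wrong, so this narrows the first editing pass rather than replacing a full read against the scan.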
        
             | wahnfrieden wrote:
             | Use LiveText API. Much much better accuracy than Tesseract.
             | You can rent access to it.
        
         | 1317 wrote:
         | probably you would want to get the Project Gutenberg people
         | onto it
        
         | cxr wrote:
         | Too late. You have already been scooped by, of course, tumblr:
         | 
         | <https://woodsfae.tumblr.com/post/764918993659330560/gibbet-h...>
        
           | oliyoung wrote:
           | A battle of a Tumblr user named Woodsfae versus advanced LLM
           | transcribing new goth literature?
           | 
           | That's like bringing a knife to a gun fight, my friend. Never
           | underestimate the power of a committed Tumblr user.
        
       | fauria wrote:
       | Brian Cleary will be discussing his findings next Saturday in
       | Dublin, as part of the Bram Stoker Festival:
       | https://bramstokerfestival.com/en/events/an-extraordinary-br...
        
       | ndileas wrote:
       | I don't mean to disparage this particular instance at all, as it
       | seems pretty great. But I wonder if the rise of LLMs is going to
       | make scams that sound a lot like this much easier in the future.
       | I think at the moment it's hard to make something really sound
       | like a particular author without a lot of work, but that will
       | probably change in the future.
        
         | bredren wrote:
         | Sure, people can do scams but it will be way more interesting
         | to apply them to finding stuff like this. Up through now,
         | literary treasures and open secrets are sitting out waiting to
         | be recognized.
         | 
         | And why bother with trying to deceive when one could build
         | reputation for creating truly good fan fiction based off real
         | source material.
         | 
         | Just because tech can be used to abuse trust doesn't mean it
         | will be the most interesting and commonly recognized thing to
         | do with it.
        
         | booleandilemma wrote:
         | I can see it now:
         | 
         | "3 million lost works of Shakespeare found"
        
       | staticman2 wrote:
       | I remember reading somewhere - I think it was in an annotated
       | edition of Dracula, or maybe it was a journal article - that said
       | that Bram Stoker wrote a large number of novels but everything he
       | wrote other than Dracula was awful. Per Wikipedia he wrote 14
       | books, supposedly he was only able to write one good one.
        
         | reaperducer wrote:
         | I suspect you're getting downvoted by people who haven't
         | actually read anything by Stoker.
         | 
         | My wife has read most of his stuff. I know because I buy it for
         | her. She says aside from Dracula, most of it is not great.
        
           | timeinput wrote:
           | For me it feels like Stoker's Dracula is only so popular
           | because it's where all the tropes come from, not because it's
           | particularly well written, or something like that.
           | 
           | It's one of those firsts that established a genre.
           | 
           | I know Stoker didn't invent vampires, but they came into
           | Western English-speaking culture through his Dracula.
        
             | nu11ptr wrote:
             | I am not a literary critic, but I very much enjoyed
             | Dracula. When I read it, I did not know there were claims
             | he wasn't a good writer, so I had no bias, I simply liked
             | it quite a bit.
        
         | nu11ptr wrote:
         | Not a novel, but the short story "Dracula's Guest" I thought
         | was quite good. I was sad it was so short.
        
       | hshshshshsh wrote:
       | I don't know why people get obsessed over things like this.
       | Finding significance in something because it's written by an
       | entity whose name is popular makes no sense.
        
       ___________________________________________________________________
       (page generated 2024-10-21 23:00 UTC)