[HN Gopher] An amateur historian has discovered a long-lost shor...
___________________________________________________________________
An amateur historian has discovered a long-lost short story by Bram
Stoker
Author : lermontov
Score : 198 points
Date : 2024-10-21 16:14 UTC (6 hours ago)
(HTM) web link (www.bbc.com)
(TXT) w3m dump (www.bbc.com)
| javajosh wrote:
| Does the name "Bram Stoker" not carry any weight?
| slothtrop wrote:
| Insofar as he's associated with "that Dracula story and movie",
| yes.
| adrianmonk wrote:
| Yes, but "Dracula author" carries more, and headlines aim to
| reach as many people as possible.
| WCSTombs wrote:
| For some reason his name is in the page's <head> but not in the
| article's title.
| dang wrote:
| It does here!
| chachacharge wrote:
| Pro search tip: it's Stoker, not Stroker
| slothtrop wrote:
| porn parody potential there
| mock-possum wrote:
| All this, and yet no link to read it?
| alanbernstein wrote:
| It's 134 years old but hasn't been published as a book yet, so
| surely it requires 100 years of copyright protection starting
| today!
| gwbas1c wrote:
| https://news.ycombinator.com/item?id=41905844
| unit149 wrote:
| Used to be that writers were paid by the word and novels were
| serialized.
| politelemon wrote:
| You can read it here:
| https://catalogue.nli.ie/Record/vtls000924296
|
| Go full screen and go to page 2 it starts at about the middle.
| boilerupnc wrote:
| Evolving Wikipedia Entry on the Story "Gibbet Hill" [0]. Plot
| Summary described on the page.
|
| [0] https://en.wikipedia.org/wiki/Gibbet_Hill_(short_story)
| Mistletoe wrote:
| I'm concerned things like this will just be gone forever in the
| digital era. Paper and film are great storage mediums. I know
| this was on a screen but would it have still existed if it wasn't
| on paper first?
| stavros wrote:
| Hard disks are great storage mediums when we don't purposely
| set fire to them to preserve the profits of large corporations.
| The Internet Archive is perfectly capable of preserving things,
| unless copyright holders manage to shut them down for short-
| term profit.
| echelon wrote:
| IA shouldn't try to wage war against copyright. They should
| leave that to other entities.
|
| IA should be an archivist organization first and foremost and
| abandon the idea of making books, movies, and music publicly
| available. That's just painting a target on their back and
| risking their goal of preserving a snapshot of our time.
|
| The wayback machine is great, though, and they should keep
| doing that.
| bongodongobob wrote:
| What are you referring to here? Hopefully not the secure
| destruction of hard disks.
| stavros wrote:
| The law's preference for 120 years of copyright instead of
| the preservation of culture. IA should be state-funded.
| bongodongobob wrote:
| How does copyright relate to burning hard drives?
| freedomben wrote:
| Agreed, and I think it's important to note that paper doesn't
| have any sort of DRM encumbrance on it. I seriously think that
| at some point in the next few decades, the "pirates" who right
| now are hated and prosecuted vigorously by all the
| "rightsholders" may turn out to be venerable heroes for having
| preserved the creations.
|
| Imagine if we had found Bram Stoker's work, and it was also
| encrypted mumbo jumbo that is now useless to us. We'll likely
| never know what we lost.
| nuz wrote:
| Seems like a non-pessimistic idea of something LLMs could help us
| out with: mass analysis of old texts for new finds like this. If
| this one exists, surely there are many more just a mass analysis
| away.
| steve_adams_86 wrote:
| I accidentally got Zed to parse way more code than I intended
| last night and it cost close to $2 on the Anthropic API. All I
| can think is how incredibly expensive it would be to feed an
| LLM text in hopes of making those connections. I don't think
| you're wrong, though. This is the territory where their ability
| to find patterns can feel pretty magical. It would cost many,
| many, many $2 though
| pcthrowaway wrote:
| This is a pretty good case for just using a local model. Even
| if it's 50% worse than Anthropic or whatever the gap is now
| between open models and proprietary state of the art, it's
| still likely 'good enough' to categorize a story in an old
| newspaper as missing from an author's known bibliography.
| steve_adams_86 wrote:
| Good point. I use llama3.1 for a lot of small tasks and
| rarely feel like I need to use Claude instead. It's fine.
| I'm even running the model a (big) step down from 70b,
| because I've only got 32GB of ram. It's a solid model that
| probably costs me next to nothing to run.
| diggan wrote:
| > I accidentally got Zed to parse way more code than I
| intended last night and it cost close to $2 on the Anthropic
| API
|
| Is that one API call or some out of control process slinging
| 100s of requests?
|
| Must have been a ton of data, as their most expensive model
| (Opus) seems to cost $15 per million input tokens. I guess if you
| just set it to use an entire project as the input, you'll hit
| 1m input tokens quickly.
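The arithmetic behind that estimate can be sketched as below. This is a rough illustration only: the $15 per million input tokens rate is the figure quoted in the comment above (not checked against a price list), output tokens are ignored, and the function names are hypothetical. At that rate, $2 corresponds to roughly 133,000 input tokens, and a full 200k-token context would cost about $3 in input alone. A cheaper model would stretch the same $2 over more tokens.

```python
# Rate quoted in the thread for the most expensive model; treat it as an
# assumption, not an authoritative price.
USD_PER_MILLION_INPUT_TOKENS = 15.0


def input_cost_usd(input_tokens, usd_per_million=USD_PER_MILLION_INPUT_TOKENS):
    """Cost in USD of sending `input_tokens` input tokens (output tokens ignored)."""
    return input_tokens * usd_per_million / 1_000_000


def tokens_for_budget(budget_usd, usd_per_million=USD_PER_MILLION_INPUT_TOKENS):
    """Whole number of input tokens that `budget_usd` buys at the given rate."""
    return int(budget_usd * 1_000_000 / usd_per_million)


if __name__ == "__main__":
    # How many input tokens the reported $2 would cover at the quoted rate.
    print(tokens_for_budget(2.0))
    # Input cost of one request that fills a 200k-token context window.
    print(input_cost_usd(200_000))
```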
| steve_adams_86 wrote:
| Come to think of it, I'm not sure how Zed performs LLM
| requests with the inline assistant.
|
| I wasn't working in an enormous file, but I meant to
| highlight a block and accidentally highlighted the entire
| file and asked it to do something that made no sense in
| that context. It did its best to do something with the
| situation and eventually ran out of steam, haha. It's
| possible that multiple requests needed to be made, or I was
| around the 200k context window.
|
| Previous to this I'm fairly sure most of my requests cost
| fractions of pennies. My credit takes ages to decrease by
| any meaningful amount, or it did until last night. It's
| normally an extremely cost-effective tool for me.
| hyperbrainer wrote:
| The problem with copyright is going to be a big hurdle though.
| diggan wrote:
| Why? Old texts would be out of copyright, and even if they
| weren't, as long as you're not publishing the source material
| or anything containing the source material (or anything that
| can verbatim output the source), it seems you'd be in the
| clear.
| hyperbrainer wrote:
| You are right! I forgot about this completely.
| ebiester wrote:
| If we go to the era of public domain, there is no worry about
| copyright.
| busyant wrote:
| It's funny (ironic?), but when I read "an amateur {insert
| occupation} has"
|
| I mentally replace "an amateur" with "a talented and passionate"
|
| For me, amateur just doesn't mean the insult that it meant when I
| was a youngster.
| rahimnathwani wrote:
| The word 'amateur' originates from the Latin word for 'lover'.
| zanellato19 wrote:
| Thank you! I've been using this word in Portuguese (amador)
| and it's so _so_ clear in that language, even so, I hadn't
| realized. Amar -> Amador (the one who loves it). Quite
| clearly.
| bombcar wrote:
| Exactly, and "professional" means they do it for money.
| otherme123 wrote:
| The point is that "amateur" means literally "lover" in
| Latin. While "professional" means "for money" today, in
| Latin it meant "to profess a vow to do it with high
| standards".
|
| For example, you can be a professional, but do things "pro
| bono" (for free or for public good) or "pro lucro" (for
| money).
| retrac wrote:
| "Vocation" has undergone a similar shift; originally it
| meant a calling, or a summons.
| RandomThoughts3 wrote:
| It still does.
| thrwaway1337 wrote:
| Just don't go looking for the etymology of "vanilla"
| Archelaos wrote:
| "Doing something to a high standard" is still the main
| meaning of the word "professionell" in German. So someone
| can make something "unprofessionell" for money or
| "professionell" without payment.
|
| Another word of classical origin with a striking
| difference is the meaning of the word "pathetisch" in
| German, which means "(exaggeratedly) passionate", which
| corresponds more or less to the meaning of the Ancient
| Greek word "pathetikos".
| echelon wrote:
| But amateur has taken on a negative connotation in the common
| vernacular.
|
| "Amateurish" or "amateurishly" feel damning, like assertions
| about a certain absence of quality or attention to detail.
|
| Describing someone as a "total amateur" feels a bit like
| calling them a hack.
|
| This needs a separate word or concept.
| adamc wrote:
| We could try reclaiming the word.
| RandomThoughts3 wrote:
| Dilettante already exists to mean someone who doesn't do
| something with seriousness and amateur doesn't carry the
| same connotation as amateurish anyway so you don't really
| need a new word.
| idiotlogical wrote:
| The term 'nerd' needs to complete its rehabilitation like
| 'geek' has the last 20 years. It's the most concise term I
| can think of when describing someone who is enthusiastic,
| focused, and knowledgeable on a subject. I think it's a
| badge of honor.
| PsylentKnight wrote:
| There's "aficionado", though that feels a little
| pretentious
| cortesoft wrote:
| I have never thought of it as an insult, just meaning they
| don't do it for money.
| kazinator wrote:
| Yeah but it's often intended as an insult. Especially as the
| adjective _amateurish_ , or phrases like _the work of an
| amateur_.
|
| _Amateur historian_ could never be an insult, because it's
| actually better to have a real career in something substantial,
| and do the history stuff on the side as a hobby.
| qingcharles wrote:
| For me, amateur generally just translates as "not paid for his
| services."
| intalentive wrote:
| "Honey, come look! I've found some information all the world's
| top historians missed."
| bell-cot wrote:
| "missed" might be taken to imply that one or more of them had
| ever bothered to look.
| SketchySeaBeast wrote:
| Well, even if people were looking, this sort of thing is a
| lot of right place and right time.
| bell-cot wrote:
| Try skimming the Wikipedia articles on some major authors
| of that era, to get a sense for how much short (or
| serialized) fiction & poetry was routinely published in
| newspapers and magazines back then.
|
| Without some specific clues, a real historian would not be
| looking for Bram Stoker stories in an 1890 issue of the
| Daily Express Dublin Edition. He'd be skimming through the
| archives of many of the newspapers & magazines published in
| an era and geographic region, cataloging authors & stories
| & poems. "Success" would be just compiling a well-done
| catalog. 15 minutes of fame in the popular press could
| equally well result from finding some unknown early work by
| James Joyce, or Winston Churchill, or George Bernard Shaw,
| or Oscar Wilde, or Yeats, or ...
| jonhohle wrote:
| I've found that it's not uncommon for an interested individual
| to find details that have not been documented or "found" by
| others. I collect video games and have found variants of
| popular games that have been otherwise undocumented on any list
| or archive that I was aware of. I've found audio recordings
| from the 90s that seemingly have no recorded history on the
| internet.
|
| These aren't things historians have had hundreds of years to
| document, but several thousand or more people have been in this
| space long before I was looking at it more intently than I
| could ever and I still come across things from time to time
| that weren't known to exist.
|
| Likewise, in the past month I've spent an unfortunate amount of
| time reading laws and board bylaws and it doesn't take long to
| find long forgotten rules that are being actively violated.
| Even outside of code, documentation is hard.
| cxr wrote:
| Tyler Cowen recently interviewed a historian (Alan Taylor),
| and they approached this subject near the end of the episode
| --how much the job of a historian still involves browsing
| undigitized material sitting on a shelf in a cold room
| somewhere. Around 3215 seconds* in:
|
| > _And then there's also a kind of notion that everything is
| there online when in point of fact lots of information about
| the past still only exists in archives_
|
| <https://conversationswithtyler.com/episodes/alan-taylor/>
|
| * of the audio version, that is; at that timestamp in the
| YouTube video, they're discussing the question "How will
| large language models change historical research"--
| interviewee's response: he doesn't know
| bredren wrote:
| This happens often when going down the rabbit hole on a niche
| project. For example, repair and restoration of Persian rugs.
|
| There are many details to the craft that are hinted at in
| a variety of formats (YouTube videos, blog entries, etc.), but
| the clear truths are not clearly stated anywhere. These are
| stored in the minds and practices of artisans.
| nu11ptr wrote:
| How would copyright law apply here? Would this fall into the
| public domain immediately? I read that Irish law is that it would
| be "70 years from date first made available to the public". Since
| published in a newspaper, I would assume this would be public
| domain now. Correct?
| cortesoft wrote:
| Yes, it's public domain
| zozbot234 wrote:
| If this was an unpublished manuscript, rights of first
| publication would apply and it might be covered by a kind of
| copyright that would vary depending on the country. Since this
| was "rediscovered" after first being unambiguously published
| back in the 1890s, it's pretty clearly in the public domain.
|
| OP got incredibly lucky though that the author's name was
| included in the original publication - things like this (i.e.
| contributions to newspapers or magazines) were often published
| under obscure pseudonyms, initials, puzzling hints like "By the
| author of Such-and-such" or no author indication at all.
| papercrane wrote:
| I _think_ UK copyright law would matter here, since at the time
| the story was published (1890) Ireland was part of the UK
| (Ireland gained independence in 1921.)
|
| If UK copyright applied, then the story would have entered
| public domain in 1932. The term of copyright for published
| works at the time was 7 years after the author's death, or 42
| years, whichever was longer.
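The 1932 figure follows directly from the rule as described in the comment above. Here is a minimal sketch, assuming the 42 years run from first publication and taking Stoker's year of death as 1912 (a fact not stated in the thread). The function name is hypothetical, and this only encodes the commenter's summary of the rule, not the statute itself.

```python
def expiry_year(published, author_died):
    """Year the term ends under the rule as summarized in the thread:
    the longer of 42 years from publication or 7 years after the author's death."""
    return max(published + 42, author_died + 7)


if __name__ == "__main__":
    # "Gibbet Hill": published 1890; Bram Stoker died in 1912 (assumed input).
    # 1890 + 42 = 1932 is later than 1912 + 7 = 1919, so the publication
    # branch decides the result.
    print(expiry_year(1890, 1912))
```

For a work published long before the author's death, the other branch would win: a piece published in 1850 by an author who died in 1912 would run to 1919 under the same rule.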
| Rebelgecko wrote:
| Funnily enough there was a reddit post from around the time the
| manuscript was discovered (but before it was announced) asking
| a similar question
| mmastrac wrote:
| I started a quick transcription here -- not enough time to
| complete more than half the first column, but some scans and very
| rough OCR are here if anyone is interested in contributing:
|
| https://github.com/mmastrac/gibbet-hill
|
| Top and bottom halves of the page in the repo here:
|
| https://github.com/mmastrac/gibbet-hill/blob/main/scan-1.png
| https://github.com/mmastrac/gibbet-hill/blob/main/scan-2.png
|
| EDIT: If you have access to a multi-modal LLM, the rough
| transcription + the column scan and the instruction to "OCR this
| text, keep linebreaks" gives a _very good_ result.
|
| EDIT 2: Rough draft, needs some proofreading and corrections:
|
| https://github.com/mmastrac/gibbet-hill/blob/main/story.md
| simonw wrote:
| I tried extracting the content using Google Gemini 1.5 Pro 002
| using https://aistudio.google.com/ - the first page (scan-2)
| worked fantastically well, the second page not so much. Here's
| what I got so far:
| https://gist.github.com/simonw/ba87f507ef5c11d3335959c055533...
| mmastrac wrote:
| I cropped the columns out into six files -- it might have an
| easier time with these:
|
| https://github.com/mmastrac/gibbet-hill/blob/main/col-1-a.pn...
| reaperducer wrote:
| ...and my wife's Halloween present has been printed.
|
| Tip: Load the pngs into Preview, hit "Auto Levels," and
| crank up "Sharpness" on each one. Looks pretty good!
| quuxplusone wrote:
| Seems like you don't need an LLM, you just need a human who (1)
| likes reading Stoker and (2) touch-types. :) I'd volunteer, if
| I didn't think I'd be duplicating effort at this point.
|
| (I've transcribed various things over the years, including
| Sonia Greene's _Alcestis_ [1] and Holtzman & Kershenblatt's
| "Castlequest" source code [2], so I know it doesn't take much
| except quick fingers and sufficient motivation. :))
|
| [1] https://quuxplusone.github.io/blog/2022/10/22/alcestis/
|
| [2] https://quuxplusone.github.io/blog/2021/03/09/castlequest/
|
| EDIT: ...and as I was writing that, you seem to have finished
| your transcription. :)
| mmastrac wrote:
| I finished a very rough, tesseract + LLM transcription, but
| it absolutely needs editing passes.
|
| I've done transcription in the past myself (did two books for
| standard ebooks with some from-scratch transcription and lots
| of editing) and I know the pain. I've always found it easier
| to fix up OCR than type the whole thing by hand because I've
| found my error rate of eyeball transcription to be higher.
|
| If you want to tackle the proofing passes, I'm happy to add
| you to the repo :)
| wahnfrieden wrote:
| Use LiveText API. Much much better accuracy than Tesseract.
| You can rent access to it.
| 1317 wrote:
| probably you would want to get the project gutenberg people
| onto it
| cxr wrote:
| Too late. You have already been scooped by, of course, tumblr:
|
| <https://woodsfae.tumblr.com/post/764918993659330560/gibbet-h...>
| oliyoung wrote:
| A battle of a Tumblr user named Woodsfae versus advanced LLM
| transcribing new goth literature?
|
| That's like bringing a knife to a gun fight my friend, never
| underestimate the power of a committed Tumblr user
| fauria wrote:
| Brian Cleary will be discussing his findings next Saturday in
| Dublin, as part of the Bram Stoker Festival:
| https://bramstokerfestival.com/en/events/an-extraordinary-br...
| ndileas wrote:
| I don't mean to disparage this particular instance at all, as it
| seems pretty great. But I wonder if the rise of LLMs is going to
| make scams that sound a lot like this much easier in the future.
| I think at the moment it's hard to make something really sound
| like a particular author without a lot of work, but that will
| probably change in the future.
| bredren wrote:
| Sure, people can do scams but it will be way more interesting
| to apply them to finding stuff like this. Up through now,
| literary treasures and open secrets are sitting out waiting to
| be recognized.
|
| And why bother with trying to deceive when one could build
| reputation for creating truly good fan fiction based off real
| source material.
|
| Just because tech can be used to abuse trust doesn't mean it
| will be the most interesting and commonly recognized thing to
| do with it.
| booleandilemma wrote:
| I can see it now:
|
| "3 million lost works of Shakespeare found"
| staticman2 wrote:
| I remember reading somewhere- I think it was in an annotated
| edition of Dracula, or maybe it was a journal article- that said
| that Bram Stoker wrote a large number of novels but everything he
| wrote other than Dracula was awful. Per Wikipedia he wrote 14
| books, supposedly he was only able to write one good one.
| reaperducer wrote:
| I suspect you're getting downvoted by people who haven't
| actually read anything by Stoker.
|
| My wife has read most of his stuff. I know because I buy it for
| her. She says aside from Dracula, most of it is not great.
| timeinput wrote:
| For me it feels like Stoker's Dracula is only so popular
| because it's where all the tropes come from, not because it's
| particularly well written, or something like that.
|
| It's one of those firsts that established a genre.
|
| I know Stoker didn't invent vampires, but they came into
| western English speaking culture through his Dracula.
| nu11ptr wrote:
| I am not a literary critic, but I very much enjoyed
| Dracula. When I read it, I did not know there were claims
| he wasn't a good writer, so I had no bias, I simply liked
| it quite a bit.
| nu11ptr wrote:
| Not a novel, but the short story "Dracula's Guest" I thought
| was quite good. I was sad it was so short.
| hshshshshsh wrote:
| I don't know why people get obsessed over things like this.
| Finding significance in something because it's written by an
| entity whose name is popular makes no sense.
___________________________________________________________________
(page generated 2024-10-21 23:00 UTC)