[HN Gopher] Show HN: @smoores/epub, a JavaScript library for wor...
       ___________________________________________________________________
        
       Show HN: @smoores/epub, a JavaScript library for working with EPUB
       publications
        
       Howdy! I've just written a blog post about this, and I figured I
       would share it here:
       https://smoores.dev/post/announcing_smoores_epub/. As I've been
       working on Storyteller[1], I've been developing a library for
       working with EPUB files, since that's a large amount of the work
       that Storyteller does. After a friend asked for advice on creating
       EPUB books in Node.js, I decided to publish Storyteller's EPUB
       library as a standalone NPM package. I really love the EPUB spec,
       and I think the Node.js developer community deserves an actively
       maintained library for working with it!  [1]:
       https://smoores.gitlab.io/storyteller
        
       Author : smoores
       Score  : 60 points
       Date   : 2024-12-13 19:52 UTC (3 hours ago)
        
 (HTM) web link (www.npmjs.com)
 (TXT) w3m dump (www.npmjs.com)
        
       | hombre_fatal wrote:
       | Extracting a library from a real world project is one of my
       | favorite parts of software.
       | 
       | I'm sure the march of LLMs will continue eating into this pie,
       | and that's a good thing (most of it is a distraction from the
       | real work), but I love polishing a library on my laptop in a
       | cafe. It's like working on a painting or something.
        
         | smoores wrote:
         | It was, actually, very enjoyable! When we pulled React
         | ProseMirror[1] out of the NYT text editor, it was a pretty
          | laborious process that we had to carefully plan and execute for
         | months, and we still ended up with an internal fork for a
         | while.
         | 
         | By contrast, this was mostly just moving a file around and then
         | writing documentation and cleaning up the public API. I rather
         | enjoy thinking about and modeling library APIs in general, so I
         | actually had a lot of fun with it!
         | 
         | [1]: https://www.npmjs.com/package/@nytimes/react-prosemirror
        
       | gavmor wrote:
       | Cool! I totally could have used this earlier this year... can't
       | remember what for...
       | 
       | Interesting choice to publish from the storyteller "monorepo." Is
       | that because it evolved in situ, and you've no impetus to incur
       | the overhead of extraction?
        
         | smoores wrote:
         | Hahaha, well if it comes to you, the library will still be
         | there for you :)
         | 
         | Right, this was actually just a file within the Storyteller web
         | package to start. It was fairly well defined, and so pretty
         | easy to pull out into another package in the monorepo, but
         | Storyteller is the primary consumer at the moment, and I want
         | to be able to develop them in sync. Plus, it provides a great
         | test bed for development of the library!
         | 
         | edit: I forgot to mention that the eventual goal is to
         | (hopefully!) publish this package as @storyteller/epub, along
         | with any other packages that end up split out of Storyteller.
         | That will probably include at least a @storyteller/synchronize
         | and a @storyteller/cli.
         | 
         | Unfortunately, someone seems to have snagged the @storyteller
         | org on NPM several years ago and left it to languish without
         | really using it, so I'm waiting to see whether GitHub will
         | consider this squatting and transfer the org to me.
         | 
         | I've also tried reaching out to the developer that owns the
         | org, but they don't seem to have been active on GitHub or NPM
         | for the past 5 years or so, and my only real strategy for
         | reaching out to them was to open an issue on one of their other
         | GitHub projects!
        
       | curtis3389 wrote:
       | Something I've been wondering: why do ebooks take so long to
       | render? My kindle seems good at it, but opening an ebook in
       | calibre/fbreader/etc can take minutes or even fail in some
       | readers depending on the ebook.
        
         | smoores wrote:
         | I would guess there are multiple potential pitfalls here.
         | Firstly, not all ebook formats are created equal -- Storyteller
          | only operates on EPUB files, because EPUB is an open standard
         | format and it supports Media Overlays (read-aloud) natively. I
         | can only really speak to that format, but there are others
         | (MOBI, PDF, etc).
         | 
         | An EPUB is just a ZIP archive of XML and XHTML files (plus
         | other assets, like images). Partly, I suspect, because of the
         | dearth of actively maintained open source projects in the
         | space, and partly because of the nature of tech in the book
         | publishing industry, EPUB generation software used by authors
         | and publishers often messes up this spec, which means that EPUB
         | readers sometimes need to have fairly complex fallback logic
         | for trying to figure out how to render a book. Also, because
         | EPUBs are ZIP archives, some readers may either unzip the
         | entire book into memory or "explode" it into an unzipped
         | directory on disk, both of which may result in some slowness,
         | especially if the book has lots of large resources. The newest
         | Brandon Sanderson novel, for example, is ~300MB _zipped_.
         | 
         | Additionally, and perhaps more importantly, EPUBs (and I
         | believe MOBIs as well) represent content as XHTML and CSS,
         | which means that readers very often need to use a browser or
         | webview to actually render the book. Precisely how they deliver
         | this content into the webview can have a huge impact on
          | performance; most browsers don't love to be told to format
          | entire novels' worth of content into text columns, for example.
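A minimal sketch of the "ZIP archive of XML and XHTML files" structure described above, written in Python only because its standard library covers ZIP and XML; it is not the @smoores/epub API. It reads the two small XML entries needed to list a book's reading order without unpacking the whole archive. The function name `spine_paths` is hypothetical, and the sketch assumes a well-formed EPUB whose manifest hrefs are not percent-encoded.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
}


def spine_paths(epub_file):
    """Return the archive paths of the spine documents, in reading order.

    epub_file may be a filesystem path or a binary file-like object.
    Only two XML entries are decompressed; nothing is written to disk.
    """
    with zipfile.ZipFile(epub_file) as archive:
        # META-INF/container.xml points at the package document (the OPF file).
        container = ET.fromstring(archive.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", NS)
        opf_path = rootfile.attrib["full-path"]
        package = ET.fromstring(archive.read(opf_path))

    # The manifest maps item ids to hrefs relative to the OPF file.
    hrefs = {
        item.attrib["id"]: item.attrib["href"]
        for item in package.findall("opf:manifest/opf:item", NS)
    }
    base = posixpath.dirname(opf_path)
    # The spine lists manifest ids in reading order.
    return [
        posixpath.normpath(posixpath.join(base, hrefs[ref.attrib["idref"]]))
        for ref in package.findall("opf:spine/opf:itemref", NS)
    ]
```

A reader built this way can then call `archive.read(path)` for one chapter at a time instead of loading every resource up front. Real-world files that deviate from the spec would need the fallback logic mentioned above, which this sketch omits.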
        
           | curtis3389 wrote:
           | Thank you so much! That's incredibly enlightening!
        
             | smoores wrote:
             | Of course! I'm hoping to have a web reader with Media
             | Overlay support built in to Storyteller available in the
             | next few months, along with some much needed library
             | management tooling, so maybe that will be useful for you!
             | I'll try to make it snappy :)
        
           | giantrobot wrote:
           | Additionally the XHTML content _can_ just be a single large
            | file instead of one file per chapter/section. Paginating and
           | rendering the large single file is going to be more effort
           | than the same on a smaller file. This is all on top of the
           | pitfalls and variability you mention.
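One way to check whether a given book has the single-large-file problem is to look at the uncompressed sizes recorded in the ZIP directory, which does not require decompressing anything. This is a rough sketch: the helper name `largest_documents` is hypothetical, and it identifies content documents by file extension rather than by the manifest's media types.

```python
import zipfile


def largest_documents(epub_file, limit=5):
    """Return (uncompressed_size, name) pairs for the largest (X)HTML entries."""
    with zipfile.ZipFile(epub_file) as archive:
        docs = [
            (info.file_size, info.filename)
            for info in archive.infolist()
            if info.filename.lower().endswith((".xhtml", ".html", ".htm"))
        ]
    # Largest first; a single entry holding most of the book stands out here.
    return sorted(docs, reverse=True)[:limit]
```

If one entry accounts for nearly all of the text, the reader has to parse and paginate the entire book before showing the first page.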
        
             | smoores wrote:
             | Yup, great point. Especially if you've used some tool to
             | convert from another file, like a PDF, into an EPUB, you
             | can easily end up with the entire book in a single XHTML
             | file, which, again, can be pretty heavy for a browser to
             | parse and format! I also have no idea whether Calibre et al
             | actually use native web views, or have their own renderers,
             | which are almost certainly less performant than native web
             | views!
        
             | cyberax wrote:
             | > Additionally the XHTML content can just be a single large
             | file instead of one file per chapter/section.
             | 
              | Terry Pratchett books are notorious for that. Some EPUB
              | authoring tools artificially introduce breaks, but you
             | can't rely on them.
        
           | Yoric wrote:
           | FWIW, I wrote an EPUB (well, it was called OEBPS at the time)
           | reader that rendered pretty much all of the format ~21 years
           | ago (including all of XHTML and CSS) and it had very decent
           | performance. I seem to recall that someone tried it on the
           | One Laptop Per Child XO and it was... well, slow, but it
           | worked.
           | 
           | So it's possible :)
        
           | cyberax wrote:
            | I used Storyteller to align Sanderson's most recent novel
            | with its audio and the result is 1.7GB. That's... painful. It
            | crashed the reader on a reMarkable 2 tablet.
           | 
           | I'm now actually working on a Calibre-Web change to strip the
           | audio and media overlay from the books it serves via OPDS.
           | 
           | Then I'll need to tackle cross-device progress sync. This
           | turned out to be surprisingly tricky.
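A rough sketch of what stripping audio and media overlays from an EPUB could involve, using only the Python standard library. It is not the Calibre-Web change mentioned above; the function name `strip_media_overlays` is hypothetical. It assumes an EPUB 3 package document, manifest hrefs that are not percent-encoded, and content documents that do not reference the audio files directly.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

OPF = "http://www.idpf.org/2007/opf"
NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container", "opf": OPF}
# Keep conventional prefixes when the package document is serialised again.
ET.register_namespace("", OPF)
ET.register_namespace("dc", "http://purl.org/dc/elements/1.1/")


def strip_media_overlays(epub_bytes: bytes) -> bytes:
    """Return a copy of the EPUB without audio files or SMIL overlay documents."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as src:
        container = ET.fromstring(src.read("META-INF/container.xml"))
        opf_path = container.find("c:rootfiles/c:rootfile", NS).attrib["full-path"]
        package = ET.fromstring(src.read(opf_path))
        base = posixpath.dirname(opf_path)

        dropped = set()
        manifest = package.find("opf:manifest", NS)
        for item in list(manifest):
            media_type = item.attrib.get("media-type", "")
            if media_type.startswith("audio/") or media_type == "application/smil+xml":
                href = item.attrib["href"]
                dropped.add(posixpath.normpath(posixpath.join(base, href)))
                manifest.remove(item)
            else:
                # Remaining items must no longer point at a removed overlay.
                item.attrib.pop("media-overlay", None)

        metadata = package.find("opf:metadata", NS)
        if metadata is not None:
            for meta in list(metadata):
                # Drop overlay metadata such as media:duration.
                if meta.attrib.get("property", "").startswith("media:"):
                    metadata.remove(meta)

        with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
            for info in src.infolist():
                if info.filename in dropped:
                    continue
                if info.filename == opf_path:
                    data = ET.tostring(package, encoding="utf-8", xml_declaration=True)
                else:
                    data = src.read(info.filename)
                # The mimetype entry stays uncompressed; copying entries in
                # their original order also keeps it first in the archive.
                stored = info.filename == "mimetype"
                method = zipfile.ZIP_STORED if stored else zipfile.ZIP_DEFLATED
                dst.writestr(info.filename, data, compress_type=method)
    return out.getvalue()
```

The text, styles, and images are copied through unchanged, so the stripped file should be close to the size of the original ebook rather than the audiobook.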
        
             | smoores wrote:
             | You can't do much better than that; that's the size of the
             | audiobook! For what it's worth, I also used Storyteller on
             | Wind and Truth, and got it down to 1.2GB by using the OPUS
             | codec with a 32 kb/s bitrate.
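The relationship between bitrate and file size is simple arithmetic: bytes = bitrate in bits per second, times duration in seconds, divided by 8. The sketch below works that out for a hypothetical 60-hour audiobook. The duration is an assumption for illustration, not the actual length of any title mentioned here, and since Opus is usually variable-bitrate the result is only an estimate.

```python
def audio_size_bytes(bitrate_kbps: float, hours: float) -> int:
    """Approximate size of an audio stream, ignoring container overhead."""
    bits_per_second = bitrate_kbps * 1000
    seconds = hours * 3600
    return round(bits_per_second * seconds / 8)


# Hypothetical 60-hour audiobook encoded at 32 kb/s.
size = audio_size_bytes(32, 60)
print(f"{size / 1e6:.0f} MB")  # 864 MB
```

The audio track then dominates the aligned file, with the book's own text and images making up the remainder.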
        
               | cyberax wrote:
               | Yeah. My current workaround is to create KEPUBs (Kobo-
               | optimized epubs), but that creates an issue with cross-
               | format reading progress sync. This is an interesting task
               | in itself, though.
               | 
               | So I'm trying to design a progress sync protocol. My
               | current idea is to just use several words from the text
               | itself to unambiguously pinpoint the position within a
               | section (chapter).
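A small sketch of the "several words from the text" idea, assuming the position is a word index within one section and that both formats contain the same words once markup is removed. All names here are hypothetical; this is not an existing protocol or Storyteller's position system.

```python
import re


def words(text):
    """Split section text into lowercase words, ignoring punctuation."""
    return re.findall(r"\w+", text.lower())


def find_all(section_words, run):
    """Return every word index at which `run` occurs in the section."""
    n = len(run)
    return [
        i
        for i in range(len(section_words) - n + 1)
        if section_words[i:i + n] == run
    ]


def make_anchor(section_words, index, min_len=3):
    """Return the shortest run of words starting at `index` that is unique."""
    for length in range(min_len, len(section_words) - index + 1):
        candidate = section_words[index:index + length]
        if len(find_all(section_words, candidate)) == 1:
            return candidate
    # No unique run exists; fall back to the tail, which may be ambiguous.
    return section_words[index:]


def resolve_anchor(section_words, anchor):
    """Return the word index of the first match, or None if there is none."""
    matches = find_all(section_words, anchor)
    return matches[0] if matches else None
```

A device would send the section identifier plus the anchor words, and the receiving device would call `resolve_anchor` against its own copy of that section. Because the anchor is plain text, it does not depend on how each format wraps the words in markup.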
        
               | smoores wrote:
               | Is the idea that you have some devices that you want to
               | download just the text to, but have it sync with your
               | other devices? I think we could support that natively,
               | honestly! Storyteller already has the input files, and it
               | uses a text-based position system that doesn't require
               | the audio to exist. If you're already doing work on this,
               | maybe we could add it to Storyteller?
        
         | sriacha wrote:
         | I find Koreader (linux version) leaps and bounds faster than
         | the calibre reader.
        
       | seanwessmith wrote:
       | sometimes life just delivers. was looking for this 2 days ago
        
         | smoores wrote:
         | Outstanding. Let me know if you run into any issues!
        
       | revskill wrote:
       | Genius. Thanks.
        
       | herrvogel- wrote:
       | Storyteller seems pretty cool in general. Can it be used to host
       | books for other people?
        
         | smoores wrote:
         | Thanks! Absolutely. You can invite users to your Storyteller
         | server and give them whatever permissions are appropriate (e.g.
         | you can choose whether they can only download books, or can
         | also manage uploading and syncing books and/or managing users).
         | It has SMTP support for emailing invites, or it can just
         | generate invite links for you to share yourself.
         | 
         | More info here:
         | https://smoores.gitlab.io/storyteller/docs/administering#inv...
        
           | 9dev wrote:
           | Oh my! This looks very neat, and I've been working on
            | something similar to Storyteller (I think):
           | https://github.com/project-kiosk/kiosk
           | 
            | I don't get around to working on it right now, but maybe there's
           | something useful there for you.
        
       | cyberax wrote:
       | BTW, I've had this idea in my mind for a while, after slowly
       | working through my e-library infrastructure.
       | 
       | Do you think it might be a good idea to set up a site to share
       | the aligned overlays from Storyteller? This way, people won't
       | have to waste CPU/GPU time re-aligning the same files over and
       | over again.
       | 
       | It should be OK from a copyright perspective, as it won't be
       | distributing any copyrighted material, only the media overlay
       | information.
        
         | smoores wrote:
         | That's a really interesting idea! The more I think about it,
         | the more I like it.
         | 
         | A challenge I foresee is that the media overlays are only
         | reusable if you have the exact same input EPUB file, and have
         | processed it with Storyteller to mark up the sentence
         | boundaries. EPUBs have unique identifiers, though, so maybe
         | this would be fine! We'd need to add a new processing flow to
         | Storyteller, but it should be doable.
         | 
         | Feel free to hit me up in the Storyteller chat if you want to
         | discuss more! Thanks for sharing this idea!
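As a sketch of how shared overlays might be looked up, the code below reads the publication's unique identifier from the package document and pairs it with a hash of the file's bytes. Pairing the two is an assumption made for this example, since the identifier names the publication rather than one exact file; `overlay_key` is a hypothetical name and not part of Storyteller.

```python
import hashlib
import io
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def overlay_key(epub_bytes: bytes):
    """Return (unique identifier or None, SHA-256 hex digest of the file)."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as archive:
        container = ET.fromstring(archive.read("META-INF/container.xml"))
        opf_path = container.find("c:rootfiles/c:rootfile", NS).attrib["full-path"]
        package = ET.fromstring(archive.read(opf_path))

    # The package element names the id of the dc:identifier to treat as unique.
    uid_ref = package.attrib.get("unique-identifier")
    identifier = None
    for element in package.findall("opf:metadata/dc:identifier", NS):
        if element.attrib.get("id") == uid_ref:
            identifier = (element.text or "").strip()
    return identifier, hashlib.sha256(epub_bytes).hexdigest()
```

The identifier could narrow a search to the right title, while the digest would confirm that the input is byte-for-byte the file the overlays were generated from.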
        
       ___________________________________________________________________
       (page generated 2024-12-13 23:00 UTC)