[HN Gopher] Show HN: @smoores/epub, a JavaScript library for wor...
___________________________________________________________________
Show HN: @smoores/epub, a JavaScript library for working with EPUB
publications
Howdy! I've just written a blog post about this, and I figured I
would share it here:
https://smoores.dev/post/announcing_smoores_epub/. As I've been
working on Storyteller[1], I've been developing a library for
working with EPUB files, since that's a large amount of the work
that Storyteller does. After a friend asked for advice on creating
EPUB books in Node.js, I decided to publish Storyteller's EPUB
library as a standalone NPM package. I really love the EPUB spec,
and I think the Node.js developer community deserves an actively
maintained library for working with it! [1]:
https://smoores.gitlab.io/storyteller
Author : smoores
Score : 60 points
Date : 2024-12-13 19:52 UTC (3 hours ago)
(HTM) web link (www.npmjs.com)
(TXT) w3m dump (www.npmjs.com)
| hombre_fatal wrote:
| Extracting a library from a real world project is one of my
| favorite parts of software.
|
| I'm sure the march of LLMs will continue eating into this pie,
| and that's a good thing (most of it is a distraction from the
| real work), but I love polishing a library on my laptop in a
| cafe. It's like working on a painting or something.
| smoores wrote:
| It was, actually, very enjoyable! When we pulled React
| ProseMirror[1] out of the NYT text editor, it was a pretty
| laborious process that we had to carefully plan and execute for
| months, and we still ended up with an internal fork for a
| while.
|
| By contrast, this was mostly just moving a file around and then
| writing documentation and cleaning up the public API. I rather
| enjoy thinking about and modeling library APIs in general, so I
| actually had a lot of fun with it!
|
| [1]: https://www.npmjs.com/package/@nytimes/react-prosemirror
| gavmor wrote:
| Cool! I totally could have used this earlier this year... can't
| remember what for...
|
| Interesting choice to publish from the storyteller "monorepo." Is
| that because it evolved in situ, and you've no impetus to incur
| the overhead of extraction?
| smoores wrote:
| Hahaha, well if it comes to you, the library will still be
| there for you :)
|
| Right, this was actually just a file within the Storyteller web
| package to start. It was fairly well defined, and so pretty
| easy to pull out into another package in the monorepo, but
| Storyteller is the primary consumer at the moment, and I want
| to be able to develop them in sync. Plus, it provides a great
| test bed for development of the library!
|
| edit: I forgot to mention that the eventual goal is to
| (hopefully!) publish this package as @storyteller/epub, along
| with any other packages that end up split out of Storyteller.
| That will probably include at least a @storyteller/synchronize
| and a @storyteller/cli.
|
| Unfortunately, someone seems to have snagged the @storyteller
| org on NPM several years ago and left it to languish without
| really using it, so I'm waiting to see whether GitHub will
| consider this squatting and transfer the org to me.
|
| I've also tried reaching out to the developer that owns the
| org, but they don't seem to have been active on GitHub or NPM
| for the past 5 years or so, and my only real strategy for
| reaching out to them was to open an issue on one of their other
| GitHub projects!
| curtis3389 wrote:
| Something I've been wondering: why do ebooks take so long to
| render? My kindle seems good at it, but opening an ebook in
| calibre/fbreader/etc can take minutes or even fail in some
| readers depending on the ebook.
| smoores wrote:
| I would guess there are multiple potential pitfalls here.
| Firstly, not all ebook formats are created equal -- Storyteller
| only operates on EPUB files, because EPUB is an open source
| format and it supports Media Overlays (read-aloud) natively. I
| can only really speak to that format, but there are others
| (MOBI, PDF, etc).
|
| An EPUB is just a ZIP archive of XML and XHTML files (plus
| other assets, like images). Partly, I suspect, because of the
| dearth of actively maintained open source projects in the
| space, and partly because of the nature of tech in the book
| publishing industry, EPUB generation software used by authors
| and publishers often messes up this spec, which means that EPUB
| readers sometimes need to have fairly complex fallback logic
| for trying to figure out how to render a book. Also, because
| EPUBs are ZIP archives, some readers may either unzip the
| entire book into memory or "explode" it into an unzipped
| directory on disk, both of which may result in some slowness,
| especially if the book has lots of large resources. The newest
| Brandon Sanderson novel, for example, is ~300MB _zipped_.
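|
| A minimal sketch of the "ZIP archive of XML" point, in Python
| with only the standard library. It builds a toy archive in
| memory (EPUB-shaped, but not a complete, valid book) and reads
| one small entry, META-INF/container.xml, to locate the package
| document without unzipping anything else. The helper names and
| the OEBPS/ paths are made up for illustration; this is not how
| @smoores/epub or any particular reader is implemented.
```python
import io
import xml.etree.ElementTree as ET
import zipfile

# Namespace used by META-INF/container.xml in the EPUB container format.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}

# Hypothetical container file pointing at a made-up package document path.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_toy_archive() -> bytes:
    """Build a tiny EPUB-shaped ZIP in memory (illustrative, not a valid book)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # The mimetype entry comes first and is stored uncompressed.
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        archive.writestr("META-INF/container.xml", CONTAINER_XML)
        archive.writestr("OEBPS/content.opf", "<package/>")
        archive.writestr("OEBPS/chapter1.xhtml", "<html/>")
    return buffer.getvalue()


def find_package_document(epub_bytes: bytes) -> str:
    """Return the path of the package document (OPF) inside the archive."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as archive:
        # Read a single small entry instead of extracting the whole book.
        container = ET.fromstring(archive.read("META-INF/container.xml"))
    rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
    if rootfile is None:
        raise ValueError("container.xml has no rootfile entry")
    return rootfile.attrib["full-path"]


if __name__ == "__main__":
    print(find_package_document(build_toy_archive()))
```
| A reader that opens entries on demand like this avoids paying
| for large images or audio until a page actually needs them.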
|
| Additionally, and perhaps more importantly, EPUBs (and I
| believe MOBIs as well) represent content as XHTML and CSS,
| which means that readers very often need to use a browser or
| webview to actually render the book. Precisely how they deliver
| this content into the webview can have a huge impact on
| performance; most browsers don't love to be told to format
| entire novels' worth of content into text columns, for example.
| curtis3389 wrote:
| Thank you so much! That's incredibly enlightening!
| smoores wrote:
| Of course! I'm hoping to have a web reader with Media
| Overlay support built in to Storyteller available in the
| next few months, along with some much needed library
| management tooling, so maybe that will be useful for you!
| I'll try to make it snappy :)
| giantrobot wrote:
| Additionally the XHTML content _can_ just be a single large
| file instead of one file per chapter/section. Paginating and
| rendering the large single file is going to be more effort
| than the same on a smaller file. This is all on top of the
| pitfalls and variability you mention.
| smoores wrote:
| Yup, great point. Especially if you've used some tool to
| convert from another file, like a PDF, into an EPUB, you
| can easily end up with the entire book in a single XHTML
| file, which, again, can be pretty heavy for a browser to
| parse and format! I also have no idea whether Calibre et al
| actually use native web views, or have their own renderers,
| which are almost certainly less performant than native web
| views!
| cyberax wrote:
| > Additionally the XHTML content can just be a single large
| file instead of one file per chapter/section.
|
| Terry Pratchett books are notorious for that. Some EPUB
| authoring tools artificially introduce breaks, but you can't
| rely on them.
| Yoric wrote:
| FWIW, I wrote an EPUB (well, it was called OEBPS at the time)
| reader that rendered pretty much all of the format ~21 years
| ago (including all of XHTML and CSS) and it had very decent
| performance. I seem to recall that someone tried it on the
| One Laptop Per Child XO and it was... well, slow, but it
| worked.
|
| So it's possible :)
| cyberax wrote:
| I used Storyteller to align Sanderson's most recent novel with
| its audio, and the result is 1.7GB. That's... painful. It
| crashed the reader on a reMarkable 2 tablet.
|
| I'm now actually working on a Calibre-Web change to strip the
| audio and media overlay from the books it serves via OPDS.
|
| Then I'll need to tackle cross-device progress sync. This
| turned out to be surprisingly tricky.
| smoores wrote:
| You can't do much better than that; that's the size of the
| audiobook! For what it's worth, I also used Storyteller on
| Wind and Truth, and got it down to 1.2GB by using the OPUS
| codec with a 32 kb/s bitrate.
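|
| A rough back-of-the-envelope check of those numbers, as a
| sketch in Python. It assumes a constant 32 kb/s stream,
| 1 GB = 10^9 bytes, no container overhead, and that roughly
| 300MB of the 1.2GB file is the text and images mentioned
| upthread. None of these assumptions are confirmed in the
| thread, and the function names are made up.
```python
def audio_bytes(duration_hours: float, bitrate_kbps: float) -> float:
    """Size in bytes of a constant-bitrate stream of the given length."""
    bits = duration_hours * 3600 * bitrate_kbps * 1000
    return bits / 8


def hours_for_size(size_bytes: float, bitrate_kbps: float) -> float:
    """Playing time in hours that fits in size_bytes at a constant bitrate."""
    seconds = size_bytes * 8 / (bitrate_kbps * 1000)
    return seconds / 3600


if __name__ == "__main__":
    # One hour at 32 kb/s is 14.4 MB.
    print(audio_bytes(1, 32) / 1_000_000)
    # Assumed split: 1.2 GB total minus about 0.3 GB of non-audio content.
    print(hours_for_size(1_200_000_000 - 300_000_000, 32))
```
| Under those assumptions the remaining 900MB works out to about
| 62.5 hours of audio, so the file really is mostly audiobook.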
| cyberax wrote:
| Yeah. My current workaround is to create KEPUBs (Kobo-
| optimized epubs), but that creates an issue with cross-
| format reading progress sync. This is an interesting task
| in itself, though.
|
| So I'm trying to design a progress sync protocol. My
| current idea is to just use several words from the text
| itself to unambiguously pinpoint the position within a
| section (chapter).
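|
| One way this could look, as a sketch only: reduce a section to
| lowercase words, then store the shortest run of words starting
| at the reading position that occurs exactly once in that
| section. The function names are hypothetical, and this is not
| an existing protocol or Storyteller's position format.
```python
import re
from typing import List, Optional


def section_words(text: str) -> List[str]:
    """Lowercase word tokens, so markup and punctuation differences drop out."""
    return re.findall(r"\w+", text.lower())


def find_runs(words: List[str], run: List[str]) -> List[int]:
    """All start indexes at which `run` occurs in `words`."""
    n = len(run)
    return [i for i in range(len(words) - n + 1) if words[i:i + n] == run]


def make_anchor(words: List[str], index: int, max_len: int = 12) -> Optional[List[str]]:
    """Shortest run of words starting at `index` that is unique in the section."""
    for length in range(1, max_len + 1):
        candidate = words[index:index + length]
        if len(candidate) < length:
            break  # ran off the end of the section
        if find_runs(words, candidate) == [index]:
            return candidate
    return None  # no unique run within max_len words


def resolve_anchor(words: List[str], anchor: List[str]) -> Optional[int]:
    """Word index of the anchor in another rendering, or None if ambiguous."""
    matches = find_runs(words, anchor)
    return matches[0] if len(matches) == 1 else None


if __name__ == "__main__":
    epub = section_words("The cat sat on the mat, and the cat slept.")
    kepub = section_words("Chapter 1. The cat sat on the mat and the cat slept!")
    anchor = make_anchor(epub, 7)
    print(anchor, resolve_anchor(kepub, anchor))
```
| The anchor is a few plain words plus a section identifier, so
| it survives format conversion as long as the text is the same.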
| smoores wrote:
| Is the idea that you have some devices that you want to
| download just the text to, but have it sync with your
| other devices? I think we could support that natively,
| honestly! Storyteller already has the input files, and it
| uses a text-based position system that doesn't require
| the audio to exist. If you're already doing work on this,
| maybe we could add it to Storyteller?
| sriacha wrote:
| I find Koreader (linux version) leaps and bounds faster than
| the calibre reader.
| seanwessmith wrote:
| sometimes life just delivers. was looking for this 2 days ago
| smoores wrote:
| Outstanding. Let me know if you run into any issues!
| revskill wrote:
| Genius. Thanks.
| herrvogel- wrote:
| Storyteller seems pretty cool in general. Can it be used to host
| books for other people?
| smoores wrote:
| Thanks! Absolutely. You can invite users to your Storyteller
| server and give them whatever permissions are appropriate (e.g.
| you can choose whether they can only download books, or can
| also manage uploading and syncing books and/or managing users).
| It has SMTP support for emailing invites, or it can just
| generate invite links for you to share yourself.
|
| More info here:
| https://smoores.gitlab.io/storyteller/docs/administering#inv...
| 9dev wrote:
| Oh my! This looks very neat, and I've been working on
| something similar to Storyteller (I think):
| https://github.com/project-kiosk/kiosk
|
| I don't get around to working on it right now, but maybe there's
| something useful there for you.
| cyberax wrote:
| BTW, I've had this idea in my mind for a while, after slowly
| working through my e-library infrastructure.
|
| Do you think it might be a good idea to set up a site to share
| the aligned overlays from Storyteller? This way, people won't
| have to waste CPU/GPU time re-aligning the same files over and
| over again.
|
| It should be OK from a copyright perspective, as it won't be
| distributing any copyrighted material, only the media overlay
| information.
| smoores wrote:
| That's a really interesting idea! The more I think about it,
| the more I like it.
|
| A challenge I foresee is that the media overlays are only
| reusable if you have the exact same input EPUB file, and have
| processed it with Storyteller to mark up the sentence
| boundaries. EPUBs have unique identifiers, though, so maybe
| this would be fine! We'd need to add a new processing flow to
| Storyteller, but it should be doable.
|
| Feel free to hit me up in the Storyteller chat if you want to
| discuss more! Thanks for sharing this idea!
___________________________________________________________________
(page generated 2024-12-13 23:00 UTC)