[HN Gopher] Portable EPUBs
       ___________________________________________________________________
        
       Portable EPUBs
        
       Author : sohkamyung
       Score  : 530 points
       Date   : 2024-01-26 01:53 UTC (21 hours ago)
        
 (HTM) web link (willcrichton.net)
 (TXT) w3m dump (willcrichton.net)
        
       | ijhuygft776 wrote:
       | Portable epubs? All the epubs I ever downloaded are portable...
       | not sure how it could NOT be the case... not reading an article
       | with a title like this.
        
         | zwayhowder wrote:
         | I had the same thought, but wanted to know why the Author
         | thought they were not.
         | 
         |  _For example, a major issue for self-containment is that EPUB
         | content can embed external assets. A content document can
         | legally include an image or font file whose src is a URL to a
         | hosted server. This is not hypothetical, either; as of the time
         | of writing, Google Doc's EPUB exporter will emit CSS that will
         | @include external Google Fonts files. The problem is that such
         | an EPUB will not render correctly without an internet
         | connection, nor will it render correctly if Google changes the
         | URLs of its font files._
         | 
         | The article raises some interesting ideas. Much like PDF and
         | PDF/A, I would say an EPUB/A standard would be potentially
         | useful.
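A rough way to audit an EPUB for the kind of external asset quoted above (in CSS such a pull is written as `@import` or `url(...)`). This is a minimal sketch using only Python's standard library. The regex matching and the helper name `find_external_refs` are assumptions for illustration, not a real validator, and ordinary `<a href>` hyperlinks are deliberately not flagged because they are references, not embedded assets.

```python
import re
import zipfile

# Regex-based approximations (an assumption for brevity; a real checker would
# parse HTML and CSS properly). Only asset-loading constructs are matched.
ATTR_SRC = re.compile(r'''\bsrc\s*=\s*["'](https?://[^"']+)["']''', re.IGNORECASE)
LINK_HREF = re.compile(r'''<link\b[^>]*?\bhref\s*=\s*["'](https?://[^"']+)["']''', re.IGNORECASE)
CSS_URL = re.compile(r'''url\(\s*["']?(https?://[^"')\s]+)["']?\s*\)''', re.IGNORECASE)
CSS_IMPORT = re.compile(r'''@import\s+["'](https?://[^"']+)["']''', re.IGNORECASE)

def find_external_refs(epub):
    """Return {member_name: [remote URLs]} for XHTML/HTML/CSS files in an EPUB.

    `epub` may be a path or a binary file-like object.
    """
    found = {}
    with zipfile.ZipFile(epub) as zf:
        for name in zf.namelist():
            if not name.lower().endswith((".xhtml", ".html", ".htm", ".css")):
                continue
            text = zf.read(name).decode("utf-8", errors="replace")
            # CSS patterns are applied to XHTML too, to catch inline <style> blocks.
            urls = (ATTR_SRC.findall(text) + LINK_HREF.findall(text)
                    + CSS_URL.findall(text) + CSS_IMPORT.findall(text))
            if urls:
                found[name] = urls
    return found
```

An empty result suggests, but does not prove, that the book renders offline; fonts or images pulled in by scripts would not show up here.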
        
           | adamzochowski wrote:
           | But the same font problem exists with PDFs. If a font is not
           | embedded in the PDF, or rendered into vector shapes that are
           | embedded, then the PDF will display garbage.
        
             | BHSPitMonkey wrote:
             | Isn't that solved in PDF/A, which the GP was implying could
             | also be done for EPUB?
        
               | jasomill wrote:
               | Yes: among other things, PDF/A requires all fonts to be
               | embedded.
        
           | BarbaryCoast wrote:
           | Thanks for that. I can't read the article, probably because I
           | block WASM (and Javascript) for security. None of my ebook
           | readers have Internet access (for security and for privacy),
           | so none of those internet-only epub files would work for me.
           | 
           | This might be "legal", since XHTML was intended for the web,
           | but I assume Google's using it to collect more user
           | interaction data that they can sell to data brokers.
           | 
           | FWIW, PDF is simply Postscript that's been compressed. As far
           | as I can tell, almost all documents these days are created
           | with Microsoft Word, TeX, or Postscript. I'm lumping things
           | like PageMaker and LaTeX in with the base they were derived
           | from.
        
             | Symbiote wrote:
             | The article is a WASM EPUB viewer. There's a link shown in
             | the viewer to the EPUB file:
             | 
             | https://willcrichton.net/notes/portable-epubs/epubs/portable...
        
         | xtracto wrote:
         | Well... if you actually read the article instead of just the
         | headline, you would learn why a portable version is needed.
        
           | emmanueloga_ wrote:
           | "portable EPUB: an EPUB with additional requirements and
           | recommendations to improve PDF-like portability."
           | 
           | IMO epub is fine for fiction but not for any sort of
           | technical material. EPUB docs are slow to reflow and the
           | layout is pretty much always broken in some way, especially
           | when there are tables and graphics involved. PDFs are a lot
           | faster to render and navigate, the fixed page size being one
           | of the reasons.
        
             | wolverine876 wrote:
             | The OP addresses some of those issues.
        
             | lxgr wrote:
             | Having suffered through many PDFs on my phone I'd take slow
             | reflow and sometimes broken layouts over no reflow and
             | guaranteed horizontal scrolling any time.
        
               | auggierose wrote:
               | Not sure why anyone would want to read PDFs on their
               | phone.
        
               | offices wrote:
               | Imagination error. My phone has hundreds of downloaded
               | PDFs from emails containing things such as tickets, job
               | specs, pseudo-letters, bills, etc.
               | 
               | Anything where one might wish to read a Document in a
               | Portable Format.
        
               | auggierose wrote:
               | Yeah, I guess. The thing with these is I don't care about
               | the "quality" of a bill or ticket, it's enough if the tax
               | man / concert venue accept it.
               | 
               | Many people advocating for a "better" PDF don't
               | understand the quality aspect of a PDF. I am not willing
               | to compromise on that when reading a book. It beats all
               | other aspects, including the fact that I cannot read it
               | on my phone. Basically, PDFs are a perfect translation
               | of books into the digital medium. Gimmicks and features
               | on top of what PDF can do are fine, but _never a
               | replacement_, given that books also don't have these
               | features.
        
               | lxgr wrote:
               | I quite often find interesting research papers during the
               | day that I don't have time to read in the office, and
               | there's no stable cell signal on my commute, so it's in a
               | way the perfect reading environment for these for me.
               | 
               | My commute is short and often too crowded to warrant
               | pulling out a tablet, though. Reading on a one-handed
               | device is ideal, and I prefer digital to physical
               | books for that reason. So why shouldn't I read research
               | papers the same way? I just want a portable document
               | format for an actually portable device.
        
         | SilentM68 wrote:
         | One thing I dislike about PDFs is that dark themes usually
         | don't render well, especially the embedded images, whereas the
         | EPUB format seems to render them just fine. If a new EPUB
         | format is created, I would suggest that it support
         | pagination, since post-secondary courses usually ask students
         | to cite chapters, pages, etc. Most EPUBs that I've come across
         | don't have pages. The last thing I'd suggest is that the new
         | standard, if created, should incorporate accessibility
         | features, so that the file is readable by screen readers. PDFs
         | are rarely designed with accessibility in mind. Making them
         | accessible is also a gigantic pain to do. The technology behind
         | any new EPUB document standard should have native accessibility
         | support by default. People with print disabilities will thank
         | you.
        
           | starkparker wrote:
           | > If a new EPUB format is created, I would suggest that they
           | support pagination, since post secondary courses usually ask
           | students to cite chapters, pages, etc. Most EPUBs that I've
           | come across don't have pages.
           | 
           | From the post:
           | 
           | > I think we just have to give up on citing content by pages.
           | Instead, we should mandate a consistent numbering scheme for
           | block elements within a document, and have people cite using
           | that scheme.
           | 
           | The point of a citation is to specifically reference an
           | assertion. Any method of specifically referencing an
           | assertion works.
           | 
           | If anything, referencing by section and paragraph is more
           | portable than referencing by page number. It's more
           | consistent across different print formats of the same text
           | (hardcover vs. paperback, and mainstream print editions vs.
           | large-text or braille editions) as well as different digital
           | formats.
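As a rough illustration of the "consistent numbering scheme for block elements" idea, here is a sketch that labels each top-level block of an XHTML body as `section.block`. The scheme itself (restart the count at every `<h2>`, number only `p`, `ul`, `ol`, `table`, `figure`, `pre`, `blockquote`) and the function names are assumptions made up for this example, not the article's specification.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"
# Hypothetical scheme: an <h2> opens a new section; these tags get numbers.
SECTION_TAGS = {"h2"}
BLOCK_TAGS = {"p", "ul", "ol", "table", "figure", "pre", "blockquote"}

def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]

def number_blocks(xhtml):
    """Return (label, element) pairs such as ('2.3', <p>) in document order."""
    root = ET.fromstring(xhtml)
    body = root.find(f"{XHTML_NS}body")
    section, block = 0, 0  # blocks before the first heading land in section 0
    labels = []
    for el in body:  # only direct children of <body>, to keep the sketch small
        name = local_name(el.tag)
        if name in SECTION_TAGS:
            section += 1
            block = 0
        elif name in BLOCK_TAGS:
            block += 1
            labels.append((f"{section}.{block}", el))
    return labels
```

A citation like "2.2" then stays the same at any font size or screen width, which is exactly the property page numbers lack in reflowable text.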
        
             | jxdxbx wrote:
             | This is an issue in the legal world. Court opinions
             | accessed only online are cited according to their "page
             | number" in some reporter or another. It's better to cite
             | paragraph numbers when possible, but most American legal
             | documents are unnumbered.
        
               | crabmusket wrote:
               | It sounds like the legal world can continue to use PDFs.
               | That's fine!
        
               | dsr_ wrote:
               | Laws already have problems with HTML: numbered lists are
               | specified in a way that is incompatible with many
               | jurisdictions' numbering schemes, including the US
               | Federal standard.
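For a sense of what such a scheme asks for, here is a sketch producing labels in the style of the US Code hierarchy: subsection (a), paragraph (1), subparagraph (A), clause (i), subclause (I). The depth-to-style table is my assumption about that convention and stops at five levels, and doubling letters past "z" is also an assumption. The point is that every level carries its own parentheses and citations concatenate them, e.g. (a)(1)(A), whereas plain `<ol type="...">` markers render as "a." and "i." by default.

```python
def to_roman(n):
    """Convert a positive integer to an uppercase Roman numeral."""
    values = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"),
              (90, "XC"), (50, "L"), (40, "XL"), (10, "X"), (9, "IX"),
              (5, "V"), (4, "IV"), (1, "I")]
    out = []
    for value, symbol in values:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)

def to_alpha(n):
    """1 -> 'a', 26 -> 'z', 27 -> 'aa' (letter doubling is an assumption)."""
    letter = chr(ord("a") + (n - 1) % 26)
    return letter * ((n - 1) // 26 + 1)

# Depth -> label style (assumed US Code convention).
STYLES = [
    to_alpha,                          # subsection     (a)
    str,                               # paragraph      (1)
    lambda n: to_alpha(n).upper(),     # subparagraph   (A)
    lambda n: to_roman(n).lower(),     # clause         (i)
    to_roman,                          # subclause      (I)
]

def legal_label(path):
    """Turn a path of 1-based positions, e.g. (1, 2, 3), into '(a)(2)(C)'."""
    if not 1 <= len(path) <= len(STYLES):
        raise ValueError(f"depth must be 1..{len(STYLES)}")
    if any(n < 1 for n in path):
        raise ValueError("positions are 1-based")
    return "".join(f"({STYLES[depth](n)})" for depth, n in enumerate(path))
```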
        
           | steve1977 wrote:
           | > dark themes usually don't render good
           | 
           | PDFs should not render dark themes at all. PDFs should look
           | exactly like they were produced. So if they were produced
           | with black on white text, that's what they should render, in
           | any circumstance.
        
             | Vecr wrote:
             | Zathura[0] has a dark mode[1], it works pretty well.
             | 
             | [0]: https://pwmt.org/projects/zathura/
             | 
             | [1]: ^r Recolor (grayscale and invert colors)
        
             | o11c wrote:
             | Or it can just invert the L component of all colors in the
             | HSL colorspace at the very last stage of rendering, which
             | only requires a couple of subtractions to do in sRGB.
             |
             | Unlike the unfortunately common "invert _directly_ in
             | sRGB", this preserves the colors, changing only the
             | brightness, and honestly it's pretty good. Colorspace nerds will no
             | doubt complain that there are better colorspaces available,
             | but in practice, most consumer devices implement "sRGB"
             | perceptually such that this works better than fancier
             | methods (which only work for carefully calibrated displays
             | in carefully calibrated rooms).
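Why "a couple of subtractions" suffices: in HSL, lightness is L = (max + min) / 2 of the RGB channels, and mapping L to 1 - L while keeping H and S fixed turns out to shift every channel by the same amount, 1 - max - min. Hue depends only on the differences between channels, and max - min is unchanged, so hue and saturation survive. A minimal sketch on 8-bit channels, applied directly to the gamma-encoded sRGB values as the comment describes (no linearization); the function names are illustrative.

```python
def invert_lightness(rgb):
    """Invert the HSL lightness of an 8-bit (r, g, b) color, keeping hue and saturation."""
    hi, lo = max(rgb), min(rgb)
    shift = 255 - hi - lo  # maps L -> 1 - L; identical shift for every channel
    return tuple(c + shift for c in rgb)

def invert_naive(rgb):
    """Plain per-channel inversion, which also rotates the hue by 180 degrees."""
    return tuple(255 - c for c in rgb)
```

For example, white becomes black, pure red (L = 0.5) stays red where naive inversion would turn it cyan, and a dark navy `(0, 0, 128)` becomes the light blue `(127, 127, 255)`.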
        
               | steve1977 wrote:
               | I didn't say PDF cannot do it, I said it should not. It
               | defeats the purpose of PDF.
        
       | wolverine876 wrote:
       | It's a very well thought through article by the developer of
       | Nota, trying to bring EPUB format up to parity with PDF. It's a
       | serious start and they've already written a viewer. In fact, the
       | article itself is displayed in a browser-based wasm port of the
       | viewer (and looks good!).
       | 
       | One issue is how precisely EPUB, which is really XHTML, can
       | reproduce layout. What are the possibilities here? The OP's
       | standard is that the document will look "reasonable". They imply
       | that HTML would need new layout capabilities to match PDF, at
       | least for line breaking:
       | 
       |  _There's two ways to make progress here. One is for browsers to
       | provide more typography tools. Allegedly, text-wrap: pretty is
       | supposed to help, but in my brief testing it doesn't seem to
       | improve line-break quality. The other way is to pre-calculate
       | line breaks, which would only work for fixed-layout renditions._
       | 
       | Also, though the author mentions annotations, I don't see how
       | they intend to implement them.
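To illustrate what "pre-calculating line breaks" involves, here is a minimum-raggedness sketch: dynamic programming over break points that minimizes the sum of squared trailing space on every line except the last, next to the greedy first-fit approach for contrast. It is a simplified cousin of TeX's Knuth-Plass algorithm (no hyphenation, no stretchable glue, widths counted in characters), and it is not how `text-wrap: pretty` is specified or implemented in any browser.

```python
def greedy_lines(words, width):
    """First-fit: put as many words on each line as will fit."""
    lines, current = [], ""
    for w in words:
        candidate = f"{current} {w}" if current else w
        if len(candidate) <= width or not current:
            current = candidate
        else:
            lines.append(current)
            current = w
    if current:
        lines.append(current)
    return lines

def break_lines(words, width):
    """Minimize the sum of squared slack over all lines but the last."""
    n = len(words)
    cost = [0.0] * (n + 1)  # cost[i]: best total cost for laying out words[i:]
    nxt = [n] * (n + 1)     # nxt[i]: index just past the line that starts at i
    for i in range(n - 1, -1, -1):
        cost[i] = float("inf")
        line_len = -1  # compensates for the space counted before the first word
        for j in range(i, n):
            line_len += len(words[j]) + 1
            if line_len > width and j > i:
                break
            # The last line is free; an overlong single word gets zero slack.
            slack = 0 if j == n - 1 else max(width - line_len, 0)
            total = slack ** 2 + cost[j + 1]
            if total < cost[i]:
                cost[i], nxt[i] = total, j + 1
    lines, i = [], 0
    while i < n:
        lines.append(" ".join(words[i:nxt[i]]))
        i = nxt[i]
    return lines
```

With `"aaa bb cc ddddd"` at width 6, greedy gives `aaa bb / cc / ddddd` (one very short middle line), while the DP gives the more even `aaa / bb cc / ddddd`. For a fixed-layout rendition this could run once at build time; reflowable text would need it recomputed on the device for every width.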
        
         | joshjob42 wrote:
         | The author discusses fixed layout epubs. Effectively, the epub
         | can give a default pagination, line-breaks, font, font size,
         | page size, and positioning for images etc., making it render
         | identically on everything (one might optionally omit pagination
         | if opening in a browser but keep everything else). This can be
         | done already in epub3. But that's not ideal, because then it
         | doesn't look good anymore on a phone, etc. Depending on the
         | reader though, you could override the default, but then you
         | have to hope that your reader does a good job of making a nice
         | document. An alternative is for the epub to specify multiple
         | renderings, for various common screen types.
         | 
         | I don't think this is unreasonable as a solution. By all means
         | let's try to get a smart reader, but letting people create
         | defaults for their documents that can be overridden if desired
         | by the user is a good middle ground.
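For reference, EPUB 3 declares fixed layout through the reserved `rendition:` properties in the package document's metadata, and each content document gives its page size with a `viewport` meta tag. A small sketch that emits those fragments; the helper names and the default orientation/spread values are illustrative assumptions, and the EPUB 3 fixed-layout rules list further properties not shown here.

```python
from xml.sax.saxutils import quoteattr

def fixed_layout_metadata(orientation="portrait", spread="none"):
    """<meta> children for the OPF <metadata> element that mark the whole
    publication as fixed-layout. The 'rendition:' prefix is reserved in
    EPUB 3, so it needs no prefix declaration."""
    return (
        '<meta property="rendition:layout">pre-paginated</meta>\n'
        f'<meta property="rendition:orientation">{orientation}</meta>\n'
        f'<meta property="rendition:spread">{spread}</meta>'
    )

def viewport_meta(width_px, height_px):
    """<meta> for each XHTML page's <head>, giving the page dimensions the
    reading system scales to fit the screen."""
    content = f"width={int(width_px)}, height={int(height_px)}"
    return f"<meta name=\"viewport\" content={quoteattr(content)}/>"
```

A document built this way renders identically everywhere, which is also why it no longer adapts to a phone screen unless the reader overrides it.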
        
           | idoubtit wrote:
           | Indeed, EPUB3 provides all the features that the author
           | wishes. His "portable EPUB" format is just a loosely
           | specified subset. It's unclear if some extra features are
           | included in the format as they are in his "Bene" tool, like
           | the rendering of references (i.e. links with a data-target
           | attribute).
           | 
           | The EPUB3 standard is much more complex than EPUB2 (media
           | overlays, mixing fixed layout with reflowed, MathML...). In
           | my experience the implementations vary much more, and
           | most of them aren't complete. So a "Portable EPUB" may not
           | render as expected because the reader tool lacks some
           | specific feature. The author also requires full JS support,
           | which I suppose does not help with portability.
        
         | zozbot234 wrote:
         | Doesn't CSS support layout capabilities for paged media out of
         | the box? An EPUB reader just has to implement good old-
         | fashioned "Print Preview" display mode, and you're set.
        
         | zdunn wrote:
         | > Also, though the author mentions annotations, I don't see how
         | they intend to implement them.
         | 
         | The very end of section 8 and all of section 9 discuss how
         | interactive functionality would use web components.
        
         | nine_k wrote:
         | PDF does not have any line-breaking capability. It is a
         | _picture_ format, similar to SVG, only more rigid. That's why
         | it can't have text reflow, etc.
         | 
         | What an ebook format needs is a _semantic_ form of markup,
         | which adapts to devices it is rendered on. HTML + CSS were
         | invented for this goal.
         | 
         | With that, book layout authors should consciously relinquish
         | some control on how the book looks, and hand it to the reader.
         | Slight visual imperfections are a small price to pay for this.
         | Those who need visual perfection should go for a PDF.
         | 
         | This, of course, becomes hard if any interactive stuff is
         | involved. I would suggest that larger interactive elements
         | should open in a dedicated view when needed, and tiny
         | interactive elements should embrace reflow.
        
           | kps wrote:
           | > HTML + CSS were invented for this goal.
           | 
           | HTML (with SVG and MathML) is probably fine for most books,
           | but CSS has spent 30 years resolutely resisting basic
           | typography, i.e. default text baseline alignment.
        
           | wolverine876 wrote:
           | You will enjoy the article, which goes into these issues in
           | some detail.
        
         | BlueTemplar wrote:
         | Ironically, the very example the author uses for annotations
         | doesn't work properly for me : on touchscreen Android Firefox I
         | get a link instead of a popup when press-holding.
         | 
         | And aren't annotations (and references) already part of the
         | EPUB specification, and probably even the HTML specification?!?
         |
         | Finally, I disagree that press-and-hold for a popup is
         | better than the usual practice of hyperlink anchors: IMHO
         | jumping to the anchor is much less disruptive. (As long as the
         | reader's "return" function is working properly, and/or, for
         | the bijective ones, they provide a "back" hyperlink.)
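On whether EPUB already covers this: EPUB 3's structural semantics vocabulary defines `epub:type="noteref"` for the link and `epub:type="footnote"` for the note, and some reading systems use these to show popup footnotes. Whether that amounts to the richer annotations the article proposes is a separate question. A sketch of how a reader might resolve such note references; the function name is hypothetical and only same-document `#id` links are handled.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
OPS = "http://www.idpf.org/2007/ops"  # namespace of the epub:type attribute
EPUB_TYPE = f"{{{OPS}}}type"

def resolve_noterefs(xhtml):
    """Map each noteref's link text to the plain text of the footnote it targets."""
    root = ET.fromstring(xhtml)
    # Index every element that has an id, so '#n1' can be looked up directly.
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    notes = {}
    for a in root.iter(f"{{{XHTML}}}a"):
        # epub:type may hold several space-separated values.
        if "noteref" not in a.get(EPUB_TYPE, "").split():
            continue
        href = a.get("href", "")
        if not href.startswith("#"):
            continue  # cross-document notes are out of scope for this sketch
        target = by_id.get(href[1:])
        if target is not None and "footnote" in target.get(EPUB_TYPE, "").split():
            notes[a.text] = "".join(target.itertext()).strip()
    return notes
```

A reader could show `notes[...]` in a popup on tap and still fall back to following the plain `href` anchor, which keeps the jump-and-return behavior described above working on readers without popup support.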
        
           | wolverine876 wrote:
           | > And aren't annotations (and references) already part of the
           | EPUB specification
           | 
           | I'm pretty sure they are not, based on looking carefully a
           | year or two ago, on recent discussions here on HN, and on the
           | OP's belief that they need to invent it.
        
       | xnx wrote:
       | 8 days ago, 134 points: "Portable Web Documents - An Alternative
       | to PDF Based on HTML5 (2019)"
       | https://news.ycombinator.com/item?id=39036774
        
         | crabmusket wrote:
         | And this current post is exactly what I was wishing for in my
         | reply[1]. Really glad this was posted!
         | 
         | [1] https://news.ycombinator.com/item?id=39037135
        
       | mr_mitm wrote:
       | I'm fully on board with the author's "I want to replace PDF"
       | sentiment.
       | 
       | It's true that running code in the document has some downsides,
       | but the vast majority of people do it all the time in their
       | browsers. And it comes with tremendous upsides. Just imagine
       | large amounts of data presented in interactive tables that can
       | sort, filter, and export, or interactive graphs inside the
       | document. We already use HTML+JS so much, why should we stop at
       | documents? Yes, they can't be printed, but in my observation fewer
       | and fewer people even own a printer these days, and I see no
       | reason why this trend should not continue. I bet the future will
       | be mostly living, interactive documents.
       | 
       | It's funny that I just mentioned this in the other thread [1],
       | but I also felt that there is a need for a format that is self-
       | contained and widely supported by standard software (by which I
       | mean browsers). A well-specified open format would be great, but
       | until then I tackled the self-containedness problem with JS and
       | wrote a Python script that zips and bundles all assets and embeds
       | them as a SPA into one HTML file [2]. The focus is on Sphinx docs
       | but it should work in general with all distributed HTML docs.
       | 
       | [1] https://news.ycombinator.com/item?id=39138444
       | 
       | [2] https://github.com/AdrianVollmer/Zundler
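For the simplest case of self-containment, local images can be inlined as `data:` URIs so that a single HTML file carries them. This is a different and much cruder technique than Zundler's (which, per the comment, bundles the assets into a single-page app); it is a regex-based sketch that only rewrites `<img src>` and ignores CSS, fonts, and scripts. The function name is hypothetical.

```python
import base64
import mimetypes
import re
from pathlib import Path

# Captures: 1 = everything up to the value, 2 = quote char, 3 = the src value.
IMG_SRC = re.compile(r'''(<img\b[^>]*?\bsrc\s*=\s*)(["'])([^"']+)\2''', re.IGNORECASE)

def inline_images(html, base_dir):
    """Replace relative <img src> paths with base64 data: URIs read from base_dir.
    Sources with a scheme (http:, data:, ...) or missing files are left alone."""
    base = Path(base_dir)

    def replace(match):
        prefix, quote, src = match.groups()
        if re.match(r"^(?:[a-z][a-z0-9+.-]*:|//)", src, re.IGNORECASE):
            return match.group(0)  # remote, protocol-relative, or already inlined
        path = base / src
        if not path.is_file():
            return match.group(0)  # leave broken references untouched
        mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        data = base64.b64encode(path.read_bytes()).decode("ascii")
        return f"{prefix}{quote}data:{mime};base64,{data}{quote}"

    return IMG_SRC.sub(replace, html)
```

Base64 inflates each asset by about a third, which is one reason a zipped bundle, as in the comment's approach, scales better for large doc sets.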
        
         | fodkodrasz wrote:
         | > It's true that running code in the document has some
         | downsides, but the vast majority of people do it all the time
         | in their browsers.
         | 
         | This probably has something to do with them having no choice,
         | as the big companies managed to convince the frontend dev
         | community that the single best thing to generate layout is on
         | the client machine on the fly. Of course they did it so the
         | users will have a hard time selectively blocking the layout
         | scripts from the ad/spyware most contemporary (web)software
         | development is about.
         | 
         | This led us to the point where saving (or God forbid printing!)
         | an article needs a lot of effort in many cases.
         | 
         | My observation is: when I need to go to work in the field, I
         | need printed documents. Printed documents don't need firmware
         | updates, their batteries don't run out, and no, I don't need
         | interactivity in documents.
         | 
         | Self contained HTML is a good - and necessary - step, but
         | interactivity and executable code is not something we usually
         | need in documents. I have only seen a somewhat legit need for it in
         | corporate abominations of documents (and some teaching materials
         | possibly).
        
           | morelisp wrote:
           | Some of us are equally nonplussed by modern web dev but still
           | quite miss PostScript.
        
           | jxdxbx wrote:
           | Thank you. It's really frustrating that people want to make
           | documents as unreliable and annoying as the web.
        
         | jxdxbx wrote:
         | I understand all this but there needs to be a simple format for
         | just regular books without all this complication. I thought
         | that's what ePubs were for. What I want is basically an ebook
         | format that is mostly zip files of plain text.
        
           | mr_mitm wrote:
           | What are you missing in epub?
        
             | jxdxbx wrote:
             | Simplicity? A guarantee that the file will be readable in
             | 20 years? Project Gutenberg still treats plain text as the
             | default format for a reason.
        
               | velcrovan wrote:
               | The ePub standard is 17 years old, it consists of HTML
               | which is 31 years old and CSS which is 27 years old,
               | packaged in ZIP format which is 34 years old, and all are
               | still in widespread active commercial use and very easy
               | to write parsers for. I think you'll have problems with
               | the physical media you use to store your plain text files
               | before you ever have problems finding software to read
               | ePub file contents.
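To back up the "very easy to write parsers for" point: with only Python's standard library, reading an EPUB's title and reading order takes the `zipfile` module and two small XML lookups, first `META-INF/container.xml` to locate the package document, then the metadata, manifest, and spine inside it. A sketch with error handling omitted; the function name is hypothetical.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf",
          "dc": "http://purl.org/dc/elements/1.1/"}

def epub_title_and_spine(epub):
    """Return (title, [content document paths in reading order])."""
    with zipfile.ZipFile(epub) as zf:
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = container.find("c:rootfiles/c:rootfile", CONTAINER_NS).get("full-path")
        opf = ET.fromstring(zf.read(opf_path))
    title = opf.findtext("opf:metadata/dc:title", namespaces=OPF_NS)
    # Manifest hrefs are relative to the package document's directory.
    base = posixpath.dirname(opf_path)
    manifest = {item.get("id"): posixpath.join(base, item.get("href"))
                for item in opf.findall("opf:manifest/opf:item", OPF_NS)}
    spine = [manifest[ref.get("idref")]
             for ref in opf.findall("opf:spine/opf:itemref", OPF_NS)]
    return title, spine
```

Reading the chapter text from there is one more `zf.read(path)` per spine entry.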
        
         | larme wrote:
         | It's just a fucking book. Don't push your shits like js or SPA
         | or d3 or webgpu to a fucking book. I just want to read it like
         | a dead tree book.
        
           | mr_mitm wrote:
           | Not all documents are books. And I'd appreciate it if you
           | stated your criticism in a more civilized manner.
        
       | simonw wrote:
       | Here's some really insightful feedback on this idea from Baldur
       | Bjarnason, who has spent significant time working with various
       | W3C groups relevant to EPUB:
       | https://toot.cafe/@baldur/111819472053623911
       | 
       | Example note: "EPUB originally didn't support remote resources
       | and people put a lot of work into changing. Loading stuff over
       | the network is HTML's killer feature. Blocking network assets is
       | a setback for format adoption, not progress."
        
         | starkparker wrote:
         | Oh for Christ's sake, someone pry Baldur off his cross again.
         | 
         | > Blocking network assets is a setback for format adoption
         | 
         | People are standing here telling him that _allowing_ network
         | assets _is a setback for format adoption_ and he's just going
         | to keep pounding this stupid, obnoxious drum of his until he
         | runs everyone off.
         | 
         | > almost all of the problems described would be solved by
         | getting OS vendors (Google, MS, Apple) to invest more money in
         | EPUB
         | 
         | That's back-asswards. Google, MS, and Apple don't give a shit
         | about EPUB, they never will, and it's arguable that we're
         | better served with them not buying a seat at that table
         | considering how poorly their "help" has helped web standards,
         | as much of the rest of his dismissive thread helpfully notes.
         | 
         | If he wants money for EPUB standards he should shake the cup at
         | IDPF members who rely on it, and particularly Amazon, to whom
         | he quite vocally abdicated the publishing space 12 years
         | ago.[1]
         | 
         | Barking at operating system companies is nonsense at best and
         | how we wind up with another, even more avoidable situation
         | where the space is held hostage by them at worst. At least
         | Amazon can chuck some goodwill money at EPUB development while
         | continuing to kick its ass up and down the market with MOBI.
         | 
         | (Aside from all this, his dismissals of the "clunky" reading
         | system complaint, citing how EPUB has "too many divergences",
         | only further proves to me how tunneled the vision is of the
         | people involved. To hell with forking or improving EPUB, then,
         | because it can't be improved if that's the attitude of the
         | people most involved with or influential within it. What bloody
         | point is there in the customizability of a format that _nobody_
         | can effectively build tools for or consume?)
         | 
         | 1: https://www.baldurbjarnason.com/notes/amazon-wins/, in which
         | he also admits that he has no idea how to work with IDPF, which
         | is a really great sign of how long things have been going this
         | badly in this space
        
           | sirsinsalot wrote:
           | I also don't want my e-reader phoning home (to
           | publisher/author) read time and page turns because the EPUB
           | loads a pixel.
        
           | Someone wrote:
           | > Google, MS, and Apple don't give a shit about EPUB
           | 
           | https://www.w3.org/groups/wg/epub/former-participants/
           | certainly shows Google and Apple participated in the working
           | group. Apple also has a book store selling EPUB books and has
           | EPUB readers for both macOS and iOS. Google also has an app
           | that handles EPUB.
        
           | crabmusket wrote:
           | I really enjoyed this response, though I feel your points
           | could have been made with a little less personal vitriol. Do
           | you have history with Baldur? I don't ask because I think
           | that would undermine your arguments, I'm just interested why
           | you had such a strong reaction.
        
           | mft_ wrote:
           | > > Blocking network assets is a setback for format adoption
           | 
           | > People are standing here telling him that _allowing_
           | network assets _is a setback for format adoption_ and he's
           | just going to keep pounding this stupid, obnoxious drum of
           | his until he runs everyone off.
           | 
           | I don't know the background that you're frustrated about, but
           | I'd suggest that the answer might be: 'it depends' - and it
           | depends on the intended purpose of the format in question.
           | PDF is self-contained, and can be read (mostly) reliably on
           | almost any device with the right software; PDFs having to
           | have internet access to be read or opened would be a bad
           | thing; further, the same goes for most formats - including
           | EPUB (as you say) and audio files, picture files, etc.
        
             | chasil wrote:
             | > PDF is self-contained, and can be read (mostly) reliably
             | on almost any device with the right software
             | 
             | Article: "A PDF is a single file that contains all the
             | images, fonts, and other data needed to render it."
             | 
             | This is only true if you are using PDF/A, or have
             | explicitly bundled all of your fonts in some version of the
             | PDF standard.
             | 
             | Otherwise, 14 standard typefaces must be supplied by the
             | viewer. These 14 are: Times (in regular, italic, bold, and
             | bold italic), Courier (in regular, oblique, bold and bold
             | oblique), Helvetica (in regular, oblique, bold and bold
             | oblique), Symbol, and Zapf Dingbats.
             | 
             | The 14 standard typefaces can vary between viewers:
             | 
             | https://en.wikipedia.org/wiki/PDF#Text
             | 
             | "...the base fourteen fonts... or suitable substitute fonts
             | with the same metrics, should be available in most PDF
             | readers, but they are not guaranteed to be available in the
             | reader, and may only display correctly if the system has
             | them installed."
             | 
             | Depending upon what the viewer bundles, PDFs using these 14
             | might not render as expected.
             | 
             | Below is a deeper discussion from the wiki:
             | 
             | https://web.archive.org/web/20110718231502/http://www.planet...
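In practice the check described above comes down to each font a PDF references: either it is embedded, or it had better be one of the standard 14. A sketch over already-extracted font records; obtaining the `(name, embedded)` pairs from a real PDF (with a PDF library, or a tool such as poppler's `pdffonts`) is assumed and not shown, and the function name is hypothetical.

```python
import re

# PostScript names of the 14 standard fonts.
BASE_14 = {
    "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
    "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
    "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
    "Symbol", "ZapfDingbats",
}
SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")  # e.g. 'ABCDEF+Minion-Regular' for subsets

def classify_fonts(fonts):
    """fonts: iterable of (base_font_name, is_embedded).
    Returns (relies_on_viewer, missing): non-embedded base-14 fonts the viewer
    must supply, and non-embedded, non-standard fonts it can only substitute."""
    relies_on_viewer, missing = [], []
    for name, embedded in fonts:
        if embedded:
            continue
        plain = SUBSET_TAG.sub("", name)
        (relies_on_viewer if plain in BASE_14 else missing).append(plain)
    return relies_on_viewer, missing
```

PDF/A sidesteps the question entirely by requiring every font to be embedded, as noted upthread.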
        
           | dsr_ wrote:
           | It would be quite nice if Firefox opened EPUBs properly
           | instead of requiring the just-good-enough EPUBreader add-on.
           | 
           | I'd value that a lot more than Pocket (which I always turn
           | off).
        
           | the_lucifer wrote:
           | I will go so far as to argue that Apple is the only one of the
           | major vendors actually adopting EPUB books.
        
         | zaphirplane wrote:
         | Allowing the loading of network resources is not good for
         | security. I'm surprised this isn't a worry (tbh I didn't read Baldur's writing).
        
           | criddell wrote:
           | But it's good for tracking what books people are reading. If
           | the history of the internet shows anything, it's that if
           | surveillance is possible, it will eventually happen.
           | 
           | Lots of governments around the world would love to know what
           | their citizens are reading. Few would be bold enough to go
           | after this directly, but if some company operating in their
           | country has the data then there's a path for that government
           | to get the data.
        
         | watwut wrote:
         | Just about the last thing I want is for epub to stop working
         | offline on my phone, because the damn book needs to download
         | something.
        
         | jxdxbx wrote:
         | My view as a heavy ebook reader: ePubs should be inert data. No
         | javascript, no interactivity, no network resources. Just a
         | fancy text file with some appearance settings all of which the
         | reader can override.
        
           | harshreality wrote:
           | Think about what javascript exclusion means, and all the
           | things a good universal ebook format needs to support.
           | _Nicely_ rendered math? Currently the best option to do that
           | is embedded mathjax (maybe you could pre-build mathml and
           | ship that, but I'm not sure that covers all cases). Graphs
           | or charts? There are nice js libraries for that, while doing
           | it manually means exporting images or svgs. Even static svgs
           | are annoying and brittle to font-size changes without
           | javascript to adjust the svg size appropriately.
           | 
           | Don't confuse what's necessary for standard fiction books
           | with what the format should support.
           | 
           | JS and interactivity are fine, in technical books, reports,
           | or niche fiction.
           | 
           | What I absolutely agree on is that epubs don't need
           | networking. Resources on the internet get stale after years
           | or decades anyway, so inclusion of any network assets into an
           | epub guarantees that the work will degrade over the years.
           | References can be web links, but nothing from the internet
           | should be embedded.
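harshreality's rule (links are fine, embedded network assets are not) can be checked mechanically. Below is a minimal Python sketch, not part of any existing tool, that lists the http(s) URLs an EPUB's XHTML, CSS, or SVG files would fetch while rendering. It is regex-based, so it assumes reasonably ordinary markup and will miss cases such as `srcset` or SVG `xlink:href`.

```python
import re
import zipfile

# URLs a reading system would fetch while rendering. Not exhaustive: srcset
# and SVG xlink:href are not covered. Plain <a href> links are deliberately
# ignored, because a link is a reference, not an embedded asset.
_SRC_ATTR = re.compile(r'''\b(?:src|poster|data)\s*=\s*["'](https?://[^"']+)["']''', re.I)
_LINK_HREF = re.compile(r'''<link\b[^>]*\bhref\s*=\s*["'](https?://[^"']+)["']''', re.I)
_CSS_URL = re.compile(r'''url\(\s*["']?(https?://[^"')\s]+)''', re.I)
_CSS_IMPORT = re.compile(r'''@import\s+["'](https?://[^"']+)["']''', re.I)
_PATTERNS = (_SRC_ATTR, _LINK_HREF, _CSS_URL, _CSS_IMPORT)

# Members whose contents are rendered or styled by the reading system.
_CHECKED_SUFFIXES = (".xhtml", ".html", ".htm", ".css", ".svg")


def find_remote_resources(epub):
    """Return (member, url) pairs for http(s) assets an EPUB would load.

    `epub` may be a path or a binary file object; zipfile.ZipFile accepts both.
    """
    hits = []
    with zipfile.ZipFile(epub) as zf:
        for name in zf.namelist():
            if not name.lower().endswith(_CHECKED_SUFFIXES):
                continue
            text = zf.read(name).decode("utf-8", errors="replace")
            for pattern in _PATTERNS:
                hits.extend((name, url) for url in pattern.findall(text))
    return hits


if __name__ == "__main__":
    import sys

    for member, url in find_remote_resources(sys.argv[1]):
        print(f"{member}: {url}")
```

An empty result does not prove the book is self-contained, but a non-empty one is a reliable sign that it is not.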
        
             | criddell wrote:
             | EPUB 3 includes MathML.
             | 
             | https://www.w3.org/TR/epub-33/
        
         | bmacho wrote:
         | > "EPUB originally didn't support remote resources and people
         | put a lot of work into changing. Loading stuff over the network
         | is HTML's killer feature."
         | 
         | And it is a feature of books that they stay the same.
         | 
         | Both can be a feature, the ability to change (e.g. they fix
         | something in the cloud), and the inability to change (e.g. you
         | can have it as you bought it).
        
           | criddell wrote:
           | I don't get the impulse for homogenization everywhere. PDFs,
           | EPUBs, Word documents, HTML documents, etc... all have
           | different strengths and weaknesses and I think that's a good
           | thing. Never needing an internet connection is a strength of
           | EPUB IMHO.
        
       | reacharavindh wrote:
       | What I wish for most in epub, or technically in epub readers,
       | is the ability to scribble handwritten notes with a stylus and
       | have them kept for when I read the book again. I do that with
       | PDFs on my iPad, but I have a lot of tech books for which the
       | manual notes I took are nowhere to be found, and even when I do
       | find them, they are not inline with what I was reading and
       | thinking.
        
         | eviks wrote:
         | In general, the modern docs should be easily editable, not just
         | allow annotations, since it's easy to preserve the original
         | content/layout
        
         | beckerdo wrote:
         | I agree, I would like an ePub to have a robust note taking and
         | exporting ability.
         | 
         | For instance, if I highlight in Chapter 8 "In 539" [next
         | paragraph] "Belisarius" [next paragraph] "marched on Ravenna"
         | [10 paragraphs later] "In 540 Belisarius entered Ravenna".
         | 
         | I would like to export this with the Chapter header and
         | detailed highlight locations OR just as one sentence with
         | subtle links to the locations.
        
         | AlanYx wrote:
         | I'm on the same page. I convert all my ePubs to PDF because I
         | want to keep my handwritten annotations in-place alongside the
         | text I'm annotating, including things like circled words.
         | Recent Kobos (Elipsa and Sage) take a decent stab at solving
         | this problem while retaining the ePub format and
         | reformattability, but it breaks too easily.
        
       | eviks wrote:
       | A commendable effort to get rid of PDF, that ancient paper-based
       | legacy in the digital world.
       | 
       | Though I'm curious whether the clunky old-but-still-living HTML
       | (especially in its ugly XML variety) + CSS are the right
       | foundations for the portable format of the future. Since the
       | author has also developed a whole new document language, it would
       | be nice to read a more in-depth overview of that subject. And why
       | limit the future to the ugly duckling of JS when WASM exists?
       | 
       | > content by pages. Instead, we should mandate a consistent
       | numbering scheme for block elements within a document, and have
       | people cite using that scheme.
       | 
       | that's indeed the proper and more precise approach, though we
       | could still have those "fixed layout epub" pages as a backup
       | coordinate system
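The quoted proposal does not say what the numbering scheme would look like. One hypothetical scheme, sketched below in Python, labels each top-level block as `section.block` (so `2.3` is the third block in the second `<section>`), with nested blocks sharing their parent's number. Which tags count as blocks is an assumption, and the sketch relies on well-formed XHTML, which EPUB content documents already require.

```python
from html.parser import HTMLParser

# Which elements count as citable blocks is an assumption of this sketch.
BLOCK_TAGS = {"p", "li", "figure", "table", "pre", "blockquote"}


class BlockNumberer(HTMLParser):
    """Label each top-level block as 'section.block', e.g. '2.3'."""

    def __init__(self):
        super().__init__()
        self.section = 0   # incremented at every <section> start tag
        self.block = 0     # restarts at 0 in each section
        self.depth = 0     # > 0 while inside an already-numbered block
        self.labels = []   # (label, tag) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag == "section":
            self.section += 1
            self.block = 0
        elif tag in BLOCK_TAGS:
            if self.depth == 0:  # nested blocks share their parent's number
                self.block += 1
                self.labels.append((f"{self.section}.{self.block}", tag))
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS and self.depth > 0:
            self.depth -= 1


def number_blocks(xhtml):
    parser = BlockNumberer()
    parser.feed(xhtml)
    parser.close()
    return parser.labels
```

One design consequence: inserting a paragraph renumbers every later block in that section, so a scheme like this only gives stable citations if published documents are treated as immutable or versioned.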
        
       | thayne wrote:
       | > A PDF is a single file that contains all the images, fonts, and
       | other data needed to render it.
       | 
       | A PDF _can_ include the fonts. But it often doesn't, and relies
       | on system fonts. One reason is that including fonts
       | in the PDF can dramatically increase the size of the file. In
       | some cases a single font could be larger than the entire rest of
       | the file. I've also worked on implementing embedding fonts in
       | some software that generated PDFs. It was surprisingly difficult
       | to figure out how to get it to work reliably.
       | 
       | > PDFs are rendered consistently.
       | 
       | Not as much as you would think. There are several cases where the
       | same PDF will render differently depending on which PDF viewer
       | you use. Usually the differences are pretty subtle, but
       | occasionally there are edge cases that result in pretty
       | significant differences. I've even run into a case where the same
       | version of Acrobat reader will render a PDF differently depending
       | on what OS you are using.
        
         | EE84M3i wrote:
         | Is there software that minimizes the fonts by removing code
         | points that aren't used in the document?
        
           | adrian_b wrote:
           | This is a standard feature of the PDF format.
           | 
           | Normally all PDF documents include only the glyphs
           | corresponding to the code points actually used in the text
           | rendered with that font.
           | 
           | That is why you can go, for instance, to the site of any font
           | vendor and freely download a PDF text sample of an expensive
           | font. You can easily extract the font from the sample PDF, but
           | it will be useless, as it will contain only the few letters
           | that were used in the sample text.
        
         | geraldhh wrote:
         | > A PDF can include the fonts. But it often doesn't, and relies
         | on system fonts.
         | 
         | found this out, after 20-something years of consistent pdf
         | renderings, in a job interview because my docs allegedly looked
         | odd :/
         | 
         | the daily wtf ...
        
           | jxdxbx wrote:
           | Yeah, MS Office PDF generation (at least some time ago) did
           | not generate PDFs with embedded fonts, and I'd often come
           | across weird-looking documents where the system is using a
           | font with the wrong characteristics. Print-to-PDF usually
           | avoids this.
        
           | pseingatl wrote:
           | The Arabic glossary of legal terms distributed by the State
           | of California is unreadable unless you open the file in Adobe
           | Reader, search for the name of the font used, download and
           | install the font on your system, close the file and reopen
           | it. I suppose there are many instances of this happening.
        
         | adrian_b wrote:
         | There are only a few standard system fonts that can be omitted
         | from a PDF file; for those, the document assumes that whatever
         | fonts are used for rendering match the metrics of the
         | traditional Times, Helvetica, Courier, etc., typefaces.
         | Therefore, with compatible system fonts there should be no
         | changes in the layout of the rendered document. Of course, some
         | system fonts are advertised as metric-compatible with the
         | ancient Adobe PostScript fonts but nonetheless have subtle
         | differences.
         | 
         | Except for the small number of standard system fonts, for the
         | other fonts the PDF document normally includes only a small
         | subset of their glyphs, corresponding to the characters that
         | are actually used in the text that is to be rendered with that
         | font.
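The cases described in this subthread (standard-14 fonts that may be omitted, fully embedded fonts, and subset-embedded fonts) can be told apart per font. A minimal sketch, assuming each simple font dictionary has already been extracted from the PDF into a plain Python dict by some parser (none is used here). Composite Type0 fonts, whose descriptor sits under `DescendantFonts`, are out of scope. Subset fonts are conventionally recognisable by a six-letter tag and `+` before the `BaseFont` name.

```python
import re

# The standard 14 Type 1 fonts that a PDF may reference without embedding.
STANDARD_14 = {
    "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
    "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
    "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
    "Symbol", "ZapfDingbats",
}

# Subset fonts carry a tag of six uppercase letters and "+", e.g. "ABCDEF+Garamond".
SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")

# Keys in a font descriptor that hold an embedded font program.
FONT_FILE_KEYS = ("FontFile", "FontFile2", "FontFile3")


def classify_font(font):
    """Classify a simple font dict as embedded-subset, embedded, standard-14, or missing."""
    name = font.get("BaseFont", "")
    descriptor = font.get("FontDescriptor") or {}
    if any(key in descriptor for key in FONT_FILE_KEYS):
        return "embedded-subset" if SUBSET_TAG.match(name) else "embedded"
    if name in STANDARD_14:
        # The viewer substitutes a metric-compatible system font.
        return "standard-14"
    # Not embedded and not standard: the viewer has to guess a substitute.
    return "missing"


if __name__ == "__main__":
    fonts = [
        {"BaseFont": "Helvetica"},
        {"BaseFont": "ABCDEF+Garamond", "FontDescriptor": {"FontFile2": b"..."}},
        {"BaseFont": "Calibri", "FontDescriptor": {"Flags": 32}},
    ]
    for f in fonts:
        print(f["BaseFont"], "->", classify_font(f))
```

The "missing" case is the one behind the odd-looking documents described in the neighbouring comments.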
        
       | teekert wrote:
       | As someone who occasionally tries to read scientific literature
       | on their e-reader (which is nice: I can just mail it to my
       | PocketBook account and it shows up), I have a deep hate for PDF.
       | Please let this be a popular thing.
        
       | mrich wrote:
       | Ironically, this did not render in Firefox on Android (the
       | spinner just kept spinning). It worked in Chrome.
       | 
       | That said, epubs are great for reading books on mobile. The
       | advantage of pdfs is that they contain highlights/notes, so you
       | can directly import them into Zotero and all your annotations are
       | there. For epub, you have to hope there is a way to export the
       | annotations that are stored by the reader app, and then you have
       | to process them further. Readera is a great reader for mobile
       | that makes this possible. I'm currently working on a script that
       | will convert an epub to pdf, extract the annotations from
       | Readera, and mark them in the pdf. Then I can import the pdf into
       | Zotero, while still retaining the great reading experience of
       | epubs.
        
         | Symbiote wrote:
         | Works fine in Firefox for Android 122.0 for me.
        
           | mrich wrote:
           | Also loads instantly for me now, didn't make any changes.
        
         | zozbot234 wrote:
         | There is a Web Annotation standard that could be used to export
         | the notes to.
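For reference, here is a minimal sketch of what one exported highlight could look like in the W3C Web Annotation data model. It uses a `TextQuoteSelector`, so the anchor is the quoted text rather than a page number. The helper name and the `source` value are hypothetical; a real exporter would use whatever identifier the book and reader app provide.

```python
import json


def make_annotation(source, exact, note, prefix="", suffix=""):
    """Build a W3C Web Annotation anchoring `note` to a quoted passage.

    A TextQuoteSelector identifies the passage by its exact text plus optional
    surrounding context, so it does not depend on pages or layout.
    """
    selector = {"type": "TextQuoteSelector", "exact": exact}
    if prefix:
        selector["prefix"] = prefix
    if suffix:
        selector["suffix"] = suffix
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "motivation": "commenting",
        "body": {"type": "TextualBody", "value": note, "format": "text/plain"},
        "target": {"source": source, "selector": selector},
    }


if __name__ == "__main__":
    # Hypothetical source; a real exporter would use the book's own identifier.
    anno = make_annotation(
        source="OEBPS/chapter08.xhtml",
        exact="In 540 Belisarius entered Ravenna",
        note="Compare with the march on Ravenna in 539.",
    )
    print(json.dumps(anno, indent=2))
```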
        
         | staz wrote:
         | It is working for me on my Firefox on Android.
         | 
         | One of the nice benefits I can already see in his document is
         | the working TOC sidebar, which allows navigation within the
         | document (compared to classic HTML, not PDF).
        
         | mwilliamson wrote:
         | I had a similar problem loading the page on Firefox for desktop
         | with private browsing. It turns out service workers don't work
         | in private browsing, which it seems Bene (the software
         | rendering the page) requires. Switching to a normal Firefox
         | window solved the problem.
        
       | DeathArrow wrote:
       | I didn't know that EPUB is based on HTML. I always had the
       | impression that it has its own binary format.
       | 
       | Using HTML as a base makes a lot of sense.
        
         | simongray wrote:
         | W3C standards basically always build on top of other existing
         | W3C standards.
        
         | anthk wrote:
         | It's just a zip file. Under Linux/Mac/BSD you can trivially
         | write a script which unzips and outputs the ebook's HTML files
         | into a large text stream and that output can be used as the
         | input of a text mode web browser, allowing you to read ebooks
         | everywhere with just two lines of code.
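A plain glob over the unzipped files may emit chapters in the wrong order. A slightly more careful sketch in Python follows `META-INF/container.xml` to the package document and walks its `<spine>`, which defines the reading order. It assumes UTF-8 content and unescaped `href` values, and it ignores the `linear="no"` attribute.

```python
import posixpath
import xml.etree.ElementTree as ET
import zipfile

NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
}


def spine_documents(epub):
    """Yield (path, text) for each spine item, in reading order.

    `epub` may be a path or a binary file object.
    """
    with zipfile.ZipFile(epub) as zf:
        # container.xml names the package (.opf) document.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = container.find(".//c:rootfile", NS).get("full-path")
        package = ET.fromstring(zf.read(opf_path))
        base = posixpath.dirname(opf_path)  # hrefs are relative to the .opf
        # The manifest maps ids to files; the spine lists ids in reading order.
        manifest = {item.get("id"): item.get("href")
                    for item in package.find("opf:manifest", NS)}
        for itemref in package.find("opf:spine", NS):
            path = posixpath.join(base, manifest[itemref.get("idref")])
            yield path, zf.read(path).decode("utf-8")


if __name__ == "__main__":
    import sys

    for _, html in spine_documents(sys.argv[1]):
        sys.stdout.write(html)
```

For example, `python epub_cat.py book.epub | w3m -T text/html` (the script name is hypothetical) pipes the result into a text-mode browser.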
        
         | arp242 wrote:
         | It's just a zip file with HTML documents and some (ePub-
         | specific) XML files to define metadata, chapters, and a few
         | things like that. I use this "epub-edit" script to edit them:
         |   #!/bin/zsh
         |   #
         |   # Extract epub file to a temp directory, launch shell to edit it, and re-zip
         |   # it. Nothing about this is really epub-specific as such.
         |   echo " $@" | grep -q -- ' -h' && { sed '1,2d; /^[^#]/q; s/^# \?//;' "$0" | sed '$d'; exit 0; }  # Show docs
         |   [ "${ZSH_VERSION:-}" = "" ] && echo >&2 "Only works with zsh" && exit 1
         |   setopt err_exit no_unset no_clobber pipefail
         | 
         |   full=$1:a
         | 
         |   tmp=$(mktemp -d)
         |   bsdtar xf $1 -C $tmp
         | 
         |   cd $tmp
         |   print "Editing $1; press ^D to exit"
         |   zsh ||:
         | 
         |   mv -f $full $full.orig
         |   # -r recurses into subdirectories; -f (freshen) would find nothing to
         |   # update, because the original archive was just moved out of the way.
         |   zip -r $full *
         |   cd -
         |   rm -r $tmp
         | 
         | And then I use vim to edit the HTML files and such.
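One caveat when re-zipping by hand: the EPUB container format (OCF) requires the `mimetype` file to be the first entry in the archive and stored uncompressed, and validators such as EPUBCheck report an error otherwise. Here is a minimal Python sketch of a packer that respects this, assuming the unpacked book lives in `src_dir` and the output path is outside that directory.

```python
import os
import zipfile


def pack_epub(src_dir, out_path):
    """Zip an unpacked EPUB directory so that `mimetype` comes first, uncompressed."""
    with zipfile.ZipFile(out_path, "w") as zf:
        # OCF: first entry must be "mimetype", stored without compression.
        zf.write(os.path.join(src_dir, "mimetype"), "mimetype",
                 compress_type=zipfile.ZIP_STORED)
        for root, dirs, files in os.walk(src_dir):
            dirs.sort()  # walk in a deterministic order
            for name in sorted(files):
                full = os.path.join(root, name)
                arcname = os.path.relpath(full, src_dir).replace(os.sep, "/")
                if arcname == "mimetype":
                    continue  # already written above
                zf.write(full, arcname, compress_type=zipfile.ZIP_DEFLATED)


if __name__ == "__main__":
    import sys

    pack_epub(sys.argv[1], sys.argv[2])
```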
        
       | emayljames wrote:
       | The downloadable EPUB of the page displays outside the viewport
       | in the Google Books app.
       | 
       | Bene seems to be in alpha stage.
        
       | diebeforei485 wrote:
       | I personally think PDFs are a terrible legacy format with
       | unnecessary complexities[1], and most uses of PDFs do not involve
       | printing, so the typesetting arguments don't make sense to me. For
       | the vast majority of use cases it's far more important to be
       | readable on phone, tablet, and computer.
       | 
       | I was surprised when the author mentioned iBooks doesn't support
       | scrolling view, so I tried it myself. Turns out iBooks on macOS
       | does not support scrolling for ePub files, but it does on iOS and
       | iPadOS. Very strange decision by Apple.
       | 
       | 1. https://googleprojectzero.blogspot.com/2021/12/a-deep-
       | dive-i...
        
         | baq wrote:
         | but but but... if I really need to print something, a PDF is
         | the most reliable times portable route. I guess a multipage svg
         | would work, too, maybe, if exported to a pdf to properly print
         | multiple pages first. (Looking at you, inkscape...)
        
         | adrian_b wrote:
         | PDF is an annoying specification, but there exists absolutely
         | no replacement for it.
         | 
         | I have never seen any kind of technical documentation published
         | in any other format than PDF that is comfortable for reading
         | and searching, even when that is done on a mobile phone.
         | 
         | I do not want a document that changes appearance depending on
         | the device used for reading or depending on its temporary
         | state, like window size. I want a document whose layout has
         | been well conceived by its author and which is fixed,
         | regardless of what I happen to use for reading it.
         | 
         | When I happen to read it on a smaller screen or window, except
         | for trivial text-only documents, I do not want changes in
         | layout, but I only want a smart reader, with comfortable means
         | for fast zoom and pan, and which does not have stupid behaviors
         | (like some Android readers), for instance where scrolling
         | vertically (including Page Up/Page Down) also moves the
         | document horizontally (preventing the easy reading of a column
         | of text).
         | 
         | The traditional recommendations for the maximum width of a text
         | column are good enough, if observed, to ensure comfortable
         | reading even on a mobile phone. Only when the author breaks the
         | traditional typographic rules by making extra-wide columns does
         | reading on a mobile phone become inconvenient.
        
           | broscillator wrote:
           | I find reading PDFs on my phone and even on my kindle really
           | uncomfortable.
           | 
           | On my phone I have to either zoom in or turn on landscape
           | mode (which usually means turn it on globally, I can't do it
           | _just_ for the reader app).
           | 
           | On kindle, a full page has too small font due to so much
           | margin, and fitting the width shows me 80% of the page, and
           | then I have to scroll down for the last 20% and my eyes have
           | to find where exactly I was reading.
        
             | baq wrote:
             | I'm keeping my not-sure-how-old iPad 5 around specifically
             | because it's _the_ device form factor to read pdfs.
        
               | broscillator wrote:
               | That kind of highlights how non-versatile PDF is despite
               | some comments.
               | 
               | However it does sound handy, I kinda want a dedicated
               | tablet for sheet music.
        
               | baq wrote:
               | You're absolutely right PDFs are super rigid, but that's
               | kinda their point - so with the proper device, like a
               | sheet of paper or a 10+ inch tablet screen it makes
               | sense.
               | 
               | Would I prefer more content to be reflowable etc.? Yes -
               | but with a tablet it isn't strictly necessary, just nice
               | to have.
        
               | rchaud wrote:
               | It's plenty versatile. Not everything needs to be phone-
               | friendly. Phone screens weren't designed for reading PDF-
               | size documents. Even so, options exist to reflow the
               | text, view in landscape or pan and zoom.
        
               | broscillator wrote:
               | There is one device which fits PDFs well: an iPad. It can
               | be fairly awkward on laptops and desktops as well.
               | 
               | > view in landscape or pan and zoom.
               | 
               | This is awkward, and it's the issue I mentioned above: how
               | annoying it is to have to do that if you're reading for a
               | 30-60 minute session.
        
               | pseingatl wrote:
               | Or the Kindle DX, RIP.
        
           | crabmusket wrote:
           | > I have never seen any kind of technical documentation
           | published in any other format than PDF that is comfortable
           | for reading and searching, even when that is done on a mobile
           | phone.
           | 
           | Can you provide an example of what you mean? My experience is
           | completely the polar opposite.
        
             | adrian_b wrote:
             | I refer to something like a 3000 page manual of some
             | microcontroller, or the datasheets of some integrated
             | circuits or the specifications of some Arm architecture
             | variant, or the standards for some programming language,
             | e.g. C++ or System Verilog.
             | 
             | These are concrete examples of documents that I might have
             | read during some flights or when waiting for some flight,
             | on a smartphone.
             | 
             | When reading something like a fiction novel, reflowing the
             | text based on the window width may be acceptable.
             | 
             | On the other hand, the navigation through a huge document
             | half of which are tables, figures, diagrams, schematics and
             | graphics is extremely painful when it is in HTML format so
             | the layout changes based on the device and window used and
             | there are no means to jump quickly e.g. to page 1436, then
             | to page 2117. When zoom, pan and scroll are correctly
             | implemented, which unfortunately happens seldom, they are
             | much less distracting than the random changes in page
             | layout caused by rendering as done by a browser.
             | 
             | I strongly dislike whenever a company provides only a Web
             | documentation that is hard to navigate, instead of also
             | providing a PDF manual.
             | 
             | Web documentation may be acceptable for very small
             | documents, but not for most of the current technical
             | documentation, where many thousands of pages for a manual
             | are common.
             | 
             | Perhaps an EPUB format extended with everything necessary
             | to completely describe a fixed page layout might become
             | competitive with PDF, but I will have to see an example to
             | believe it.
             | 
             | For now, whenever I see a book or any other document both
             | in PDF and in EPUB formats, I always choose the PDF
             | variant, because without exception it provides a better
             | quality of the rendered pages.
        
               | crabmusket wrote:
               | I accept your points and agree that the kind of
               | documentation you're thinking about sounds like a poor
               | use case for HTML/EPUB. I do not regularly encounter this
               | sort of documentation.
               | 
               | I've been boosting the idea in the OP, but more for
               | things like "your local council's meeting minutes" or
               | "your English class assignment" or "a research paper".
               | 
               | Though I do want to point out that even moderately
               | complex specs, when designed for the web, can work well.
               | For example, the HTML spec doesn't reference page
               | numbers, but has extensive internal hyperlinking:
               | https://html.spec.whatwg.org/
               | 
               | > Perhaps an EPUB format extended with everything
               | necessary to completely describe a fixed page layout
               | might become competitive with PDF
               | 
               | I highly doubt this will ever happen, for use cases which
               | require fixed layout. But there are plenty of use cases
               | where fixed layout is unnecessary and inferior.
        
               | lxgr wrote:
               | I work with the same type of documents regularly, and I'd
               | give up both exact referencing and stable rendering in a
               | heartbeat in exchange for something reflowable that I can
               | reliably search in and copy paste from.
        
               | adrian_b wrote:
               | The PDF documents allow reliable search and copy/paste,
               | but unfortunately only when the author of the document
               | has taken care to ensure this. Nevertheless, this usually
               | happens automatically when the PDF has been created by
               | exporting a document created with some Office suite,
               | unless the author has changed the default options to
               | forbid these features.
               | 
               | Even many of the PDFs created by scanning printed
               | documents allow reasonably reliable search/copy/paste, if
               | they had been processed by an OCR.
        
               | lxgr wrote:
               | > The PDF documents allow reliable search and copy/paste
               | 
               | Are you sure about that? As far as I understand,
               | extracting text from an ultimately vector-graphics-like
               | PDF heavily depends on OCR-like heuristics on the PDF
               | consumer's side.
               | 
               | The ToUnicode mapping table can help with the glyph-to-
               | codepoint mapping aspect of this, but figuring out the
               | difference between the gap between two letters and two
               | words seems hard.
               | 
               | I've seen bothtypesofissues mentioned in the following
               | article i n t h e p a s t, including in a specification
               | document I use multiple times per day for my job:
               | 
               | https://web.archive.org/web/20220328102205/https://filing
               | db....
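To make the word-gap problem concrete: even after glyphs are mapped to characters, an extractor still has to guess from glyph positions where the spaces go. Below is a toy Python sketch of one such heuristic, not taken from any real PDF library. A gap counts as a word break if it exceeds both an absolute threshold (an assumed 0.15 em) and 1.5 times the line's median gap. The relative test is what keeps letter-spaced text from coming out as "i n t h e". Glyph positions are assumed to have been extracted already, in points.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Glyph:
    char: str
    x: float      # left edge of the glyph on the line, in points
    width: float  # advance width, in points


def join_glyphs(glyphs, font_size, min_gap_em=0.15, rel_factor=1.5):
    """Rebuild one line of text from positioned glyphs, guessing word breaks."""
    if not glyphs:
        return ""
    glyphs = sorted(glyphs, key=lambda g: g.x)
    # Gap between the end of each glyph and the start of the next one.
    gaps = [b.x - (a.x + a.width) for a, b in zip(glyphs, glyphs[1:])]
    typical = max(median(gaps), 0.0) if gaps else 0.0
    out = [glyphs[0].char]
    for gap, glyph in zip(gaps, glyphs[1:]):
        is_break = gap > min_gap_em * font_size and gap > rel_factor * typical
        # Don't double up if the PDF already encodes an explicit space glyph.
        if is_break and out[-1] != " " and glyph.char != " ":
            out.append(" ")
        out.append(glyph.char)
    return "".join(out)
```

It still fails on lines where most gaps are word gaps (e.g. a run of one-letter words), which is exactly the kind of case where real extractors go wrong too.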
        
               | adrian_b wrote:
               | I did not look at the details of the PDF specification,
               | but I have heard that there are indeed many cases that
               | can confuse a PDF reader which wants to find or copy a
               | text string.
               | 
               | Nevertheless, I have used search and copy + paste in PDF
               | documents very frequently, every day, for many years,
               | without any problem. I usually prefer to use mupdf as the
               | PDF reader, because it is very fast (it also works better
               | as an EPUB reader than the other EPUB readers that I have
               | tried), but there are some seldom-encountered PDF files
               | that mupdf cannot parse, in which case I fall back to
               | other PDF readers, e.g. okular.
               | 
               | The only case I encounter where search/copy/paste does
               | not work is scanned books that have not been OCR'ed, so
               | they contain only bitmap images of the pages,
               | without text.
               | 
               | The problems mentioned at your link are caused mostly by
               | the PDF specification being too permissive, which allows
               | abuses like using a non-standard character encoding
               | coupled with the use of a non-standard font. However,
               | this specific type of abuse could not be prevented by any
               | specification without using some sort of AI to decide
               | whether the glyph used for a character encoded as Unicode
               | "A" is really a kind of "A".
               | 
               | Among the problems enumerated at your link, I have
               | encountered a few times the case when there are thin
               | spaces inserted between each letter of a string. In such
               | a case it is annoying to remove those spaces after
               | pasting the text in another document, but this is
               | something that I have seen only very rarely.
        
               | Shorel wrote:
               | And between PDF and EPUB, I always choose the EPUB
               | variant, because in my laptop it definitely looks better,
               | with the text the right size and sane pagination.
               | 
               | I don't jump to page 2112, I use the table of contents to
               | jump to section 3.1.2, which is as fast if not faster.
        
         | jxdxbx wrote:
         | people don't want fixed layout documents only for printing.
         | they want them because they want to fix the layout of their
         | documents more than they care about small screens.
        
           | crabmusket wrote:
           | Those people are welcome to continue using PDFs, and I really
           | hope that in some utopian future that they will receive a lot
           | of requests from their readers along the lines of 'can I
           | please have a portable epub version too?'
        
         | BlueTemplar wrote:
         | Not sure about this specific case, but I suspect at least some
         | of these readers might do it for consistency with e-paper
         | devices, where no scrolling is a hardware limitation (very low
         | refresh rates, weak processor, battery savings).
         | 
         | So it seems to be a bad idea to try to have a one-size-fits-all
         | standard : we're much better off with two digital document
         | standards : one with full multimedia and interactive
         | capabilities (short of networking), and another, a subset of
         | the previous with the limitations like : monochrome, no
         | multimedia, interactivity mostly limited to (still in-document)
         | hyperlinks...
         | 
         | And guess what, we already have two formats that are _almost_
         | there ! HTML (see also MHTML=EML) and EPUB.
         | 
         | (And of course a 3rd one for physical archival and the rare
         | digital fixed layout documents, for which PDF/A already seems
         | to be decent enough.)
        
       | czierleyn wrote:
       | When the IDPF merged with the W3C a couple of years back they
       | tried to develop a new standard called PWP, Portable Web
       | Publications, which was supposed to be a new 4.0 version of EPUB,
       | as far as I know. But there was much resistance from the
       | publishing community and the project was shelved a couple of
       | years ago.
       | 
       | See: https://w3c.github.io/dpub-pwp/publishing-
       | snapshots/FPWD/Ove...
        
       | watwut wrote:
       | > PDFs are rendered consistently. A PDF specifies precisely how
       | it should be rendered, so a PDF author can be confident that a
       | reader will see the same document under any conditions.
       | 
       | And that is why PDF sux for reading on the phone. And why epub is
       | massively better if you want to read articles and books.
        
       | geokon wrote:
       | It feels like a bit of a doomed project simply bc browsers don't
       | open EPubs. You can link a PDF and, while it's a bit of a context
       | switch, the browser will open and display it.
       | 
       | Since, as described, EPubs are basically HTML, it's kinda dumb
       | browsers don't open them - but good luck convincing the
       | Chrome/Mozilla bureaucrats
       | 
       | I think another discouraging aspect is that HTML and CSS are so
       | huge and bloated at this point that few people can implement a
       | "reader" for EPUB/HTML. It's basically "go implement a new
       | browser". It makes one think an easy-to-parse markdown (like
       | Djot) with some extra rendering bells and whistles would be a
       | more likely long-term solution.
       | 
       | My personal interim compromise solution is to embed everything
       | (CSS, SVGs, scripts and base64 images) into an HTML file. It's
       | similar to an EPUB. It's a bit bloated and ugly, but with a bit of
       | care it works, and naturally browsers (and by extension basically
       | every user) can open it.
       | 
       | Unfortunately a user has no way to really know "oh I can download
       | and store this web page offline". It'd be nice to have some thing
       | like a .htmls extension that indicates it's an HTML but it
       | doesn't have any external resources.
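Here is a minimal Python sketch of the image half of that approach: it rewrites relative `<img src>` paths as base64 `data:` URIs so the HTML file carries its own images. It is regex-based and only handles double-quoted `src` attributes. CSS, fonts, scripts, and remote images (which it leaves untouched) would need similar treatment, and the function name is hypothetical.

```python
import base64
import mimetypes
import re
from pathlib import Path

# Matches <img ... src="..."> with a double-quoted src; a simplification.
_IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=")([^"]+)(")', re.IGNORECASE)


def inline_images(html, base_dir):
    """Replace relative <img src> paths with base64 data: URIs."""

    def to_data_uri(match):
        src = match.group(2)
        if src.startswith(("data:", "http://", "https://")):
            return match.group(0)  # already inline, or remote: leave as-is
        path = Path(base_dir) / src
        mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        payload = base64.b64encode(path.read_bytes()).decode("ascii")
        return f"{match.group(1)}data:{mime};base64,{payload}{match.group(3)}"

    return _IMG_SRC.sub(to_data_uri, html)


if __name__ == "__main__":
    import sys

    # Usage: python inline_images.py page.html > page.self-contained.html
    page = Path(sys.argv[1])
    sys.stdout.write(inline_images(page.read_text(encoding="utf-8"), page.parent))
```

Base64 inflates each image by about a third, which is part of the bloat mentioned above.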
        
         | zerof1l wrote:
         | > It feels like a bit of doomed a project simply bc browsers
         | don't open EPubs.
         | 
         | Not that long ago, browsers could not open PDFs either. Now
         | all browsers come with a PDF reader written in ASM/JS. I see
         | nothing that prevents browsers doing the same for EPUBs. There
         | are already browser extensions that do exactly this. It's a
         | matter of the EPUB format gaining popularity.
        
           | geokon wrote:
           | I think Google is actively against offline data. It's not
           | aligned with their business interests
           | 
           | My mental analogy is, you can also have offline apps on
           | Android. You can specify this in app manifest. But internet
           | access isn't exposed to the user as a permission.
           | 
           | Like the author says, Google already injects online fonts
           | into the EPubs they generate. Meanwhile PDF is a battle
           | they've already lost
        
             | BlueTemplar wrote:
             | Indeed, and I would add that there's no reason for browsers
             | to be able to open PDFs : this sounds like yet another
             | attempt for Google to wrestle with Microsoft over the
             | control of the OS (by having everything happen in the
             | browser instead).
             | 
             | And also probably why we _still_ have to rely on 3rd party
             | hacky browser extensions to be able to save web pages as a
             | single file.
        
             | Shorel wrote:
             | > Like the author says, Google already injects online fonts
             | into the EPubs they generate.
             | 
             | I didn't notice that before, but now I will actively avoid
             | Google generated EPUB files.
        
           | rchaud wrote:
           | Epubs usually are not accessed/downloaded in a browser. PDFs
           | definitely are, as they are freely shared online, whereas
           | epubs are usually DRM'ed and not freely shared.
        
         | leoedin wrote:
         | > I think another discouraging aspect is HTML CSS are so huge
         | and bloated at this point that few people can implement a
         | "reader" for EPUB/HTML. It's basically "go implement a new
         | browser". It makes one think a easy-to-parse markdown (like
         | Djot) with some extra rendering bells and whistles would be a
         | more likely long term solution
         | 
         | This feels like the biggest hurdle to me. The author says
         | "Portable HTML generation principle: when possible, systems
         | that generate portable EPUBs should output portable HTML.". I
         | don't think this is going far enough. If the goal is for this
         | format to be everywhere and repeatable, then it needs to be
         | standardised, and it must be easy to implement a new rendering
         | engine for it.
         | Relying on webviews doesn't feel like the way forward. The
         | beauty of PDF is that it is incredibly reliable - a PDF from a
         | decade ago still renders the same today as it used to.
         | 
         | I suspect if an effort like this is to get off the ground, the
         | scope of the document needs to be scaled right back. The subset
         | of XHTML allowed should be very limited. The ability to render
         | a document that looks the same everywhere should be prioritised
         | - fixed layout at a fixed page size first, reflowable second.
         | It needs a standard with a comprehensive test suite of
         | documents + render outputs.
        
           | crabmusket wrote:
           | > The ability to render a document that looks the same
           | everywhere should be prioritised
           | 
           | IMO actually this is the question the whole effort hinges on.
           | 
           | If the goal is to replace PDF for the uses that require
           | pixel-perfect rendering on every client just as the designer
           | intended, then this approach is dead on arrival.
           | 
           | But if that's not the goal, then that has to be extremely
           | well-communicated by the project, so that people who need
           | that know they need to stick with PDF. Indeed, the project
           | needs to explicitly say that it's _not_ a goal, and that
           | clients _should_ be free to make reasonable rendering
           | decisions within certain specified bounds.
        
         | jbverschoor wrote:
         | You know.. HTML used to be hyper _text_. Some links. Add some
         | figures/images, tables.
         | 
         | But then we 'needed' magazine-like design/layout. Still, it was
         | document based, so actually pretty good.
         | 
         | After that, we tried to shoehorn HTML to an application
         | distribution platform. Current CSS layouts are (finally) more
         | like traditional layout engines for applications.
         | 
         | IMO, the last 20 years were pretty much a waste of effort
         | because there was no proper way to distribute (cross-platform)
         | applications, well, besides Java...
        
           | foofie wrote:
           | > You know.. HTML used to be hyper text. Some links. Add some
           | figures/images, tables.
           | 
           | It still is.
           | 
           | > But then we 'needed' magazine-like design/layout. Still, it
           | was document based, so actually pretty good.
           | 
           | Styling is not handled by HTML. It's a separate concern
           | assigned to CSS. For convenience HTML offered default
           | styling.
           | 
           | > After that, we tried to shoehorn HTML to an application
           | distribution platform.
           | 
           | It's not shoehorned. It's the use case: render documents. A
           | document is a tree of UI elements. It's the same with GUI
           | frameworks like Qt or WPF.
        
             | mapreduce wrote:
             | > "It still is."
             | 
             | It still is, but you're missing the point of the thread.
             | HTML still is hypertext, some links, some images, some
             | tables. No doubt about that. But HTML is also so much more
             | than that. The spec is a beast. Anyone who wants to
             | implement an HTML based reader has a mammoth task in front
             | of them. It's like "go implement a new browser" like
             | someone said in this thread above.
             | 
             | > "Styling is not handled by HTML. It's a separate concern
             | assigned to CSS."
             | 
             | Missing the point again. We know styling is not handled by
             | HTML. The point of the thread was to tell how big of a task
             | it is to create your own HTML based reader. If you want to
             | create your reader, like it or not, you have to implement
             | support for CSS too, and that is also a mammoth spec.
             | 
             | So our only options are:
             | A. Go implement a new browser.
             | B. Use something like WebKit.
             | C. Implement a small subset of the HTML and CSS specs.
        
               | foofie wrote:
               | > The spec is a beast. Anyone who wants to implement an
               | HTML based reader has a mammoth task in front of them.
               | 
               | That's true for basically any non-trivial document
               | rendering format. For example, take a look at the PDF
               | spec. Even basic things like parsing the document format
               | is a formidable task. HTML in comparison is a trivial
               | format. The same goes for technologies like TeX or even
               | Microsoft's own Word format, which Microsoft famously had
               | lots of problems supporting. It is a hard problem for all
               | formats, not just HTML.
               | 
               | > The point of the thread was to tell how big of a task
               | it is to create your own HTML based reader.
               | 
               | You're confusing some things. A document format is one
               | thing, but a renderer with specific capabilities is an
               | entirely different thing. You're commenting on the
               | document format and the styling and layout system, and
               | now you're shifting the conversation to what it takes to
               | implement a renderer.
               | 
               | Debates on document formats are entirely separate and
               | orthogonal to debates on how to implement renderers.
               | Renderers for the most trivial things are tremendously
               | complex. There are a myriad of good reasons why we're
               | seeing GUI frameworks built on top of webviews in spite
               | of all the complaints about the formats that webviews
               | support, and in spite of the myriad low-level rendering
               | frameworks already available.
               | 
               | To understand the point, try to think through the
               | requirements list to implement a renderer for Markdown.
               | It's a document format with half a dozen features.
               | Would you call it trivial?
        
               | mapreduce wrote:
               | > You're confusing some things. A document format is one
               | thing, but a renderer with specific capabilities is an
               | entirely different thing. You're commenting on the
               | document format and the styling and layout system, and
               | now you're shifting the conversation to what it takes to
               | implement a renderer.
               | 
               | If you follow the comment you replied to, the discussion
               | was about implementing a renderer. So no, I am not
               | shifting the conversation to implement a renderer. The
               | conversation _is_ about implementing a renderer. That it
               | is incredibly difficult to do today with the modern specs
               | is the point.
        
               | math_dandy wrote:
               | What about option D: use a WebView? This is exactly what
               | the author did. The point of the proposal is to identify
               | which features of a WebView can be used (and which must
               | not be) if the goal is to produce nice text layouts in
               | multiple form factors. But the rendering of HTML and CSS,
               | and the execution of JavaScript, are solved problems.
        
             | wharvle wrote:
             | > Styling is not handled by HTML. It's a separate concern
             | assigned to CSS. For convenience HTML offered default
             | styling.
             | 
             | It in fact was, and to some degree still is. I assure you
             | we achieved a hell of a lot of styling before CSS existed,
             | and for some time after it did but before most of us were
             | using it (much), using features of HTML, some of which were
             | _explicitly_ there to support styling.
        
         | eviks wrote:
         | There is a SingleFile extension that can save a page as a
         | single self-extracting zipped HTML file, so you don't need to
         | waste space base64-encoding anything, and you can unzip it to a
         | folder and view the images as-is without the page.
        
         | foofie wrote:
         | > I think another discouraging aspect is HTML CSS are so huge
         | and bloated at this point that few people can implement a
         | "reader" for EPUB/HTML. It's basically "go implement a new
         | browser".
         | 
         | I don't think that's true. At the very least, you can use a
         | WebView and feed it regular HTML. If the whole industry uses
         | webviews for GUIs, it's hardly a stretch to use one to render
         | Epub docs.
        
         | baq wrote:
         | > It feels like a bit of doomed a project simply bc browsers
         | don't open EPubs.
         | 
         | I guess that's why the article is actually an epub opened with
         | a WASM epub viewer :)
        
         | jasomill wrote:
         | _It feels like a bit of doomed a project simply bc browsers
         | don't open EPubs._
         | 
         | While browsers don't provide a convenient UI for opening EPUBs,
         | they should have no problem rendering the chapter HTML files
         | contained within.
         | 
         | In the absence of browser support, writing a server-side EPUB-
         | to-browsable site proxy that adds chapter navigation controls
         | and simple layout options shouldn't be too difficult.
         | 
         | Incorporating the necessary DRM support required to view the
         | majority of commercial ebooks through such a proxy would very
         | likely be legally problematic, of course.
         | 
         | Come to think of it, any form of publisher-approved DRM EPUB
         | browser support sounds like it'd be about half a technical step
         | away from DRM support for web pages in general, which is a
         | horrifying prospect.
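The core of the proxy idea above, finding which chapter HTML files a browser would render and in what order, might look roughly like the Python sketch below. It assumes a well-formed EPUB whose `META-INF/container.xml` points at an OPF package file, and that manifest `href`s are plain relative paths (no percent-encoding or fragments). `spine_documents` is a hypothetical helper name; navigation controls, DRM, and error handling are deliberately left out.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by the OCF container file and the OPF package document.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def spine_documents(epub):
    """Return the zip paths of an EPUB's content documents in reading order.

    `epub` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(epub) as z:
        # container.xml names the package (.opf) file via <rootfile full-path>.
        container = ET.fromstring(z.read("META-INF/container.xml"))
        opf_path = container.find(".//c:rootfile", CONTAINER_NS).get("full-path")
        opf = ET.fromstring(z.read(opf_path))
        # Manifest hrefs are relative to the directory holding the .opf file.
        base = posixpath.dirname(opf_path)
        manifest = {
            item.get("id"): item.get("href")
            for item in opf.findall("opf:manifest/opf:item", OPF_NS)
        }
        # The spine lists manifest ids in reading order.
        return [
            posixpath.join(base, manifest[ref.get("idref")])
            for ref in opf.findall("opf:spine/opf:itemref", OPF_NS)
        ]
```

A proxy along these lines could then serve each returned path straight out of the zip and add previous/next links between consecutive entries.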
        
         | Shorel wrote:
         | > It feels like a bit of doomed a project simply bc browsers
         | don't open EPubs.
         | 
         | If you read the article, then you just did open an EPUB.
        
       | zvmaz wrote:
       | The author is a post-doc advised by Shriram Krishnamurthi [1],
       | the author of Programming Languages: Application and
       | Interpretation (PLAI), and one of the authors of Data-Centric
       | Introduction to Computing (DCIC). I am currently reading both
       | PLAI and DCIC and I am truly delighted by the minute care the
       | authors have put into making the books pedagogical works of art.
       | That's true love!
       | 
       | [1] https://willcrichton.net/
        
         | verisimi wrote:
         | > the minute care the authors have put into making the books
         | pedagogical works of art. That's true love!
         | 
         | Works of art. True love! That's very high praise.
        
           | zvmaz wrote:
           | I mean it. Both books are free, and PLAI has an interactive
           | tutorial on the language used in the book called SMoL [1]
           | done by one of Shriram's students. The tutorial is _not_ a
           | passive one by any means; it forces you to think and
           | highlights pitfalls students often fall into when reading the
           | material.
           | 
           | This whole ethos of caring about readers' learning is in
           | stark contrast to books that feel like the authors are showing
           | off how smart and clever and profound they are instead of
           | caring about their readers' comprehension. That includes some
           | very highly praised books, even on HN.
           | 
           | [1] https://www.plai.org/#direct-links-to-the-tutor
           | 
           | N.B. The author of this post on portable EPUBs also works on
           | language learning. A whole ethos...
           | https://arxiv.org/abs/2401.01257
        
         | sotix wrote:
         | And the author has worked on making Rust easier to learn[0]!
         | 
         | [0] https://rust-book.cs.brown.edu/
        
       | morelisp wrote:
       | The bar for epubs is so fucking low I have a hard time believing
       | this matters at all. Just last week I bought a book set in the
       | late Middle Ages which managed to transcribe all "th" as "p".
       | Until publishers care about that stuff, none of these high-
       | falutin technical discussions change anything.
        
         | offices wrote:
         | I don't see how this relates to the link.
        
           | morelisp wrote:
           | The file format doesn't matter one bit when the reading and
           | authoring tools are shit and the editors can't/don't fix
           | anything. And papers will generally have a lot fewer
           | resources to deal with this than major book publishers, who
           | have been epub-focused for over a decade now and actually
           | make money from it.
        
             | harshreality wrote:
             | Any decent publishing or html editing tools fully support
             | utf-8 by now. It's not the tools.
             | 
             | Publisher and editor laziness may be a reason to be
             | cautious about epubs _currently_ for niche or esoteric
             | works, but that's not the same thing.
             | 
             | > I bought a book set in the late Middle Ages which managed
             | to transcribe all "th" as "p". Until publishers care...
             | 
             | The book market these days makes it challenging to do high-
             | quality editing up front for republishing niche books in a
             | new format. Publishers try to cut corners, outsourcing epub
             | conversions to people who don't care and don't know what
             | they're doing, or they OCR it, have an in-house editor (who
             | also doesn't have a personal affinity to the subject) give
             | it a once-over (maybe), and release it.
        
               | BlueTemplar wrote:
               | As an aside: Unicode support was still an issue in TeX
               | last I checked, because most LaTeX tools don't support it
               | (having been made before it was expected).
               | 
               | Now, there are some attempts to fix this situation with
               | Xe(La)TeX and Lua(La)TeX, but since TeX seems to be so
               | tied to PDF these days, it should probably just be
               | abandoned by most scientific publishing in favor of the
               | likes of GNU TeXmacs (note: it's NOT TeX in GNU Emacs)
               | and HTML with MathML.
        
       | sirsinsalot wrote:
       | I agree with most of the content of this post. One of the key
       | things for me would be a requirement that figures and diagrams
       | which can be expressed as SVG should be.
       | 
       | Raster images should be limited to things that need a
       | grid-of-pixels representation, like photographs.
        
       | livrem wrote:
       | I use the SinglePage add-on for Firefox (think it is available
       | for Chrome as well?) to save the current page DOM to HTML as a
       | self-contained file (inlined CSS and data:-URLs for all images)
       | with no dependencies and all the scripts removed etc. It is not
       | perfect, and I do not trust browsers to always remain backwards
       | compatible, but I prefer it to save pages as PDF or as multiple
       | files.
       | 
       | Interestingly one of the few pages I ever saw it fail on was this
       | article on portable EPUBs. Guess it has too much magic going on
       | to make the formatting work. The saved page is perfectly
       | readable, but the style is nothing at all like what the original
       | page was for some reason.
       | 
       | I like how fbreader on Android just displays all books exactly
       | the same, and as configured in the app rather than using any of
       | the styling from the EPUB file. I never noticed that it tried to
       | apply CSS or run scripts included in files and I hope it never
       | tries to do either of those things. Loading external dependencies
       | sounds like an even worse idea and I did not think that was even
       | allowed.
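The self-contained trick described above, inlining images as data: URLs so the saved file has no external dependencies, can be sketched in a few lines of Python. This is not SingleFile's implementation: `to_data_url` and `inline_images` are hypothetical helpers, and the regex rewrite only handles double-quoted `src` attributes on `<img>` tags (a real tool would use an HTML parser and also handle CSS, fonts, and `srcset`).

```python
import base64
import mimetypes
import re


def to_data_url(data: bytes, filename: str) -> str:
    """Encode raw file bytes as a base64 data: URL (RFC 2397)."""
    # Guess the MIME type from the file extension; fall back to a generic type.
    mime, _ = mimetypes.guess_type(filename)
    mime = mime or "application/octet-stream"
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"


# Approximate matcher for <img ... src="...">: group 1 is everything up to and
# including "src=", group 2 is the original URL. Not a real HTML parser.
IMG_SRC = re.compile(r'(<img\b[^>]*\bsrc=)"([^"]*)"')


def inline_images(html: str, files: dict[str, bytes]) -> str:
    """Replace <img> src URLs that appear in `files` with inlined data: URLs."""

    def replace(match: re.Match) -> str:
        src = match.group(2)
        if src not in files:
            return match.group(0)  # leave unknown (e.g. remote) URLs untouched
        return f'{match.group(1)}"{to_data_url(files[src], src)}"'

    return IMG_SRC.sub(replace, html)
```

For example, `inline_images('<img alt="a" src="p.png">', {"p.png": png_bytes})` keeps the `alt` attribute and swaps the `src` for a `data:image/png;base64,...` URL.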
        
         | actionfromafar wrote:
         | SingleFile, right?
         | 
         | Edit:
         | 
         | On that note, what's up with the Firefox Add-Ons?
         | 
         | Currently, they are all set up so that to do anything
         | interesting, they need _all the permissions_.
         | 
         | This creates a natural market in which bad actors go shopping
         | for add-ons they can take over.
         | 
         | Can something be done about this? For instance, take this
         | "SingleFile" add-on. It needs to access the rendered document in
         | the DOM to be able to introspect it and save it all to a file.
         | 
         | But why does it need access to _everything_? Can't it have
         | just these permissions:
         | 
         | - "snapshot DOM once"
         | 
         | - "write to a single file"
        
           | pbronez wrote:
           | Agree. There are many extensions that I want to use on
           | arbitrary websites, but rarely. This could be handled by the
           | browser locking the extension out from everything, until the
           | moment I manually invoke it, at which point it's allowed
           | access to the page I'm currently looking at.
           | 
           | Now, maybe that's how it already works, but I have no
           | confidence in it.
        
           | gildas wrote:
           | Author here. I wish SingleFile could use fewer permissions,
           | but unfortunately that's not possible from a technical point
           | of view. Anyway, if you run some code I've written but are
           | suspicious of it, you have to trust me or review the code,
           | which is open source.
        
             | darkteflon wrote:
             | SingleFile is amazing - one of my most-used extensions on
             | both desktop and mobile by far. It's elegant, unobtrusive
             | and it just works. Thanks for making!
        
               | gildas wrote:
               | Thank you for trusting me and for your kind words ;)
        
             | vertis wrote:
             | I use SingleFile ALL the time. Thank you so much for it.
             | 
             | I Perma-web anything I find interesting, after going back
             | through my notes and discovering that half the bookmarks
             | I'd added no longer existed 5 years later.
             | 
             | I don't think I could do half as good a job if it wasn't
             | for your extension.
             | 
             | I owe you a coffee/beer -- actually I just found your
             | donation page, but still a drink IRL if we ever run into
             | each other at a conference/etc.
        
             | actionfromafar wrote:
             | I have no qualms about you or SingleFile. I used it today,
             | it's great!
             | 
             | I just think market pressure created by the permissions
             | systems is unfortunate, in aggregate. With x thousands of
             | add-ons, bad stuff has happened, and is going to happen.
             | Any improvement to the permissions which could mitigate
             | that at least somewhat, would be nice.
        
           | livrem wrote:
           | SingleFile, yes. Thanks. Did not see it within the edit-
           | window.
           | 
           | I agree about permissions. In this case it looks like it
           | needs a bit more, since it has some options like enabling
           | auto-save after page-load for tabs for instance. Not a
           | feature I have used, but I am sure it can be useful for semi-
           | manually scraping sites.
        
         | dotancohen wrote:
         | Another comment explains why the page is difficult to parse:
         | > It's a very well thought through article by the developer of
         | Nota, trying to bring EPUB format up to parity with PDF. It's a
         | serious start and they've already written a viewer. In fact,
         | the article itself is displayed in a browser-based wasm port of
         | the viewer (and looks good!).
        
         | JadeNB wrote:
         | > I use the SinglePage add-on for Firefox (think it is
         | available for Chrome as well?) to save the current page DOM to
         | HTML as a self-contained file (inlined CSS and data:-URLs for
         | all images) with no dependencies and all the scripts removed
         | etc. It is not perfect, and I do not trust browsers to always
         | remain backwards compatible, but I prefer it to save pages as
         | PDF or as multiple files.
         | 
         | Ha, I think that one of my first HN comments, 10 years or so
         | ago, was how I wanted to be able to save HTML web pages as
         | HTML, not as PDF. I'm sure I didn't explain (or understand) my
         | reasoning well, but it was roundly regarded as a ludicrous
         | thing to want to do. I'm glad to hear that I was just a decade
         | out of sync.
        
           | gildas wrote:
           | Actually, the first release of SingleFile is 13+ years old,
           | but it was less popular at the time because Chrome (which
           | didn't yet support MHTML) had a negligible market share. People
           | generally saved their pages in MAFF or MHTML format in
           | Firefox or IE. It was when Firefox abandoned XUL extensions
           | that SingleFile was able to rise from the ashes, because
           | there was once again a real interest in it.
        
         | dsr_ wrote:
         | FBReader uses CSS from the document by default; you can turn it
         | off in, IIRC, four stages.
         | 
         | KOReader gives you more control but in a less friendly manner:
         | in addition to choosing specific CSS from several supplied
         | files, you can write your own.
         | 
         | (Also wins for KOReader: excellent OPDS support, and an easy
         | self-hosted sync server.)
        
         | Shorel wrote:
         | Yes, but this article has an EPUB download button.
         | 
         | This feature makes SinglePage unnecessary for any page using
         | this system.
        
         | Agraillo wrote:
         | Funny, I sometimes make the same misspelling. I suppose this has
         | something to do with probabilities inside our brains: the phrase
         | "Single page" might be more probable than "Single file" (hmm, I
         | smell some similarities to LLM probabilities).
         | 
         | On a side note, what is interesting about SingleFile is that
         | since the file can contain anything, including JS, it's possible
         | to create a local "executable" (HTML) that uses the resources
         | inside the file and does not use a single external file. I have
         | an actual game-like piece that runs locally with a bunch of
         | files and comparatively easily transforms into a single
         | self-running "program" that even runs on a mobile file manager
         | with WebView support.
        
       | znpy wrote:
       | ePubs aren't that great either. PDF might not be perfect, but it's
       | likely the best we have and the best we'll have for a long time.
       | 
       | ePub renders wildly different on my eReader (kobo), my linux
       | laptop (various apps), my iPad and my iphone. And i'm not talking
       | about screen sizes, i'm talking about various elements being
       | rendered largely incorrectly (and there's a matrix of
       | incorrectness across implementations).
       | 
       | PDF documents on the other hands... they render just right,
       | everywhere. I have to zoom and scroll, but I will never have to
       | ask myself "will I be able to actually read this document?" when
       | dealing with PDF.
       | 
       | Oh and by the way... I still print stuff from time to time. Yeah
       | way less than it was needed in the past, but it's still a
       | necessity. Can you even print stuff (sections? pages?
       | selections?) from an ePub?
       | 
       | > Bene is designed to make opening and reading an EPUB feel fast
       | and non-committal. The app is much quicker to open on my Macbook
       | (<1sec) than other desktop apps.
       | 
       | This is elitist at best. Claiming something "is fast" on top-
       | class hardware is misleading at best (if not dishonest).
       | 
       | Try running that on low class hardware (stuff like
       | chromebooks but also laptops from at least 7-8 years ago) and
       | let's see if it's still "fast".
       | 
       | I'm not convinced.
        
         | broscillator wrote:
         | > PDF documents on the other hands... they render just right,
         | everywhere. I have to zoom and scroll,
         | 
         | To me it feels the other way around, if I have to zoom and
         | scroll, they render just _almost_ right.
         | 
         | And that almost is extremely important for actual _reading_. If
         | you're quickly skimming a PDF, sure. But to sit down and read
         | for 30 minutes? One hour? Fuck zooming and scrolling. My kindle
         | just displays a page, and I tap it and it goes to the next
         | page. Can't really get a much better reading experience than
         | that.
        
           | znpy wrote:
           | If I'm reading that long, I don't need to zoom and scroll
           | on my iPad.
           | 
           | EPUBs on e-readers work well, but only if you're reading
           | fiction. Most images and almost all tables and charts get
           | messed up by EPUB rendering anyway. And e-readers are
           | black and white, so you're losing information anyway.
        
             | broscillator wrote:
             | right, this goes against what you said about rendering well
             | everywhere, given how you mentioned your ipad. In any other
             | device you will likely need to zoom and scroll constantly.
             | 
             | Tables and charts are also a specific use case. There's no
             | mention of them whatsoever on the website so one can assume
             | this is talking about mainly text.
             | 
             | In other words, you described the only time when PDFs are
             | more comfortable: if you have an iPad and you need to read
             | charts/images/tables. Far from the claim of "they render
             | just right everywhere".
        
         | arp242 wrote:
         | I have a six year old laptop and regularly read ePubs on it
         | (also my PocketBook e-reader) - it's fast enough. The initial
         | page calculations can take a few seconds, but this can be (and is)
         | cached and done in the background. It's not that bad and "time
         | to something useful on screen" is more than acceptable. Large
         | PDFs also aren't exactly fast by the way.
         | 
         | Most e-readers are "low class hardware". ePubs work fine on
         | most of them.
         | 
         | In terms of performance there isn't a clear winner here; both
         | can be somewhat slow at times for large documents, but are also
         | "fast enough" for the common case, even on older low-spec
         | hardware.
         | 
         | I do think the general software ecosystem surrounding ePubs is
         | not quite there yet, but that's mostly a matter of UX and
         | "software that hadn't been written yet". As a format ePub is
         | the clear winner for many (not all) scenarios. I struggle
         | reading many PDFs because "zoom and scroll" that you mention is
         | a right pain if you have to constantly do it (which you often
         | do if you zoom text). Comfortably reading PDFs on my phone or
         | e-reader is basically impossible.
        
       | crabmusket wrote:
       | > There's two ways to make progress [on document aesthetics]
       | here. One is for browsers to provide more typography tools. ...
       | The other way is to pre-calculate line breaks, which would only
       | work for fixed-layout renditions.
       | 
       | The third way is to develop non-browser clients, like the
       | author's own Bene. While it currently uses Tauri and therefore
       | the system webview, there's no reason it should always do that,
       | or that another client couldn't be developed with a focus on
       | typography.
       | 
       | Want your documents to look super nice with all the kerning and
       | line breaking your heart desires? Get a proper reader app.
       | 
       | Just want the content? Sure, open it in a browser or a basic
       | reader.
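For fixed-layout renditions, "pre-calculating line breaks" means the generator decides where each line ends and bakes that into the output instead of leaving it to the browser. Below is a minimal greedy (first-fit) sketch in Python under a big simplifying assumption: every character has the same width, so widths are counted in characters. Real typesetters measure glyph widths with the actual font and often use total-fit algorithms such as Knuth-Plass rather than greedy filling; `greedy_breaks` is a hypothetical name.

```python
def greedy_breaks(words: list[str], width: int) -> list[str]:
    """Greedy first-fit line breaking, measuring width in characters.

    Assumes a monospace-like model where each character is one unit wide.
    """
    lines: list[str] = []
    current = ""
    for word in words:
        candidate = word if not current else f"{current} {word}"
        if len(candidate) <= width or not current:
            # The word fits on the current line, or it is longer than a whole
            # line by itself, in which case it gets a line of its own.
            current = candidate
        else:
            # Close the current line and start a new one with this word.
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines
```

With `width=10`, the words of "the quick brown fox jumps" come out as `["the quick", "brown fox", "jumps"]`; a fixed-layout generator would emit each of those as an explicit line.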
        
       | BonoboIO wrote:
       | Wow, I'm blown away by how fast this site is. It loads instantly
       | on my iPhone. Perfect text sizes ... really nice.
        
         | crabmusket wrote:
         | You didn't get a huge multi-second loading spinner?
        
           | BonoboIO wrote:
           | Nope. Even with adblockers, I'm used to surfing the web and
           | getting slow loading times from bloated websites with
           | megabytes of useless libs. This was refreshingly quick.
        
       | upofadown wrote:
       | >You might dislike the idea that document authors can run
       | arbitrary Javascript on your personal computer.
       | 
       | How I feel about this depends a lot on how much I trust the
       | people who created the document and/or the person who sent it to
       | me. I would trust a website I specifically selected with a
       | defined TLS web of trust more than I would trust a random spam
       | email.
       | 
       | When we think about the risks associated with the complex and
       | inherently insecure format known as HTML we tend to assume the
       | level of trust available on the web. If we package up a bunch of
       | HTML in a standalone document then we lose that assumption.
        
       | BlueTemplar wrote:
       | See also: "The decades long quagmire of encapsulated HTML"
       | (2022):
       | 
       | https://www.russellbeattie.com/notes/posts/the-decades-long-...
       | 
       | (Which still hasn't been posted as a "news" item, it seems...
       | should I just submit it myself?)
        
       | harshreality wrote:
       | I think precisely dictating layout is the wrong objective,
       | although some areas (legal documents, academic papers) are still
       | obsessed with that. Arxiv recently started offering a subset of
       | papers in html, which is a short step from epub.
       | 
       | If quick high-resolution referencing (page x, yth paragraph, zth
       | line) is necessary, I think the way to handle it is to reference
       | a phrase on that line ("the cat ran"...), which the reader can
       | search for. If the search interface is lacking, that's an epub
       | reader failure, not a format failure. Or, if the search option is
       | considered insufficient (it does require typing with a [virtual]
       | keyboard), paragraphs can be numbered--as many ancient works or
       | works in translation already are, because such works have many
       | editions and can't be layout-exact copies of each other.
       | 
       | If paragraph referencing is necessary, visibly styling the
       | paragraphs with numbers helps dramatically. There's no reason it
       | has to be exclusive to high-profile ancient or classical works,
       | like Plato [1].
       | 
       | Classic poetry and plays, where referencing needs to be most
       | exact and fast, already tend to give up any hope of everyone
       | using the same edition, and simply avoid flowed text and then
       | number the lines.
       | 
       | [1] https://en.wikipedia.org/wiki/Stephanus_pagination
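A minimal sketch of the phrase-based referencing described above, in Python. It assumes the book is already available as a list of paragraph strings (a hypothetical input shape, not any EPUB API) and returns the 1-based numbers of the paragraphs containing a phrase. Matching ignores case and collapses whitespace, so the answer does not depend on where a renderer happens to wrap lines; it is a plain substring match, not word-aware search.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse all runs of whitespace (including line breaks)."""
    return " ".join(text.split()).lower()


def find_paragraphs(paragraphs: list[str], phrase: str) -> list[int]:
    """Return the 1-based numbers of the paragraphs that contain `phrase`."""
    needle = normalize(phrase)
    return [
        number
        for number, paragraph in enumerate(paragraphs, start=1)
        if needle in normalize(paragraph)
    ]


paragraphs = [
    "The dog slept.",
    "Then the cat\nran away.",  # the phrase is split across a line break here
    "The cat ran again.",
]
print(find_paragraphs(paragraphs, "the cat ran"))  # [2, 3]
```

The paragraph numbers it returns are stable across font sizes and screen widths, which is what makes them usable as visible, citable references.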
        
         | foofie wrote:
         | > Classic poetry and plays, where referencing needs to be most
         | exact and fast (...)
         | 
         | I believe that paragraph referencing is, by far and without any
         | contest, used primarily by any text subject to reviews and
         | revisions. This means technical reports and academic papers.
         | 
         | All academic papers I subjected to review were forced to use
         | templates that enforced paragraph numbering. Even though each
         | version of those documents was only read by a dozen readers or
         | so, all papers submitted to those journals had to use the
         | template. This means hundreds of documents (say, half a dozen
         | revisions per submitted paper, for each edition) had that hard
         | requirement, and this took place for each edition of a single
         | journal.
        
       | xoac wrote:
       | I love this unreservedly. It's almost an embarrassment that
       | something like this does not exist already. Big thank you to Will
       | Crichton for putting all of this together and actually giving
       | this _idea_ a chance to take hold.
        
         | foofie wrote:
         | > It's almost an embarrassment that something like this does
         | not exist already.
         | 
         | To be fair, Epub has been largely ignored and neglected by
         | everyone in the world. Virtually no reader supports rendering
         | math notation, and virtually no significant publisher on earth,
         | including the likes of Arxiv, offers Epub downloads. Commercial
         | publishers force DRM onto every format, which excludes Epub,
         | and non-commercial either stick with PDF or don't care.
        
           | julielit wrote:
           | Sorry, but this is not true. Every significant publisher
           | produces EPUB these days. Most reading apps support EPUB
           | (including apps from Apple and Google), Readium and EDRLab
           | offer open-source SDKs that ease the development of mobile,
           | desktop and Web reading software with strong EPUB 3 support,
           | including MathML. Readium LCP is a DRM for EPUB, especially
           | for public libraries that need an e-lending end date. More,
           | EPUB is much more accessible than PDF for blind people and
           | other people with disabilities. PDF has no interest for
           | ebooks (but for short documents, yes).
        
       | gmuslera wrote:
       | Another standard? (https://xkcd.com/927/). It is not like the
       | publishing world will switch to it overnight; a lot is tied to
       | the devices they sell, so it might not be enough motivation.
       | Isn't it simpler to convert EPUBs into EPUBs with all remote
       | content embedded and relinked?
       | 
       | Another single-file book format that used to be popular many
       | years ago was CHM, which had its own security problems. Maybe
       | adding the possibility of executing (JS) code is not the best
       | idea for something that should be mostly static, and CSS could
       | be used to enable some level of safe interactivity.
        
       | asimpletune wrote:
       | This is something I deeply care about, as I'm also very
       | interested in the intersection of ebooks, security, and a LowJS
       | web.
       | 
       | 1. absolutely we should have a single-file, portable ebook
       | format, and since PDF doesn't reflow text, it's not that.
       | 
       | 2. HTML + CSS in 2024 is capable of reproducing virtually any
       | kind of printed medium, but it can also reflow text.
       | 
       | 3. I don't personally think JS should be a requirement, but a lot
       | of conversations break down at this point, so let me do my best to
       | explain and please understand I'm not simply being religious
       | about this. In my view, an ebook is a book that can change its
       | size in a way that makes sense. Bearing this in mind, I believe
       | that if a reader has JS turned off, this book should work as
       | intended. In other words JS shouldn't be required for it to
       | perform its basic function. However, if there is some necessity
       | to add interactivity or to augment the book, then yeah, why not,
       | use JS. That's what it's there for. However, from the perspective
       | of being a book, if it doesn't work as a book without JS then
       | it's a bug in my view. (for standard ereader capabilities, I
       | think the browser should offer that, but the book itself
       | shouldn't be shipped with an ereader)
       | 
       | 4. I think it's a mistake to embed all the styling as that may
       | violate some people's CSPs. It's safer to specify separate
       | styling as resources relative to the HTML, and ebooks should
       | forbid loading resources from a separate domain. In this way,
       | ebooks would always work offline, but it has the added benefit of
       | working online too and would automatically adhere to the
       | strictest possible CSP. This achieves the same goal as offline,
       | but in a safer and more universally compliant way.
       | 
       | 5. Finally, just distribute it in a zip file. That's how ebooks
       | already work right?
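Point 4 (books should never load resources from another domain) can at least be checked mechanically. A rough Python sketch, assuming the EPUB's text files are UTF-8: it opens the archive with the standard `zipfile` module and flags `src` attributes, `<link href>` attributes, CSS `url(...)` values and `@import` strings that point at `http://`, `https://` or protocol-relative `//` URLs. Ordinary `<a href>` hyperlinks are deliberately ignored, since following a link is not the same as loading an asset. Being regex-based, it is a heuristic rather than a validator (it misses SVG `xlink:href`, for example), and the function name is illustrative.

```python
import re
import zipfile

# Heuristic patterns for remote assets; each captures the offending URL.
REMOTE_PATTERNS = [
    re.compile(r"""\bsrc\s*=\s*["']((?:https?:)?//[^"']+)""", re.I),
    re.compile(r"""<link\b[^>]*\bhref\s*=\s*["']((?:https?:)?//[^"']+)""", re.I),
    re.compile(r"""url\(\s*["']?((?:https?:)?//[^"')\s]+)""", re.I),
    re.compile(r"""@import\s+["']((?:https?:)?//[^"']+)""", re.I),
]
TEXT_SUFFIXES = (".xhtml", ".html", ".htm", ".css", ".svg")


def find_remote_refs(epub) -> list[tuple[str, str]]:
    """Return (member name, URL) pairs for remote assets referenced in `epub`.

    `epub` may be a path or a binary file object, as accepted by ZipFile.
    """
    hits = []
    with zipfile.ZipFile(epub) as zf:
        for name in zf.namelist():
            if not name.lower().endswith(TEXT_SUFFIXES):
                continue  # skip images, fonts and other binary members
            text = zf.read(name).decode("utf-8", errors="replace")
            for pattern in REMOTE_PATTERNS:
                for match in pattern.finditer(text):
                    hits.append((name, match.group(1)))
    return hits
```

An empty result from `find_remote_refs("book.epub")` (a hypothetical file name) suggests, but does not prove, that the book renders offline.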
        
         | gildas wrote:
         | > 5. Finally, just distribute it in a zip file. That's how
         | ebooks already work right?
         | 
         | What about self-extracting ZIP files like this page?
         | https://gildas-lormeau.github.io/ (note that it includes the
         | CSP to make it safe)
        
           | asimpletune wrote:
           | I don't think there's any CSP?
           | https://observatory.mozilla.org/analyze/gildas-
           | lormeau.githu...
           | 
           | In any case, ideally, any ebook solution should have all
           | resources loaded as files relative to the current document,
           | and nothing inline. Like this, the book would be compatible
           | both offline and to be hosted online, without requiring any
           | changes to the book.
           | 
           | The one-file thing is cool, but again it requires JS to show
           | anything, so that's not really in line with what I was talking
           | about. In my view, an HTML document should work like a book,
           | and any JS is purely to augment and extend that book if
           | necessary. Those situations are rare though, and most JS is
           | just to give a reader-like experience.
           | 
           | I think if someone is going to go through the trouble to host
           | a book, they probably don't mind unzipping a file before
           | putting it on their server. The one-file thing I said earlier
           | was more about sharing the book, similar to an app package.
           | But once it's on someone's servers, presumably it's ready to
           | be read.
           | 
           | Basically .webarchive
        
             | gildas wrote:
             | The index page is protected via CSP. The "bootstrap" page
             | which unzips the page isn't, but it could be. I just did not
             | include it for some reason I've forgotten.
             | 
             | The JS is not essential, there's nothing to stop you
             | treating the file as a ZIP file and unzipping it beforehand
             | to view it.
        
               | asimpletune wrote:
               | Ok, well maybe the example provided doesn't communicate
               | the intention, because when I try to 'extract' the page,
               | following the download links, they don't work, except the
               | png. Cool concept though.
        
               | gildas wrote:
               | Unfortunately, the file poses a problem for basic
               | unzipping software (e.g. a file explorer). However, the
               | file is 100% valid with respect to the ZIP specification.
               | You just need to use "true" unzipping software like 7zip,
               | unzip etc. to read it as such.
        
           | emmanueloga_ wrote:
           | Super interesting! Haven't seen this before. Very creative
           | (abuse?) of zip files! :-)
           | 
           | https://github.com/gildas-
           | lormeau/SingleFile/blob/master/faq...
        
         | Turing_Machine wrote:
         | > That's how ebooks already work right?
         | 
         | Sort of, in the sense that any unzip tool will unpack an EPUB
         | file (possibly after renaming it to a .zip extension rather
         | than .epub).
         | 
         | However, it doesn't necessarily work the other way. You can't
         | just zip the files at random and wind up with a valid EPUB,
         | even if that exact same set of files was a valid EPUB before it
         | was unzipped.
         | 
         | The "mimetype" file in an EPUB zip is special. It has to be the
         | very first entry in the archive. It also has to contain the
         | string "application/epub+zip" (and nothing else) and must be
         | stored _un_ compressed.
         | 
         | A surprising number of zip tools make it hard to do this (e.g.,
         | by altering the file order as more files get added to the zip,
         | or making it hard or impossible to store one file uncompressed
         | while compressing the others). Most of the command line zip
         | programs can do this with the proper command line flags, but
         | zip libraries often make it a PITA.
         | 
         | Source: have written EPUB generation software.
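The mimetype rules above are straightforward with Python's standard `zipfile` module, because `writestr` accepts a per-entry `compress_type`. A minimal sketch (the function names and the dict-of-bytes input are illustrative, not from any EPUB library): write `mimetype` first with `ZIP_STORED`, then everything else deflated, plus a check of the first entry. It only covers the container rules described here; it does not validate `META-INF/container.xml` or the package document.

```python
import zipfile

MIMETYPE = b"application/epub+zip"


def write_epub(target, files: dict[str, bytes]) -> None:
    """Write an EPUB container to `target` (a path or a binary file object).

    `files` maps archive paths such as "META-INF/container.xml" to their bytes.
    The mimetype entry is generated here, so a "mimetype" key in `files` is skipped.
    """
    with zipfile.ZipFile(target, "w") as zf:
        # First entry, exact content (no trailing newline), stored uncompressed.
        zf.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        for name in sorted(files):
            if name == "mimetype":
                continue
            # All other entries may be compressed normally.
            zf.writestr(name, files[name], compress_type=zipfile.ZIP_DEFLATED)


def has_valid_mimetype_entry(source) -> bool:
    """Check that the first entry is an uncompressed 'mimetype' with the right content.

    infolist() follows central-directory order, which matches the physical
    order for archives written sequentially like the one above.
    """
    with zipfile.ZipFile(source) as zf:
        entries = zf.infolist()
        if not entries:
            return False
        first = entries[0]
        return (
            first.filename == "mimetype"
            and first.compress_type == zipfile.ZIP_STORED
            and zf.read(first) == MIMETYPE
        )
```

Because the entry is stored and has no extra field, the bytes `mimetypeapplication/epub+zip` end up right after the 30-byte local file header, which is where format-sniffing tools can find them.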
        
         | charlieyu1 wrote:
         | Not a fan of zipping everything.
         | 
         | One thing I missed about working with old .doc files instead of
         | .docx, was that it was very fast to search a folder with
         | hundreds or thousands of files for a specific word. Not
         | possible for zip formats. (I just saved a file in .doc right
         | now, opened it with a hex editor, and it does contain the
         | contents in plain-text.) It is a problem to search zip formats
         | or pdf files with grep, and I really doubt that the zipping is
         | necessary for text with maybe a few thousand characters.
        
           | Arelius wrote:
           | To be fair, IIRC an EPUB is stored uncompressed. So you
           | should be able to get a directory of them no problem.
        
           | kimixa wrote:
           | Zip files don't _need_ to be compressed; if there's a need
           | maybe you can promote adding the option upstream to whatever
           | apps you use (if it's not already there somewhere). Or a .zip
           | equivalent for zgrep.
           | 
           | Though pattern matching in files feels very fragile - simple
           | text patterns would still rely on the text being kept in a
           | contiguous chunk and no embedded data/markup within the
           | section you're searching for...
           | 
           | Though I generally see the prevalence of zip files as a result
           | of people assuming that collections of files/streams need to
           | be packed into a single object for "user simplicity", but we
           | already have this with OS support in the form of directories.
           | It just seems people have embedded the assumption that tools
           | and UI handle directories as single units poorly, when that
           | doesn't really need to be the case.
        
             | lupusreal wrote:
             | Browsers don't really have good support for downloading a
             | whole directory.
        
             | genevra wrote:
             | Have you ever tried to download 100 individual files, or move
             | 10,000 files to a drive, vs. a single zip? The
             | difference is night and day
        
           | lupusreal wrote:
           | Just use zipgrep to search inside zip files? Or just unzip it
           | then use regular grep. I don't think it makes sense to gimp a
           | file format just to make life slightly easier for people
           | using archaic Unix utilities.
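If `zipgrep` isn't at hand, searching inside ZIP-based files (EPUB, DOCX, JAR) without extracting them to disk takes only a few lines of Python. A rough sketch; the function name and the default suffix list are illustrative choices. Each member is decompressed in memory and searched line by line. Note that it searches raw markup, so, as pointed out elsewhere in this thread, a phrase interrupted by a tag will not match.

```python
import re
import zipfile


def zip_grep(archive, pattern: str,
             suffixes=(".xhtml", ".html", ".htm", ".xml", ".txt", ".css")):
    """Yield (member name, line number, line) for lines matching `pattern`.

    Members are decompressed in memory one at a time; nothing is written to disk.
    """
    regex = re.compile(pattern)
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir() or not info.filename.lower().endswith(suffixes):
                continue  # skip directories and binary assets
            text = zf.read(info).decode("utf-8", errors="replace")
            for number, line in enumerate(text.splitlines(), start=1):
                if regex.search(line):
                    yield info.filename, number, line
```

Usage would look like `for hit in zip_grep("book.epub", r"Stephanus"): print(hit)`, with a hypothetical file name.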
        
           | WorldMaker wrote:
           | ZIP is a real simple format under the hood.
           | 
           | There are zip-aware grep tools that you could investigate if
           | they fit your workflows.
           | 
           | There's also nothing stopping you from storing the contents
           | of zips as unzipped directories and rezip when/as necessary.
           | (I wrote a tool called musdex years ago to automate that flow
           | in the context of source control: source control the contents
           | of something like .docx expanded into a full directory
           | structure, but preserve the ability to "double click the Word
           | file to edit".)
           | 
           | As a wrapper of "collections of deeply related files, some
           | which may be text and some may be binary", ZIP is one of the
           | better choices that we have (compare to TAR or MIME
           | envelopes, for instance).
        
           | chasil wrote:
           | JAR and APK files are really ZIP, as is EPUB.
           | 
           | True, ZIPs are hard to grep. However, gzip and its progeny come
           | bundled with convenience scripts such as zgrep, bzgrep, xzgrep,
           | etc.
           | 
           | Maybe use a grep-friendly format if that's important to you,
           | or otherwise a filesystem with transparent compression
           | (btrfs, ZFS, or [gasp] NTFS).
        
         | Shorel wrote:
         | About #3: I would say that usability without JS should be a
         | requirement. We don't need JS, especially in a low-power device
         | like an e-reader.
         | 
         | #5: Why change something that is already working well? The
         | extension is .epub, even if it is a zip file.
        
         | ianburrell wrote:
         | It is also important to distinguish between normal zip file and
         | ebooks. Let's use the extension ".epub".
         | 
         | You described the EPUB format, where HTML, CSS, and image files
         | go in a zip archive.
        
       | Finnucane wrote:
       | I see a couple of roadblocks for this. One is that his suggestion
       | of a restrictive subset of HTML for coding seems like a potential
       | accessibility problem, which is to say, you'd have to make your
       | documents less semantically rich. For instance, he seems to be
       | suggesting using. It's already hard enough to get epubs to work
       | right when reading systems lag behind what browsers can support,
       | saying 'let's have less' is not going to make things easier if
       | you have complex content. The problem is not that there is too
       | much html or css, the problem is that reading systems don't
       | support them properly.
       | 
       | Also, most dedicated reading systems (Kindle, Kobo, etc) don't
       | allow javascript, which means your components will not work. That
       | might of course change, but I wouldn't hold my breath for it.
        
         | velcrovan wrote:
         | > One is that is suggestion of a restrictive subset of HTML for
         | coding seems like a potential accessibility problem, which is
         | to say, you'd have to make your documents less semantically
         | rich.
         | 
         | "less semantically rich" than what? Web pages? Or less rich
         | than PDFs, which is what he's actually proposing to replace?
        
           | Finnucane wrote:
           | Than the epub standard as it exists. I guess I don't
           | understand what the advantage here is, really. If you want to
           | make an epub file that works universally, you can do that,
           | within the existing standard. If makers of reading systems
           | and software would actually support the full standard, which
             | they currently don't. If you don't want to use pdfs, don't
           | use pdfs. pdfs get used for a lot of things they aren't
           | actually very good for.
        
             | velcrovan wrote:
             | > If you want to make an epub file that works universally,
             | you can do that, within the existing standard.
             | 
             | Yes, that's what the author is proposing we do.
        
               | Finnucane wrote:
               | So basically, nothing new.
        
       | ryukafalz wrote:
       | I did a double-take when I reached this part:
       | 
       | > Therefore I decided to build a lighter EPUB reading system,
       | Bene. You're using it right now. This document is an EPUB -- you
       | can download it by clicking the button in the top-right corner.
       | 
       | Because, reading this on a desktop browser, I didn't even notice
       | until it was pointed out. It's more obvious on mobile because the
       | header takes up more of the viewport, but it otherwise behaves
       | pretty much like a normal web page.
       | 
       | This is probably a good thing.
       | 
       | For what it's worth, I didn't see (or at least didn't notice) a
       | spinner when loading the doc for the first time like some other
       | people in the comments reported. I did notice it on my phone, but
       | it went by pretty quickly. I'm not sure if that's the WASM
       | program loading, or whether it only happens the first time you load
       | the page.
        
         | edflsafoiewq wrote:
         | The main thing the spinner waits on is the .epub file to be
         | downloaded. That file is 4.77 MB, which is appreciable for
         | anyone without a fast connection. Most of that weight is images
         | (99% after decompression). Unlike a normal webpage (or a PDF),
         | rendering doesn't appear to start until all the assets (the
         | whole EPUB) have been downloaded.
         | 
         | This segues into a point of difference I thought the article
         | would mention, but didn't: performance.
         | 
         | A PDF can be optimized so the pages are substantially
         | independent of each other, which makes rendering pages
         | progressive, random-access, and highly parallelizable.
        
       | aabdulllah wrote:
       | Hi, I was wondering if someone could explain what software
       | solution I need to create and implement in my calculator for it
       | to have a fast response time when calculating. How does this
       | work, in other words what enables it to work so fast?
        
       | aabdulllah wrote:
       | p.s. sorry I am new to engineering.
        
       | SamBam wrote:
       | One thing I'm very interested in, as a grad student who has to
       | consume a huge number of PDFs, is whether there are good tools
       | for converting existing PDFs to portable EPUBs or HTML documents.
       | 
       | If I use, for instance, CloudConvert [1], I generally get a
       | document that gets flowing text roughly right, but still
       | interrupts the text with page numbers and book titles (that were
       | originally at the top of each page) and includes additional
       | bizarre line breaks, etc.
       | 
       | Every so often I wonder if this is an LLM problem ("please
       | reformat the following text to...") but I think that one
       | shouldn't reach for an LLM for these kinds of things.
       | 
       | 1. https://cloudconvert.com/pdf-to-epub
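For the running-header problem described above, a cheap heuristic sometimes gets a long way before reaching for an LLM. A Python sketch, assuming the PDF text has already been extracted page by page by some other tool (this is not what CloudConvert does): drop lines that are only a number (page numbers), drop lines that repeat on at least half of the pages (running headers and footers), then re-join the remaining lines into paragraphs, treating blank lines as paragraph breaks. The 50% threshold is an arbitrary assumption, and the sketch does not handle hyphenated line breaks, multi-column layouts or footnotes.

```python
import re
from collections import Counter


def clean_pages(pages: list[str], min_share: float = 0.5) -> str:
    """Heuristically turn per-page extracted PDF text into flowing paragraphs."""
    page_lines = [[line.strip() for line in page.splitlines()] for page in pages]

    # Count on how many pages each non-empty line occurs (set() = once per page).
    occurrences = Counter(line for lines in page_lines for line in set(lines) if line)
    threshold = max(2, int(min_share * len(pages)))

    def is_noise(line: str) -> bool:
        is_page_number = re.fullmatch(r"\d{1,4}", line) is not None
        is_running_header = occurrences[line] >= threshold
        return is_page_number or is_running_header

    # Keep blank lines: they mark paragraph boundaries.
    kept = [line for lines in page_lines for line in lines
            if not line or not is_noise(line)]

    # Split on blank lines, then join each paragraph's lines with single spaces.
    blocks = re.split(r"\n\s*\n", "\n".join(kept))
    paragraphs = [" ".join(block.split()) for block in blocks if block.strip()]
    return "\n\n".join(paragraphs)


pages = [
    "My Book\nThe cat\nran away.\n\nA new\n1",
    "My Book\nparagraph starts.\n2",
    "My Book\nIt ends here.\n3",
]
print(clean_pages(pages))
```

Here the header "My Book" and the page numbers disappear, and the paragraph that spans three pages comes out as one line.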
        
       | nmz wrote:
       | > * PDFs cannot easily express interaction. PDFs were primarily
       | designed as static documents that cannot react to user input
       | beyond filling in forms.
       | 
       | I am glad about this. I do not want to download a document and
       | have it require any input. A document should be a document, nothing more.
       | If I'm getting a book to read (pdf), I expect a book, not a
       | webapp.
        
       | sotix wrote:
       | I really like this proposal. Just last night, I was reading an
       | EPUB of The Hobbit and clicked on a footnote, which instructed me
       | to refer to page 24. It turned out that it meant page 24 of the
       | printed edition of the book, which was in the first chapter of
       | the book. Page 24 in the EPUB was still part of the prologue. So
       | I had no idea which page it was referencing. As it stands, my
       | Kobo has an increased font size, so I notice that I can flip the
       | page a few times and still be on page 4 before it finally turns
       | to page 5, which I assume is referencing the pages of the written
       | text. This is a nice compromise, but doesn't solve the issue with
       | the hard coded footnote being misleading.
       | 
       | I wonder if we could instead look at religious texts such as the
       | Bible (e.g. John 3:16) and code editors (e.g. Ln 4, Col 12) for
       | referencing locations in reflowable text. The same way you can
       | jump to a footnote in a document should allow you to have an
       | actionable reference to a specific location anywhere in the text.
       | But I don't think the text should be stylized like how the Bible
       | has the numbers (e.g. 16) scattered within the text itself. Those
       | should probably be hidden within the text and leave the reading
       | software to display the first line number of the page down at the
       | bottom instead of the page number. That might look like "4" the
       | same as it currently does, but this 4 references a section of the
       | text rather than a page number. Perhaps it could be togglable for
       | greater detail and display word 23 of section 4 as "4:23". Or
       | maybe it could consider the chapter too. For example, chapter 2
       | section 4, word 23 would look like "2, 4:23". This might get
       | funky in a Terry Pratchett novel, but it would hopefully allow
       | for easier discussion of exact parts in a document and
       | significantly easier linking.
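The "chapter, section:word" idea above is simple to make machine-actionable. A minimal Python sketch under assumed conventions: the book is a hypothetical list of chapters, each a list of section strings; words are whitespace-separated tokens; all numbers are 1-based; and the text form is the "2, 4:23" style from the comment. It formats, parses and resolves such references, and finds the first occurrence of a word, independent of font size or page layout.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Locator:
    chapter: int  # 1-based chapter number
    section: int  # 1-based section within the chapter
    word: int     # 1-based word within the section

    def __str__(self) -> str:
        return f"{self.chapter}, {self.section}:{self.word}"


LOCATOR_RE = re.compile(r"^\s*(\d+),\s*(\d+):(\d+)\s*$")


def parse_locator(text: str) -> Locator:
    match = LOCATOR_RE.match(text)
    if match is None:
        raise ValueError(f"not a locator: {text!r}")
    return Locator(*map(int, match.groups()))


def resolve(book: list[list[str]], loc: Locator) -> str:
    """Return the word a locator points at; book[c][s] is a section's text."""
    words = book[loc.chapter - 1][loc.section - 1].split()
    return words[loc.word - 1]


def locate(book: list[list[str]], needle: str) -> Optional[Locator]:
    """Locator of the first occurrence of a single word, ignoring punctuation."""
    for c, chapter in enumerate(book, start=1):
        for s, section in enumerate(chapter, start=1):
            for w, word in enumerate(section.split(), start=1):
                if word.strip(".,;:!?\"'").lower() == needle.lower():
                    return Locator(c, s, w)
    return None
```

A reading system could show `str(locator)` for the first word on screen where a page number would otherwise go, and turn a pasted "2, 4:23" into a jump.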
       | 
       | I _love_ the interactive code example for marking up The Rust
       | Programming Language. That gets at what I was saying above
       | although is more targeted at a document than referencing parts of
       | a novel.
       | 
       | Kudos to the author for creating Bene[0] as part of this proposal.
       | That was cool to discover I was using their tool to read the
       | proposal itself!
       | 
       | [0] https://github.com/nota-lang/bene/
        
       | Shorel wrote:
       | On the one hand, these are just EPUBs, nothing in the article
       | makes the generated EPUBs different or incompatible from the ones
       | I usually read.
       | 
       | On the other hand: this looks awesome for the Web. Mix it with a
       | blog platform like Medium, and it will improve my Web browsing
       | experience tenfold.
        
       ___________________________________________________________________
       (page generated 2024-01-26 23:01 UTC)