[HN Gopher] Portable EPUBs
___________________________________________________________________
Portable EPUBs
Author : sohkamyung
Score : 530 points
Date : 2024-01-26 01:53 UTC (21 hours ago)
(HTM) web link (willcrichton.net)
(TXT) w3m dump (willcrichton.net)
| ijhuygft776 wrote:
| Portable epubs? All the epubs I ever downloaded are portable...
| not sure how it could NOT be the case... not reading an article
| with a title like this.
| zwayhowder wrote:
| I had the same thought, but wanted to know why the Author
| thought they were not.
|
| _For example, a major issue for self-containment is that EPUB
| content can embed external assets. A content document can
| legally include an image or font file whose src is a URL to a
| hosted server. This is not hypothetical, either; as of the time
| of writing, Google Doc's EPUB exporter will emit CSS that will
| @include external Google Fonts files. The problem is that such
| an EPUB will not render correctly without an internet
| connection, nor will it render correctly if Google changes the
| URLs of its font files._
|
| The article raises some interesting ideas. Much like PDF and
| PDF/A, I would say an EPUB/A standard would be potentially
| useful.
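To make the self-containment problem concrete, here is a minimal sketch, using only the Python standard library, that flags references to remote resources inside an EPUB's XHTML, SVG, and CSS files. It is a regex heuristic rather than a real parser, and the function name `find_remote_references` is hypothetical. It deliberately ignores ordinary `<a href>` links, because linking to the web is not the same as embedding a network asset.

```python
import re
import zipfile

# Constructs that make a reading system fetch a resource while rendering.
# This is a heuristic regex scan, not a full XHTML/CSS parser.
_REF_PATTERNS = [
    # src="..." on img, script, audio, video, source, iframe, ...
    re.compile(r"""\bsrc\s*=\s*["']([^"']+)["']""", re.I),
    # <link href="..."> pulls in stylesheets; plain <a href> links are skipped.
    re.compile(r"""<link\b[^>]*?\bhref\s*=\s*["']([^"']+)["']""", re.I),
    # CSS url(...), which also covers @import url(...).
    re.compile(r"""url\(\s*["']?([^"')\s]+)["']?\s*\)""", re.I),
    # CSS @import "..." written without url().
    re.compile(r"""@import\s+["']([^"']+)["']""", re.I),
]
# Absolute http(s) URLs and protocol-relative //host/... URLs count as remote.
_REMOTE = re.compile(r"^(?:https?:)?//", re.I)
_SCANNED_SUFFIXES = (".xhtml", ".html", ".htm", ".css", ".svg")


def find_remote_references(epub):
    """Return (member name, URL) pairs for remote resources an EPUB would load.

    `epub` may be a path or a binary file-like object.
    """
    hits = []
    with zipfile.ZipFile(epub) as zf:
        for name in zf.namelist():
            if not name.lower().endswith(_SCANNED_SUFFIXES):
                continue
            text = zf.read(name).decode("utf-8", errors="replace")
            for pattern in _REF_PATTERNS:
                for url in pattern.findall(text):
                    if _REMOTE.match(url.strip()):
                        hits.append((name, url.strip()))
    return hits
```

An EPUB produced by an exporter that imports hosted web fonts would show up here with the stylesheet that references them, while local images and stylesheets pass.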
| adamzochowski wrote:
| But the same font problem exists with PDFs. If a font is not
| embedded in the PDF, or rendered into an embedded vector shape,
| then the PDF will display garbage.
| BHSPitMonkey wrote:
| Isn't that solved in PDF/A, which the GP was implying could
| also be done for EPUB?
| jasomill wrote:
| Yes: among other things, PDF/A requires all fonts to be
| embedded.
| BarbaryCoast wrote:
| Thanks for that. I can't read the article, probably because I
| block WASM (and Javascript) for security. None of my ebook
| readers have Internet access (for security and for privacy),
| so none of those internet-only epub files would work for me.
|
| This might be "legal", since XHTML was intended for the web,
| but I assume Google's using it to collect more user
| interaction data that they can sell to data brokers.
|
| FWIW, PDF is simply Postscript that's been compressed. As far
| as I can tell, almost all documents these days are created
| with Microsoft Word, TeX, or Postscript. I'm lumping things
| like PageMaker and LaTeX in with the base they were derived
| from.
| Symbiote wrote:
| The article is a WASM EPUB viewer. There's a link shown in
| the viewer to the EPUB file:
|
| https://willcrichton.net/notes/portable-
| epubs/epubs/portable...
| xtracto wrote:
| Well... if you actually read the article instead of just the
| header, you would learn the reason for the need of a portable
| version.
| emmanueloga_ wrote:
| "portable EPUB: an EPUB with additional requirements and
| recommendations to improve PDF-like portability."
|
| IMO epub is fine for fiction but not for any sort of
| technical material. EPUB docs are slow to reflow and the
| layout is pretty much always broken in some way, especially
| when there are tables and graphics involved. PDFs are a lot
| faster to render and navigate, the fixed page size being one
| of the reasons.
| wolverine876 wrote:
| The OP addresses some of those issues.
| lxgr wrote:
| Having suffered through many PDFs on my phone I'd take slow
| reflow and sometimes broken layouts over no reflow and
| guaranteed horizontal scrolling any time.
| auggierose wrote:
| Not sure why anyone would want to read PDFs on their
| phone.
| offices wrote:
| Imagination error. My phone has hundreds of downloaded
| PDFs from emails containing things such as tickets, job
| specs, pseudo-letters, bills, etc.
|
| Anything where one might wish to read a Document in a
| Portable Format.
| auggierose wrote:
| Yeah, I guess. The thing with these is I don't care about
| the "quality" of a bill or ticket, it's enough if the tax
| man / concert venue accept it.
|
| Many people advocating for a "better" PDF don't
| understand the quality aspect of a PDF. I am not willing
| to compromise on that when reading a book. It beats all
| other aspects, including the fact that I cannot read it
| on my phone. Basically, PDFs are a perfect translation
| of books into the digital medium. Gimmicks and features
| on top of what PDF can do are fine, but _never a
| replacement_, given that books also don't have these
| features.
| lxgr wrote:
| I quite often find interesting research papers during the
| day that I don't have time to read in the office, and
| there's no stable cell signal on my commute, so it's in a
| way the perfect reading environment for these for me.
|
| My commute isn't long enough and often too crowded to
| warrant pulling out a tablet though. Reading on a single-
| hand device is ideal, and I prefer physical to physical
| books for that reason. So why shouldn't I read research
| papers the same way? I just want a portable document
| format for an actually portable device.
| SilentM68 wrote:
| One thing I dislike about PDFs is that dark themes usually
| don't render well, especially the embedded images, whereas the
| EPUB format seems to render them just fine. If a new EPUB
| format is created, I would suggest that it support
| pagination, since post-secondary courses usually ask students
| to cite chapters, pages, etc. Most EPUBs that I've come across
| don't have pages. The last thing I'd suggest is that the new
| standard, if created, should incorporate accessibility
| features, so that the file is readable by screen readers. PDFs
| are rarely designed with accessibility in mind. Making them
| accessible is also a gigantic pain to do. The technology behind
| any new EPUB document standard should have native accessibility
| support by default. People with print disabilities will thank
| you.
| starkparker wrote:
| > If a new EPUB format is created, I would suggest that they
| support pagination, since post secondary courses usually ask
| students to cite chapters, pages, etc. Most EPUBs that I've
| come across don't have pages.
|
| From the post:
|
| > I think we just have to give up on citing content by pages.
| Instead, we should mandate a consistent numbering scheme for
| block elements within a document, and have people cite using
| that scheme.
|
| The point of a citation is to specifically reference an
| assertion. Any method of specifically referencing an
| assertion works.
|
| If anything, referencing by section and paragraph is more
| portable than referencing by page number. It's more
| consistent across different print formats of the same text
| (hardcover vs. paperback, and mainstream print editions vs.
| large-text or braille editions) as well as different digital
| formats.
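The block-numbering idea quoted above could look something like the following sketch. The scheme here (sections counted in document order, blocks numbered within each section as `2.3`, stored in a hypothetical `data-block` attribute) is an assumption for illustration, not the numbering the article specifies. Nested sections are flattened, and blocks inside another block (for example, a paragraph inside a blockquote) are not numbered separately.

```python
import xml.etree.ElementTree as ET

# Element names treated as citable blocks (an assumed list).
BLOCK_TAGS = {"p", "ul", "ol", "pre", "blockquote", "figure", "table"}


def _local(tag):
    """Strip an XML namespace: '{http://www.w3.org/1999/xhtml}p' -> 'p'."""
    return tag.rsplit("}", 1)[-1]


def number_blocks(root, attr="data-block"):
    """Label block elements '<section>.<n>' in document order and return the labels."""
    labels = []
    section = 0
    count = 0

    def walk(el):
        nonlocal section, count
        tag = _local(el.tag)
        if tag == "section":
            section += 1
            count = 0  # restart block numbering in each new section
        elif tag in BLOCK_TAGS:
            count += 1
            label = f"{section}.{count}"
            el.set(attr, label)
            labels.append(label)
            return  # blocks nested inside this one are not numbered separately
        for child in el:
            walk(child)

    walk(root)
    return labels
```

A citation could then point at "§2.2" regardless of screen size, font size, or edition, as long as the text itself does not change.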
| jxdxbx wrote:
| this is an issue in the legal world. court opinions
| accessed only online are cited according to their "page
| number" in some reporter or another. it's better to cite
| paragraph numbers when possible but most American legal
| documents are un-numbered.
| crabmusket wrote:
| It sounds like the legal world can continue to use PDFs.
| That's fine!
| dsr_ wrote:
| Laws already have problems with HTML: numbered lists are
| specified in a way which is incompatible with many
| jurisdictions' numbering schemes, including the US
| Federal standard.
| steve1977 wrote:
| > dark themes usually don't render well
|
| PDFs should not render dark themes at all. PDFs should look
| exactly like they were produced. So if they were produced
| with black on white text, that's what they should render, in
| any circumstance.
| Vecr wrote:
| Zathura[0] has a dark mode[1], it works pretty well.
|
| [0]: https://pwmt.org/projects/zathura/
|
| [1]: ^r Recolor (grayscale and invert colors)
| o11c wrote:
| Or it can just invert the L component of all colors in the
| HSL colorspace at the very last stage of rendering, which
| only requires a couple of subtractions to do in sRGB.
|
| Unlike the unfortunately common "invert _directly_ in sRGB",
| this preserves hue and saturation, changing only the brightness,
| and honestly it's pretty good. Colorspace nerds will no
| doubt complain that there are better colorspaces available,
| but in practice, most consumer devices implement "sRGB"
| perceptually such that this works better than fancier
| methods (which only work for carefully calibrated displays
| in carefully calibrated rooms).
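The lightness inversion described above works out as follows. The sketch assumes 8-bit, gamma-encoded sRGB channels that are handled directly, as the comment suggests, and the function name is hypothetical. HSL lightness is (max + min) / 2, so flipping it to 1 - L while holding hue and saturation fixed adds the same offset, 255 - max - min, to every channel.

```python
def invert_lightness(r, g, b):
    """Invert HSL lightness (L -> 1 - L) of an 8-bit RGB colour.

    Adding the same offset to all channels keeps max - min (the chroma) and the
    channel differences (the hue) unchanged, so only lightness moves.
    """
    shift = 255 - max(r, g, b) - min(r, g, b)
    return r + shift, g + shift, b + shift
```

For comparison, naive per-channel inversion turns (200, 100, 50) into (55, 155, 205), i.e. orange becomes blue. This sketch returns (205, 105, 55), which is nearly the same orange because its lightness was already close to 0.5, while white and black still swap.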
| steve1977 wrote:
| I didn't say PDF cannot do it, I said it should not. It
| defeats the purpose of PDF.
| wolverine876 wrote:
| It's a very well thought through article by the developer of
| Nota, trying to bring EPUB format up to parity with PDF. It's a
| serious start and they've already written a viewer. In fact, the
| article itself is displayed in a browser-based wasm port of the
| viewer (and looks good!).
|
| One issue is how precisely EPUB, which is really XHTML, can
| reproduce layout. What are the possibilities here? The OP's
| standard is that the document will look "reasonable". They imply
| that HTML would need new layout capabilities to match PDF, at
| least for line breaking:
|
| _There's two ways to make progress here. One is for browsers to
| provide more typography tools. Allegedly, text-wrap: pretty is
| supposed to help, but in my brief testing it doesn't seem to
| improve line-break quality. The other way is to pre-calculate
| line breaks, which would only work for fixed-layout renditions._
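Pre-calculating line breaks for a fixed-layout rendition could, for example, use the classic minimum-raggedness dynamic program sketched below. This is a simplified relative of the Knuth-Plass algorithm used by TeX, not what the article or any browser implements: widths are counted in characters rather than measured glyph advances, there is no hyphenation or stretchable spacing, and each line's badness is simply its squared leftover space (the last line is free).

```python
def break_lines(words, width):
    """Split words into lines of at most `width` characters, minimizing raggedness.

    Badness of a line is its squared leftover space; the last line costs nothing.
    A single word longer than `width` is placed on a line of its own.
    """
    n = len(words)
    best = [float("inf")] * n + [0.0]  # best[i]: minimal badness for words[i:]
    next_start = [n] * (n + 1)         # first word of the line after the one starting at i
    for i in range(n - 1, -1, -1):
        length = -1  # cancels the space counted before the first word
        for j in range(i, n):
            length += len(words[j]) + 1
            if length > width and j > i:
                break
            slack = max(width - length, 0)
            badness = 0 if j == n - 1 else slack ** 2
            if badness + best[j + 1] < best[i]:
                best[i] = badness + best[j + 1]
                next_start[i] = j + 1
    lines, i = [], 0
    while i < n:
        lines.append(" ".join(words[i:next_start[i]]))
        i = next_start[i]
    return lines
```

On `"aaa bb cc ddddd"` with a width of 6, greedy filling gives `aaa bb / cc / ddddd` (badness 16), whereas the dynamic program picks `aaa / bb cc / ddddd` (badness 10), which spreads the white space more evenly.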
|
| Also, though the author mentions annotations, I don't see how
| they intend to implement them.
| joshjob42 wrote:
| The author discusses fixed layout epubs. Effectively, the epub
| can give a default pagination, line-breaks, font, font size,
| page size, and positioning for images etc., making it render
| identically on everything (one might optionally omit pagination
| if opening in a browser but keep everything else). This can be
| done already in epub3. But that's not ideal, because then it
| doesn't look good anymore on a phone, etc. Depending on the
| reader though, you could override the default, but then you
| have to hope that your reader does a good job of making a nice
| document. An alternative is for the epub to specify multiple
| renderings, for various common screen types.
|
| I don't think this is unreasonable as a solution. By all means
| let's try to get a smart reader, but letting people create
| defaults for their documents that can be overridden if desired
| by the user is a good middle ground.
| idoubtit wrote:
| Indeed, EPUB3 provides all the features that the author
| wishes. His "portable EPUB" format is just a loosely
| specified subset. It's unclear if some extra features are
| included in the format as they are in his "Bene" tool, like
| the rendering of references (i.e. links with a data-target
| attribute).
|
| The EPUB3 standard is much more complex than EPUB2 (media
| overlays, mixing fixed layout with reflowed, MathML...). In
| my experience the implementations vary much more, and
| most of them aren't complete. So a "Portable EPUB" may not
| render as expected because the reader tool lacks some
| specific feature. The author also requires full JS support,
| which I suppose does not help with portability.
| zozbot234 wrote:
| Doesn't CSS support layout capabilities for paged media out of
| the box? An EPUB reader just has to implement good old-
| fashioned "Print Preview" display mode, and you're set.
| zdunn wrote:
| > Also, though the author mentions annotations, I don't see how
| they intend to implement them.
|
| It's discussed at the very end of section 8 and all of section
| 9 that interactive functionality would use web components.
| nine_k wrote:
| PDF does not have any capabilities of line breaking. It is a
| _picture_ format, similar to SVG, only more rigid. That's why
| it can't have text reflow, etc.
|
| What an ebook format needs is a _semantic_ form of markup,
| which adapts to devices it is rendered on. HTML + CSS were
| invented for this goal.
|
| With that, book layout authors should consciously relinquish
| some control on how the book looks, and hand it to the reader.
| Slight visual imperfections are a small price to pay for this.
| Who needs visual perfection should go for a PDF.
|
| This, of course, becomes hard if any interactive stuff is
| involved. I would suggest that larger interactive elements
| should open in a dedicated view when needed, and tiny
| interactive elements should embrace reflow.
| kps wrote:
| > HTML + CSS were invented for this goal.
|
| HTML (with SVG and MathML) is probably fine for most books,
| but CSS has spent 30 years resolutely resisting basic
| typography, i.e. default text baseline alignment.
| wolverine876 wrote:
| You will enjoy the article, which goes into these issues in
| some detail.
| BlueTemplar wrote:
| Ironically, the very example the author uses for annotations
| doesn't work properly for me : on touchscreen Android Firefox I
| get a link instead of a popup when press-holding.
|
| And aren't annotations (and references) already part of the
| EPUB specification, and probably even the HTML specification
| ?!?
|
| Finally, I disagree with the press-and-hold for popup being
| better than the usual practice of hyperlink anchors, IMHO their
| jumping around is much less disruptive. (As long as the
| reader's "return" function is working properly, and/or - for
| the bijective ones - they provide a "back" hyperlink.)
| wolverine876 wrote:
| > And aren't annotations (and references) already part of the
| EPUB specification
|
| I'm pretty sure they are not, based on looking carefully a
| year or two ago, on recent discussions here on HN, and on the
| OP's belief that they need to invent it.
| xnx wrote:
| 8 days ago, 134 points: "Portable Web Documents - An Alternative
| to PDF Based on HTML5 (2019)"
| https://news.ycombinator.com/item?id=39036774
| crabmusket wrote:
| And this current post is exactly what I was wishing for in my
| reply[1]. Really glad this was posted!
|
| [1] https://news.ycombinator.com/item?id=39037135
| mr_mitm wrote:
| I'm fully on board with the author's "I want to replace PDF"
| sentiment.
|
| It's true that running code in the document has some downsides,
| but the vast majority of people do it all the time in their
| browsers. And it comes with tremendous upsides. Just imagine
| large amounts of data presented in interactive tables that can
| sort, filter, and export, or interactive graphs inside the
| document. We already use HTML+JS so much, why should we stop at
| documents? Yes, they can't be printed, but in my observation fewer
| and fewer people even own a printer these days, and I see no
| reason why this trend should not continue. I bet the future will
| be mostly living, interactive documents.
|
| It's funny that I just mentioned this in the other thread [1],
| but I also felt that there is a need for a format that is self-
| contained and widely supported by standard software (by which I
| mean browsers). A well-specified open format would be great, but
| until then I tackled the self-containedness problem with JS and
| wrote a Python script that zips and bundles all assets and embeds
| them as a SPA into one HTML file [2]. The focus is on Sphinx docs
| but it should work in general with all distributed HTML docs.
|
| [1] https://news.ycombinator.com/item?id=39138444
|
| [2] https://github.com/AdrianVollmer/Zundler
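Here is a minimal sketch of the general bundling idea: inlining local assets as base64 `data:` URIs so that a single HTML file carries its own images. This is not how Zundler itself works (per the comment, it zips the assets and embeds them as a SPA). The `inline_assets` helper and its dict of asset bytes are hypothetical, and it only handles `src` attributes, not CSS `url()` references or fonts.

```python
import base64
import mimetypes
import re

# Matches src="..." or src='...', capturing the prefix, the quote, and the reference.
_SRC_ATTR = re.compile(r"""(\bsrc\s*=\s*)(["'])(.+?)\2""", re.I)


def inline_assets(html, assets):
    """Rewrite src attributes that name a key in `assets` as base64 data: URIs.

    `assets` maps relative paths (as written in the HTML) to file bytes.
    References not found in `assets`, such as remote URLs, are left untouched.
    """
    def replace(match):
        prefix, quote, ref = match.groups()
        data = assets.get(ref)
        if data is None:
            return match.group(0)
        mime = mimetypes.guess_type(ref)[0] or "application/octet-stream"
        encoded = base64.b64encode(data).decode("ascii")
        return f"{prefix}{quote}data:{mime};base64,{encoded}{quote}"

    return _SRC_ATTR.sub(replace, html)
```

The trade-off is size: base64 inflates each asset by about a third, which is one reason a bundler might compress the assets first.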
| fodkodrasz wrote:
| > It's true that running code in the document has some
| downsides, but the vast majority of people do it all the time
| in their browsers.
|
| This probably has something to do with them having no choice,
| as the big companies managed to convince the frontend dev
| community that the single best thing to generate layout is on
| the client machine on the fly. Of course they did it so the
| users will have a hard time selectively blocking the layout
| scripts from the ad/spyware most contemporary (web)software
| development is about.
|
| This led us to the point where saving (or God forbid printing!)
| an article needs a lot of effort in many cases.
|
| My observation is: when I need to go to work on the field, I
| need printed documents. Printed documents don't need firmware
| updates, their batteries don't run out, and no, I don't need
| interactivity in documents.
|
| Self contained HTML is a good - and necessary - step, but
| interactivity and executable code is not something we usually
| need in documents, I only saw somewhat legit need for it on
| corporate abomination of documents (and some teaching materials
| possibly).
| morelisp wrote:
| Some of us are equally nonplussed by modern web dev but still
| quite miss PostScript.
| jxdxbx wrote:
| Thank you. It's really frustrating that people want to make
| documents as unreliable and annoying as the web.
| jxdxbx wrote:
| I understand all this but there needs to be a simple format for
| just regular books without all this complication. I thought
| that's what ePubs were for. What I want is basically an ebook
| format that is mostly zip files of plain text.
| mr_mitm wrote:
| What are you missing in epub?
| jxdxbx wrote:
| Simplicity? A guarantee that the file will be readable in
| 20 years? Project Gutenberg still treats plain text as the
| default format for a reason.
| velcrovan wrote:
| The ePub standard is 17 years old, it consists of HTML
| which is 31 years old and CSS which is 27 years old,
| packaged in ZIP format which is 34 years old, and all are
| still in widespread active commercial use and very easy
| to write parsers for. I think you'll have problems with
| the physical media you use to store your plain text files
| before you ever have problems finding software to read
| ePub file contents.
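As an illustration of how approachable the container format is, this sketch finds an EPUB's reading order with only the Python standard library: `META-INF/container.xml` points at the OPF package document, whose `<spine>` lists manifest item ids in order. It assumes a well-formed EPUB with a single rootfile and unescaped hrefs; `reading_order` is a hypothetical name.

```python
import posixpath
import xml.etree.ElementTree as ET
import zipfile

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def reading_order(epub):
    """Return ZIP member names of the spine documents in reading order.

    `epub` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(epub) as zf:
        # container.xml names the package (OPF) document.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        opf_path = rootfile.get("full-path")
        package = ET.fromstring(zf.read(opf_path))
    # Manifest hrefs are relative to the OPF file's directory.
    base = posixpath.dirname(opf_path)
    manifest = {
        item.get("id"): item.get("href")
        for item in package.find("opf:manifest", OPF_NS)
    }
    # The spine lists manifest ids in reading order.
    return [
        posixpath.normpath(posixpath.join(base, manifest[ref.get("idref")]))
        for ref in package.find("opf:spine", OPF_NS)
    ]
```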
| larme wrote:
| It's just a fucking book. Don't push your shits like js or SPA
| or d3 or webgpu to a fucking book. I just want to read it like
| a dead tree book.
| mr_mitm wrote:
| Not all documents are books. And I'd appreciate it if you
| stated your criticism in a more civilized manner.
| simonw wrote:
| Here's some really insightful feedback on this idea from Baldur
| Bjarnason, who has spent significant time working with various
| W3C groups relevant to EPUB:
| https://toot.cafe/@baldur/111819472053623911
|
| Example note: "EPUB originally didn't support remote resources
| and people put a lot of work into changing. Loading stuff over
| the network is HTML's killer feature. Blocking network assets is
| a setback for format adoption, not progress."
| starkparker wrote:
| Oh for Christ's sake, someone pry Baldur off his cross again.
|
| > Blocking network assets is a setback for format adoption
|
| People are standing here telling him that _allowing_ network
| assets _is a setback for format adoption_ and he's just going
| to keep pounding this stupid, obnoxious drum of his until he
| runs everyone off.
|
| > almost all of the problems described would be solved by
| getting OS vendors (Google, MS, Apple) to invest more money in
| EPUB
|
| That's back-asswards. Google, MS, and Apple don't give a shit
| about EPUB, they never will, and it's arguable that we're
| better served with them not buying a seat at that table
| considering how poorly their "help" has helped web standards,
| as much of the rest of his dismissive thread helpfully notes.
|
| If he wants money for EPUB standards he should shake the cup at
| IDPF members who rely on it, and particularly Amazon, to whom
| he quite vocally abdicated the publishing space 12 years
| ago.[1]
|
| Barking at operating system companies is nonsense at best and
| how we wind up with another, even more avoidable situation
| where the space is held hostage by them at worst. At least
| Amazon can chuck some goodwill money at EPUB development while
| continuing to kick its ass up and down the market with MOBI.
|
| (Aside from all this, his dismissals of the "clunky" reading
| system complaint, citing how EPUB has "too many divergences",
| only further proves to me how tunneled the vision is of the
| people involved. To hell with forking or improving EPUB, then,
| because it can't be improved if that's the attitude of the
| people most involved with or influential within it. What bloody
| point is there in the customizability of a format that _nobody_
| can effectively build tools for or consume?)
|
| 1: https://www.baldurbjarnason.com/notes/amazon-wins/, in which
| he also admits that he has no idea how to work with IDPF, which
| is a really great sign of how long things have been going this
| badly in this space
| sirsinsalot wrote:
| I also don't want my e-reader phoning home (to
| publisher/author) read time and page turns because the EPUB
| loads a pixel.
| Someone wrote:
| > Google, MS, and Apple don't give a shit about EPUB
|
| https://www.w3.org/groups/wg/epub/former-participants/
| certainly shows Google and Apple participated in the working
| group. Apple also has a book store selling EPUB books and has
| EPUB readers for both MacOS and iOS. Google also has an app
| that handles EPUB.
| crabmusket wrote:
| I really enjoyed this response, though I feel your points
| could have been made with a little less personal vitriol. Do
| you have history with Baldur? I don't ask because I think
| that would undermine your arguments, I'm just interested why
| you had such a strong reaction.
| mft_ wrote:
| > > Blocking network assets is a setback for format adoption
|
| > People are standing here telling him that _allowing_
| network assets _is a setback for format adoption_ and he's
| just going to keep pounding this stupid, obnoxious drum of
| his until he runs everyone off.
|
| I don't know the background that you're frustrated about, but
| I'd suggest that the answer might be: 'it depends' - and it
| depends on the intended purpose of the format in question.
| PDF is self-contained, and can be read (mostly) reliably on
| almost any device with the right software; PDFs having to
| have internet access to be read or opened would be a bad
| thing; further, the same goes for most formats - including
| EPUB (as you say) and audio files, picture files, etc.
| chasil wrote:
| > PDF is self-contained, and can be read (mostly) reliably
| on almost any device with the right software
|
| Article: "A PDF is a single file that contains all the
| images, fonts, and other data needed to render it."
|
| This is only true if you are using PDF/A, or have
| explicitly bundled all of your fonts in some version of the
| PDF standard.
|
| Otherwise, 14 total typefaces must be rendered by the
| viewer. These 14 are: Times (in regular, italic, bold, and
| bold italic), Courier (in regular, oblique, bold and bold
| oblique), Helvetica (in regular, oblique, bold and bold
| oblique), Symbol, and Zapf Dingbats.
|
| The 14 standard typefaces can vary between viewers:
|
| https://en.wikipedia.org/wiki/PDF#Text
|
| "...the base fourteen fonts... or suitable substitute fonts
| with the same metrics, should be available in most PDF
| readers, but they are not guaranteed to be available in the
| reader, and may only display correctly if the system has
| them installed."
|
| Depending upon what the viewer bundles, PDFs using these 14
| might not render as expected.
|
| Below is a deeper discussion from the wiki:
|
| https://web.archive.org/web/20110718231502/http://www.plane
| t...
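To see which fonts a PDF actually depends on, each font resource can be sorted into the cases discussed above. The sketch below assumes that the font's `/BaseFont` name, and whether its font descriptor carries an embedded font program (`/FontFile`, `/FontFile2`, or `/FontFile3`), have already been extracted by some PDF parser. That extraction step and the `font_status` helper are hypothetical. It also recognizes the tag of six uppercase letters plus `+` that PDF writers prefix to subset fonts.

```python
import re

# The PDF "standard 14" fonts, by PostScript name.
STANDARD_14 = {
    "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
    "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
    "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
    "Symbol", "ZapfDingbats",
}

# Subset fonts carry six uppercase letters and a plus sign, e.g. "ABCDEF+Lora".
_SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")


def font_status(base_font, embedded):
    """Classify a font resource by how portable its rendering is likely to be."""
    if embedded:
        return "embedded-subset" if _SUBSET_TAG.match(base_font) else "embedded"
    if base_font in STANDARD_14:
        # The viewer supplies these, possibly as a metric-compatible substitute.
        return "standard-14"
    # Not embedded and not standard: rendering depends on the reader's system.
    return "system-dependent"
```

Only the last category is at real risk of the "looked odd on someone else's machine" failure; the standard-14 case can still differ subtly when a viewer substitutes a look-alike font.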
| dsr_ wrote:
| It would be quite nice if Firefox opened EPUBs properly
| instead of requiring the just-good-enough EPUBreader add-on.
|
| I'd value that a lot more than Pocket (which I always turn
| off).
| the_lucifer wrote:
| I will go so far as to argue that Apple is the only one of the
| major vendors actually adopting EPUB books.
| zaphirplane wrote:
| Allowing the loading of network resources is not good for security.
| Surprised this isn't a worry, tbh didn't read baldur's writing
| criddell wrote:
| But it's good for tracking what books people are reading. If
| the history of the internet shows anything, it's that if
| surveillance is possible, it will eventually happen.
|
| Lots of governments around the world would love to know what
| their citizens are reading. Few would be bold enough to go
| after this directly, but if some company operating in their
| country has the data then there's a path for that government
| to get the data.
| watwut wrote:
| Just about the last thing I want is for epub to stop working
| offline on my phone, because the damn book needs to download
| something.
| jxdxbx wrote:
| My view as a heavy ebook reader: ePubs should be inert data. No
| javascript, no interactivity, no network resources. Just a
| fancy text file with some appearance settings all of which the
| reader can override.
| harshreality wrote:
| Think about what javascript exclusion means, and all the
| things a good universal ebook format needs to support.
| _Nicely_ rendered math? Currently the best option to do that
| is embedded mathjax (maybe you could pre-build mathml and
| ship that, but I 'm not sure that covers all cases). Graphs
| or charts? There are nice js libraries for that, while doing
| it manually means exporting images or svgs. Even static svgs
| are annoying and brittle to font-size changes without
| javascript to adjust the svg size appropriately.
|
| Don't confuse what's necessary for standard fiction books
| with what the format should support.
|
| JS and interactivity are fine, in technical books, reports,
| or niche fiction.
|
| What I absolutely agree on is that what epubs don't need is
| networking. Resources on the internet get stale after years
| or decades anyway, so inclusion of any network assets into an
| epub guarantees that the work will degrade over the years.
| References can be web links, but nothing from the internet
| should be embedded.
| criddell wrote:
| EPUB 3 includes MathML.
|
| https://www.w3.org/TR/epub-33/
| bmacho wrote:
| > "EPUB originally didn't support remote resources and people
| put a lot of work into changing. Loading stuff over the network
| is HTML's killer feature."
|
| And it is a feature of books that they stay the same.
|
| Both can be a feature, the ability to change (e.g. they fix
| something in the cloud), and the inability to change (e.g. you
| can have it as you bought it).
| criddell wrote:
| I don't get the impulse for homogenization everywhere. PDFs,
| EPUBs, Word documents, HTML documents, etc... all have
| different strengths and weaknesses and I think that's a good
| thing. Never needing an internet connection is a strength of
| EPUB IMHO.
| reacharavindh wrote:
| The thing I wish the most with epub or technically the epub
| readers is the ability to scribble and hand write notes in them
| using a stylus and for them to keep them while reading again. I
| do that with PDFs on my iPad, but I have a lot of tech books
| whose manual notes are nowhere to be found - and even if I found
| them, they are not inline with what I was reading and thinking.
| eviks wrote:
| In general, the modern docs should be easily editable, not just
| allow annotations, since it's easy to preserve the original
| content/layout
| beckerdo wrote:
| I agree, I would like an ePub to have a robust note taking and
| exporting ability.
|
| For instance, if I highlight in Chapter 8 "In 539" [next
| paragraph] "Belisarius" [next paragraph] "marched on Ravenna"
| [10 paragraphs later] "In 540 Belisarius entered Ravenna".
|
| I would like to export this with the Chapter header and
| detailed highlight locations OR just as one sentence with
| subtle links to the locations.
| AlanYx wrote:
| I'm on the same page. I convert all my ePubs to PDF because I
| want to keep my handwritten annotations in-place alongside the
| text I'm annotating, including things like circled words.
| Recent Kobos (Elipsa and Sage) take a decent stab at solving
| this problem while retaining the ePub format and
| reformattability, but it breaks too easily.
| eviks wrote:
| Commendable effort of trying to get rid of the ancient paper-
| based legacy in the digital world that is PDF
|
| Though I'm curious whether the clunky old-but-still-living HTML
| (especially in its ugly XML variety) + CSS are the right
| foundations for the portable format of the future? Since the
| author has also developed the whole new document language would
| be nice to read a more in-depth overview on that subject. Or why
| limit to the ugly duckling of JS in the future when WASM exists?
|
| > content by pages. Instead, we should mandate a consistent
| numbering scheme for block elements within a document, and have
| people cite using that scheme.
|
| that's indeed the proper and more precise approach, though we
| could still have those "fixed layout epub" pages as a backup
| coordinate system
| thayne wrote:
| > A PDF is a single file that contains all the images, fonts, and
| other data needed to render it.
|
| A PDF _can_ include the fonts. But it often doesn't, and relies
| on system fonts. One reason for that is because including fonts
| in the PDF can dramatically increase the size of the file. In
| some cases a single font could be larger than the entire rest of
| the file. I've also worked on implementing embedding fonts in
| some software that generated PDFs. It was surprisingly difficult
| to figure out how to get it to work reliably.
|
| > PDFs are rendered consistently.
|
| Not as much as you would think. There are several cases where the
| same PDF will render differently depending on which PDF viewer
| you use. Usually the differences are pretty subtle, but
| occasionally there are edge cases that result in pretty
| significant differences. I've even run into a case where the same
| version of Acrobat reader will render a PDF differently depending
| on what OS you are using.
| EE84M3i wrote:
| Is there software that minimizes the fonts by removing code
| points that aren't used in the document?
| adrian_b wrote:
| This is a standard feature of the PDF format.
|
| Normally all PDF documents include only the glyphs
| corresponding to the code points actually used in the text
| rendered with that font.
|
| That is why you can go for instance to any site of a vendor
| of fonts and you can download freely a PDF sample text of an
| expensive font. You can easily extract the font from the
| sample PDF, but it will be useless, as it will contain only
| the few letters that had been used in the sample text.
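The bookkeeping side of subsetting can be sketched as follows: collect the code points each font actually draws, then derive a six-letter tag for the subset. Cutting the glyph outlines themselves is the job of a font library (fontTools' `subset` module is one widely used option). The `runs` input and the hash-based tag scheme are hypothetical, and mapping code points to glyphs is simplified here: real text shaping can produce ligatures and alternate glyphs with no one-to-one code point.

```python
import hashlib
import string


def plan_subsets(runs):
    """Map each font name to the sorted code points drawn with it.

    `runs` is a list of (font_name, text) pairs: a hypothetical stand-in for the
    laid-out text runs a PDF writer would have.
    """
    used = {}
    for font, text in runs:
        used.setdefault(font, set()).update(ord(ch) for ch in text)
    return {font: sorted(points) for font, points in used.items()}


def subset_tag(font, code_points):
    """Prefix `font` with a six-uppercase-letter tag derived from its subset."""
    digest = hashlib.sha256(f"{font}:{code_points}".encode("utf-8")).digest()
    letters = "".join(string.ascii_uppercase[b % 26] for b in digest[:6])
    return f"{letters}+{font}"
```

Deriving the tag from the subset contents means two different subsets of the same font get different names, so one document can safely embed both.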
| geraldhh wrote:
| > A PDF can include the fonts. But it often doesn't, and relies
| on system fonts.
|
| found this out, after 20-something years of consistent pdf
| renderings, in a job interview because my docs allegedly looked
| odd :/
|
| the daily wtf ...
| jxdxbx wrote:
| Yeah, MS Office PDF generation (at least some time ago) did
| not generate PDFs with embedded fonts, and I'd often come
| across weird-looking documents where the system is using a
| font with the wrong characteristics. Print-to-PDF usually
| avoids this.
| pseingatl wrote:
| The Arabic glossary of legal terms distributed by the State
| of California is unreadable unless you open the file in Adobe
| Reader, search for the name of the font used, download and
| install the font on your system, close the file and reopen
| it. I suppose there are many instances of this happening.
| adrian_b wrote:
| There are only a few standard system fonts that can be omitted
| from a PDF file and the document assumes that whatever fonts
| will be used for rendering match in metrics the traditional
| Times, Helvetica, Courier, etc., typefaces. Therefore with
| compatible system fonts there should be no changes in the
| layout of the rendered document. There are of course examples
| of system fonts which are advertised as compatible in metrics
| with the ancient Adobe PostScript fonts, but which nonetheless
| have subtle differences.
|
| Except for the small number of standard system fonts, for the
| other fonts the PDF document normally includes only a small
| subset of their glyphs, corresponding to the characters that
| are actually used in the text that is to be rendered with that
| font.
| teekert wrote:
| As someone who occasionally reads scientific literature on an
| e-reader (which is nice: I can just mail a paper to my PocketBook
| account and it shows up), I have a deep hatred of PDF. Please let
| this become a popular thing.
| mrich wrote:
| Ironically this did not render in Firefox on Android (the spinner
| just kept spinning). It worked in Chrome.
|
| That said, EPUBs are great for reading books on mobile. The
| advantage of PDFs is that they can contain highlights/notes, so you
| can directly import them into Zotero and all your annotations are
| there. For epub, you have to hope there is a way to export the
| annotations that are stored by the reader app, and then you have
| to process them further. Readera is a great reader for mobile
| that makes this possible. I'm currently working on a script that
| will convert an epub to pdf, extract the annotations from
| Readera, and mark them in the pdf. Then I can import the pdf into
| Zotero, while still retaining the great reading experience of
| epubs.
| Symbiote wrote:
| Works fine in Firefox for Android 122.0 for me.
| mrich wrote:
| Also loads instantly for me now, didn't make any changes.
| zozbot234 wrote:
| There is a Web Annotation standard that could be used as an export
| format for the notes.
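| A minimal sketch of what such an export could look like, assuming a
| reader app exposes the highlighted text and an optional note. The
| dictionary follows the W3C Web Annotation Data Model (a
| TextQuoteSelector targeting the quoted text); the function name and
| the `source` IRI are hypothetical.

```python
import uuid


def to_web_annotation(source, exact, note=None, prefix="", suffix="", anno_id=None):
    """Build a W3C Web Annotation (JSON-LD dict) for one highlight.

    `source` identifies the book or chapter; how a reader app would name
    it is an assumption. A highlight without a note gets no body, which
    the data model allows.
    """
    selector = {"type": "TextQuoteSelector", "exact": exact}
    if prefix:
        selector["prefix"] = prefix  # text just before the highlight
    if suffix:
        selector["suffix"] = suffix  # text just after the highlight
    annotation = {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": anno_id or f"urn:uuid:{uuid.uuid4()}",
        "type": "Annotation",
        "motivation": "commenting" if note else "highlighting",
        "target": {"source": source, "selector": selector},
    }
    if note:
        annotation["body"] = {
            "type": "TextualBody",
            "value": note,
            "format": "text/plain",
        }
    return annotation
```

| The result can be serialized with `json.dumps`; the TextQuoteSelector
| is used here because it does not depend on page numbers or on how a
| particular reader paginated the book.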
| staz wrote:
| It is working for me on my Firefox on Android.
|
| One of the nice benefits I can already see in his document is
| the working TOC sidebar, which allows navigation within the
| document (compared to classic HTML, not PDF).
| mwilliamson wrote:
| I had a similar problem loading the page on Firefox for desktop
| with private browsing. It turns out service workers don't work
| in private browsing, which it seems Bene (the software
| rendering the page) requires. Switching to a normal Firefox
| window solved the problem.
| DeathArrow wrote:
| I didn't know that EPUB is based on HTML. I always had the
| impression that it has its own binary format.
|
| Using HTML as a base makes a lot of sense.
| simongray wrote:
| W3C standards basically always build on top of other existing
| W3C standards.
| anthk wrote:
| It's just a zip file. Under Linux/Mac/BSD you can trivially write
| a script which unzips the ebook and outputs its HTML files as one
| large text stream; that output can be used as the input of a
| text-mode web browser, allowing you to read ebooks everywhere with
| just two lines of code.
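| A slightly more careful version of that script, sketched in Python:
| file order inside the zip is not necessarily reading order, so this
| reads META-INF/container.xml to find the OPF package file and then
| follows its spine. It assumes UTF-8 chapters and plain relative hrefs
| (no `../` segments or percent-encoding).

```python
import posixpath
import xml.etree.ElementTree as ET
import zipfile

NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
}


def chapters_in_reading_order(epub_path):
    """Return [(path_in_zip, xhtml_text), ...] following the OPF spine."""
    with zipfile.ZipFile(epub_path) as zf:
        # container.xml points at the package (.opf) file.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = container.find("c:rootfiles/c:rootfile", NS).get("full-path")
        opf = ET.fromstring(zf.read(opf_path))
        base = posixpath.dirname(opf_path)
        # Manifest hrefs are relative to the directory of the .opf file.
        manifest = {
            item.get("id"): posixpath.join(base, item.get("href"))
            for item in opf.findall("opf:manifest/opf:item", NS)
        }
        # The spine lists manifest ids in reading order.
        chapters = []
        for ref in opf.findall("opf:spine/opf:itemref", NS):
            path = manifest[ref.get("idref")]
            chapters.append((path, zf.read(path).decode("utf-8")))
        return chapters
```

| Joining the returned chapters and piping them into a text-mode
| browser such as `w3m -T text/html` gives the reading setup described
| above.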
| arp242 wrote:
| It's just a zip file with HTML documents and some (ePub-
| specific) XML files to define metadata, chapters, and a few
| things like that. I use this "epub-edit" script to edit them:
|   #!/bin/zsh
|   #
|   # Extract epub file to a temp directory, launch shell to edit it, and re-zip
|   # it.
|   #
|   # Nothing about this is really epub-specific as such.
|   echo " $@" | grep -q -- ' -h' && { sed '1,2d; /^[^#]/q; s/^# \?//;' "$0" | sed '$d'; exit 0; } # Show docs
|   [ "${ZSH_VERSION:-}" = "" ] && echo >&2 "Only works with zsh" && exit 1
|   setopt err_exit no_unset no_clobber pipefail
|
|   full=$1:a
|   tmp=$(mktemp -d)
|   bsdtar xf $1 -C $tmp
|   cd $tmp
|   print "Editing $1; press ^D to exit"
|   zsh ||:
|   mv -f $full $full.orig
|   zip -r $full *
|   cd -
|   rm -r $tmp
|
| And then I use vim to edit the HTML files and such.
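| One caveat when re-zipping by hand: the EPUB container (OCF) spec
| requires the `mimetype` file to be the first entry in the archive
| and stored uncompressed, which a plain recursive zip does not
| guarantee. A minimal Python sketch of a compliant re-zip, assuming
| the unpacked book sits in `src_dir` and `out_path` is outside it:

```python
import os
import zipfile


def zip_epub(src_dir, out_path):
    """Re-zip an unpacked EPUB directory into a valid container."""
    with zipfile.ZipFile(out_path, "w") as zf:
        # mimetype must come first and must not be compressed.
        zf.write(os.path.join(src_dir, "mimetype"), "mimetype",
                 compress_type=zipfile.ZIP_STORED)
        for root, dirs, files in os.walk(src_dir):
            dirs.sort()  # walk subdirectories in a deterministic order
            for name in sorted(files):
                full = os.path.join(root, name)
                arcname = os.path.relpath(full, src_dir).replace(os.sep, "/")
                if arcname != "mimetype":
                    zf.write(full, arcname, compress_type=zipfile.ZIP_DEFLATED)
```

| The usual shell equivalent is `zip -X0 book.epub mimetype` followed
| by `zip -Xr9D book.epub * -x mimetype`.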
| emayljames wrote:
| The downloadable EPUB of the page displays outside the viewport in
| the Google Books app.
|
| Bene seems to be at an alpha stage.
| diebeforei485 wrote:
| I personally think PDFs are a terrible legacy format with
| unnecessary complexities[1], and most uses of PDFs do not involve
| printing, so the typesetting arguments don't make sense to me. For
| the vast majority of use cases it's far more important to be
| readable on phone, tablet, and computer.
|
| I was surprised when the author mentioned iBooks doesn't support
| scrolling view, so I tried it myself. Turns out iBooks on macOS
| does not support scrolling for ePub files, but it does on iOS and
| iPadOS. Very strange decision by Apple.
|
| 1. https://googleprojectzero.blogspot.com/2021/12/a-deep-dive-i...
| baq wrote:
| but but but... if I really need to print something, a PDF is
| the most reliable and portable route. I guess a multipage SVG
| would work too, maybe, if first exported to a PDF so that multiple
| pages print properly. (Looking at you, Inkscape...)
| adrian_b wrote:
| PDF is an annoying specification, but there exists absolutely
| no replacement for it.
|
| I have never seen any kind of technical documentation published
| in any other format than PDF that is comfortable for reading
| and searching, even when that is done on a mobile phone.
|
| I do not want a document that changes appearance depending on
| the device used for reading or depending on its temporary
| state, like window size. I want a document whose layout has
| been well conceived by its author and which is fixed,
| regardless of what I happen to use for reading it.
|
| When I happen to read it on a smaller screen or window, I do not
| want changes in layout (except for trivial text-only documents); I
| only want a smart reader with comfortable means for fast zoom and
| pan, and without stupid behaviors (as in some Android readers),
| for instance where scrolling vertically (including Page Up/Page
| Down) also moves the document horizontally, which prevents easy
| reading of a column of text.
|
| The traditional recommendations for the maximum width of a text
| column (roughly 45 to 75 characters per line) are good enough, if
| observed, to ensure comfortable reading even on a mobile phone.
| Only when the author breaks the traditional typographic rules by
| making extra-wide columns does reading on a mobile phone become
| inconvenient.
| broscillator wrote:
| I find reading PDFs on my phone and even on my kindle really
| uncomfortable.
|
| On my phone I have to either zoom in or turn on landscape
| mode (which usually means turn it on globally, I can't do it
| _just_ for the reader app).
|
| On the Kindle, a full page has too small a font because of all the
| margins, and fitting the width shows me 80% of the page; then I
| have to scroll down for the last 20%, and my eyes have to find
| where exactly I was reading.
| baq wrote:
| I'm keeping my not-sure-how-old iPad 5 around specifically
| because it's _the_ device form factor to read pdfs.
| broscillator wrote:
| That kind of highlights how non-versatile PDF is despite
| some comments.
|
| However it does sound handy, I kinda want a dedicated
| tablet for sheet music.
| baq wrote:
| You're absolutely right PDFs are super rigid, but that's
| kinda their point - so with the proper device, like a
| sheet of paper or a 10+ inch tablet screen it makes
| sense.
|
| Would I prefer more content to be reflowable etc.? Yes -
| but with a tablet it isn't strictly necessary, just nice
| to have.
| rchaud wrote:
| It's plenty versatile. Not everything needs to be phone-
| friendly. Phone screens weren't designed for reading PDF-
| size documents. Even so, options exist to reflow the
| text, view in landscape or pan and zoom.
| broscillator wrote:
| There is one device which fits PDFs well: an iPad. It can be
| fairly awkward on laptops and desktops as well.
|
| > view in landscape or pan and zoom.
|
| This is awkward; that's the issue I mentioned above: how
| annoying it is to have to do that if you're reading for a
| 30-60 minute session.
| pseingatl wrote:
| Or the Kindle DX, RIP.
| crabmusket wrote:
| > I have never seen any kind of technical documentation
| published in any other format than PDF that is comfortable
| for reading and searching, even when that is done on a mobile
| phone.
|
| Can you provide an example of what you mean? My experience is
| completely the polar opposite.
| adrian_b wrote:
| I refer to something like a 3000-page manual of some
| microcontroller, or the datasheets of some integrated
| circuits or the specifications of some Arm architecture
| variant, or the standards for some programming language,
| e.g. C++ or System Verilog.
|
| These are concrete examples of documents that I might have
| read during some flights or when waiting for some flight,
| on a smartphone.
|
| When reading something like a fiction novel, reflowing the
| text based on the window width may be acceptable.
|
| On the other hand, navigating a huge document, half of which
| consists of tables, figures, diagrams, schematics and graphics, is
| extremely painful in HTML format, where the layout changes based
| on the device and window used and there are no means to jump
| quickly e.g. to page 1436, then to page 2117. When zoom, pan and
| scroll are implemented correctly, which unfortunately seldom
| happens, they are much less distracting than the random changes in
| page layout caused by browser rendering.
|
| I strongly dislike it when a company provides only web
| documentation that is hard to navigate, instead of also
| providing a PDF manual.
|
| Web documentation may be acceptable for very small
| documents, but not for most of the current technical
| documentation, where many thousands of pages for a manual
| are common.
|
| Perhaps an EPUB format extended with everything necessary
| to completely describe a fixed page layout might become
| competitive with PDF, but I will have to see an example to
| believe it.
|
| For now, whenever I see a book or any other document both
| in PDF and in EPUB formats, I always choose the PDF
| variant, because without exception it provides a better
| quality of the rendered pages.
| crabmusket wrote:
| I accept your points and agree that the kind of
| documentation you're thinking about sounds like a poor
| use case for HTML/EPUB. I do not regularly encounter this
| sort of documentation.
|
| I've been boosting the idea in the OP, but more for
| things like "your local council's meeting minutes" or
| "your English class assignment" or "a research paper".
|
| Though I do want to point out that even moderately
| complex specs, when designed for the web, can work well.
| For example, the HTML spec doesn't reference page
| numbers, but has extensive internal hyperlinking:
| https://html.spec.whatwg.org/
|
| > Perhaps an EPUB format extended with everything
| necessary to completely describe a fixed page layout
| might become competitive with PDF
|
| I highly doubt this will ever happen, for use cases which
| require fixed layout. But there are plenty of use cases
| where fixed layout is unnecessary and inferior.
| lxgr wrote:
| I work with the same type of documents regularly, and I'd
| give up both exact referencing and stable rendering in a
| heartbeat in exchange for something reflowable that I can
| reliably search in and copy paste from.
| adrian_b wrote:
| The PDF documents allow reliable search and copy/paste,
| but unfortunately only when the author of the document
| has taken care to ensure this. Nevertheless, this usually
| happens automatically when the PDF has been created by
| exporting a document created with some Office suite,
| unless the author has changed the default options to
| forbid these features.
|
| Even many of the PDFs created by scanning printed
| documents allow reasonably reliable search/copy/paste, if
| they had been processed by an OCR.
| lxgr wrote:
| > The PDF documents allow reliable search and copy/paste
|
| Are you sure about that? As far as I understand,
| extracting text from an ultimately vector-graphics-like
| PDF heavily depends on OCR-like heuristics on the PDF
| consumer's side.
|
| The ToUnicode mapping table can help with the glyph-to-
| codepoint mapping aspect of this, but telling the gap
| between two letters apart from the gap between two words
| seems hard.
|
| I've seen bothtypesofissues mentioned in the following
| article i n t h e p a s t, including in a specification
| document I use multiple times per day for my job:
|
| https://web.archive.org/web/20220328102205/https://filingdb....
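| A toy version of the word-gap heuristic, to show why it is fragile.
| It assumes glyphs on a single left-to-right line that have already
| been mapped to Unicode (e.g. via ToUnicode), and inserts a space
| whenever the gap to the next glyph exceeds a fixed fraction of the
| font size. The 0.25 em default is an arbitrary assumption, not any
| particular extractor's value.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    char: str     # Unicode text for the glyph (e.g. from a ToUnicode CMap)
    x: float      # left edge, in text-space units
    width: float  # advance width, same units
    size: float   # font size, same units


def join_glyphs(glyphs, space_ratio=0.25):
    """Rebuild one line of text from positioned glyphs.

    A space is inserted when the distance from one glyph's right edge to
    the next glyph's left edge exceeds space_ratio * font size.
    """
    out = []
    prev = None
    for g in sorted(glyphs, key=lambda g: g.x):
        if prev is not None:
            gap = g.x - (prev.x + prev.width)
            if gap > space_ratio * g.size:
                out.append(" ")
        out.append(g.char)
        prev = g
    return "".join(out)
```

| With the default threshold, letter-spaced text comes out as
| `t h e`; raising the threshold fixes that line but would merge
| genuinely tight word gaps elsewhere, which is the trade-off the
| linked article describes.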
| adrian_b wrote:
| I did not look at the details of the PDF specification,
| but I have heard that there are indeed many cases that
| can confuse a PDF reader which wants to find or copy a
| text string.
|
| Nevertheless, for many years I have used search and copy+paste
| in PDF documents very frequently, every day, without any problem.
| without any problem. I usually prefer to use mupdf as the
| PDF reader, because it is very fast (it also works better
| as an EPUB reader than the other EPUB readers that I have
| tried), but there are some seldom-encountered PDF files
| that mupdf cannot parse, in which case I fall back to
| other PDF readers, e.g. okular.
|
| The only case I encounter where search/copy/paste does not
| work is scanned books that have not been OCR'ed, so they
| contain only bitmap images of the pages, without text.
|
| The problems mentioned at your link are caused mostly by
| the PDF specification being too permissive, which allows
| abuses like using a non-standard character encoding
| coupled with the use of a non-standard font. However,
| this specific type of abuse could not be prevented by any
| specification without using some sort of AI to decide
| whether the glyph used for a character encoded as Unicode
| "A" is really a kind of "A".
|
| Among the problems enumerated at your link, I have
| encountered a few times the case when there are thin
| spaces inserted between each letter of a string. In such
| a case it is annoying to remove those spaces after
| pasting the text in another document, but this is
| something that I have seen only very rarely.
| Shorel wrote:
| And between PDF and EPUB, I always choose the EPUB
| variant, because on my laptop it definitely looks better,
| with the text the right size and sane pagination.
|
| I don't jump to page 2112, I use the table of contents to
| jump to section 3.1.2, which is as fast if not faster.
| jxdxbx wrote:
| people don't want fixed layout documents only for printing.
| they want them because they want to fix the layout of their
| documents more than they care about small screens.
| crabmusket wrote:
| Those people are welcome to continue using PDFs, and I really
| hope that in some utopian future that they will receive a lot
| of requests from their readers along the lines of 'can I
| please have a portable epub version too?'
| BlueTemplar wrote:
| Not sure about this specific case, but I suspect at least some
| of these readers might do it for consistency with e-paper
| devices, where the lack of scrolling is a hardware limitation
| (very low refresh rates, weak processors, battery savings).
|
| So it seems a bad idea to try to have a one-size-fits-all
| standard; we're much better off with two digital document
| standards: one with full multimedia and interactive
| capabilities (short of networking), and another, a subset of
| the first, with limitations like monochrome, no multimedia, and
| interactivity mostly limited to (still in-document)
| hyperlinks...
|
| And guess what, we already have two formats that are _almost_
| there! HTML (see also MHTML=EML) and EPUB.
|
| (And of course a 3rd one for physical archival and the rare
| digital fixed layout documents, for which PDF/A already seems
| to be decent enough.)
| czierleyn wrote:
| When the IDPF merged with the W3C a couple of years back they
| tried to develop a new standard called PWP, Portable Web
| Publications, which was supposed to be a new 4.0 version of EPUB,
| as far as I know. But there was much resistance from the
| publishing community and the project was shelved a couple of
| years ago.
|
| See: https://w3c.github.io/dpub-pwp/publishing-snapshots/FPWD/Ove...
| watwut wrote:
| > PDFs are rendered consistently. A PDF specifies precisely how
| it should be rendered, so a PDF author can be confident that a
| reader will see the same document under any conditions.
|
| And that is why PDF sux for reading on the phone. And why epub is
| massively better if you want to read articles and books.
| geokon wrote:
| It feels like a bit of doomed a project simply bc browsers don't
| open EPubs. You can link a PDF and, while it's a bit of a context
| switch, the browser will open and display it.
|
| Since, as described, EPUBs are basically HTML, it's kinda dumb
| that browsers don't open them, but good luck convincing the
| Chrome/Mozilla bureaucrats.
|
| I think another discouraging aspect is HTML CSS are so huge and
| bloated at this point that few people can implement a "reader"
| for EPUB/HTML. It's basically "go implement a new browser". It
| makes one think a easy-to-parse markdown (like Djot) with some
| extra rendering bells and whistles would be a more likely long
| term solution
|
| My personal interim compromise solution is to embed everything
| (CSS, SVGs, scripts and base64 images) into a single HTML file.
| It's similar to an EPUB. It's a bit bloated and ugly, but with a
| bit of care it works, and naturally browsers (and by extension
| basically every user) can open it.
|
| Unfortunately a user has no way to really know "oh, I can download
| and store this web page offline". It'd be nice to have something
| like a .htmls extension that indicates it's an HTML file but
| doesn't have any external resources.
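| The image part of that workflow can be sketched in a few lines of
| Python. `read_bytes` is a hypothetical caller-supplied loader, only
| double-quoted `src` attributes on `<img>` tags are rewritten, and
| remote URLs are deliberately left alone; a real tool would use an
| HTML parser and also inline CSS, fonts and scripts.

```python
import base64
import mimetypes
import re

# Matches <img ... src="..."> and captures the text around the URL.
IMG_SRC = re.compile(r'(<img\b[^>]*?\ssrc=")([^"]+)(")', re.IGNORECASE)
REMOTE = ("data:", "http:", "https:", "//")


def inline_images(html, read_bytes):
    """Replace local <img src="..."> references with base64 data: URLs."""

    def repl(match):
        src = match.group(2)
        if src.startswith(REMOTE):
            return match.group(0)  # already inline, or not ours to fetch
        mime = mimetypes.guess_type(src)[0] or "application/octet-stream"
        payload = base64.b64encode(read_bytes(src)).decode("ascii")
        return f"{match.group(1)}data:{mime};base64,{payload}{match.group(3)}"

    return IMG_SRC.sub(repl, html)
```

| Base64 inflates each image by about a third, which is part of the
| bloat mentioned above.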
| zerof1l wrote:
| > It feels like a bit of doomed a project simply bc browsers
| don't open EPubs.
|
| Not that long ago, browsers could not open PDFs either. Now all
| major browsers come with a built-in PDF reader (Firefox's, PDF.js,
| is written in JavaScript). I see nothing that prevents browsers
| from doing the same for EPUBs. There are already browser
| extensions that do exactly this. It's a matter of the EPUB format
| gaining popularity.
| geokon wrote:
| I think Google is actively against offline data. It's not
| aligned with their business interests.
|
| My mental analogy is that you can also have offline apps on
| Android. You can specify this in the app manifest, but internet
| access isn't exposed to the user as a permission.
|
| Like the author says, Google already injects online fonts
| into the EPUBs they generate. Meanwhile, PDF is a battle
| they've already lost.
| BlueTemplar wrote:
| Indeed, and I would add that there's no reason for browsers
| to be able to open PDFs: this looks like yet another
| attempt by Google to wrestle with Microsoft for
| control of the OS (by having everything happen in the
| browser instead).
|
| And that's also probably why we _still_ have to rely on 3rd-party
| hacky browser extensions to be able to save web pages as a
| single file.
| Shorel wrote:
| > Like the author says, Google already injects online fonts
| into the EPubs they generate.
|
| I didn't notice that before, but now I will actively avoid
| Google generated EPUB files.
| rchaud wrote:
| Epubs usually are not accessed/downloaded in a browser. PDFs
| definitely are, as they are freely shared online, whereas
| epubs are usually DRM'ed and not freely shared.
| leoedin wrote:
| > I think another discouraging aspect is HTML CSS are so huge
| and bloated at this point that few people can implement a
| "reader" for EPUB/HTML. It's basically "go implement a new
| browser". It makes one think a easy-to-parse markdown (like
| Djot) with some extra rendering bells and whistles would be a
| more likely long term solution
|
| This feels like the biggest hurdle to me. The author says
| "Portable HTML generation principle: when possible, systems
| that generate portable EPUBs should output portable HTML.". I
| don't think this is going far enough. If the goal is for this
| format to be everywhere and repeatable, then it needs to be
| standardised, and implementing a new rendering engine for it
| needs to be easy.
| Relying on webviews doesn't feel like the way forward. The
| beauty of PDF is that it is incredibly reliable - a PDF from a
| decade ago still renders the same today as it used to.
|
| I suspect if an effort like this is to get off the ground, the
| scope of the document needs to be scaled right back. The subset
| of XHTML allowed should be very limited. The ability to render
| a document that looks the same everywhere should be prioritised
| - fixed layout at a fixed page size first, reflowable second.
| It needs a standard with a comprehensive test suite of
| documents + render outputs.
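| A rough sketch of the kind of check such a standard could ship with,
| using Python's `html.parser`. The allowlist is purely illustrative
| (nobody in the thread proposed specific elements). The checker flags
| elements outside it and any `src` (or `<link href>`) pointing at a
| remote server, which is the self-containment problem from the
| article; ordinary outbound `<a href>` links are allowed, and CSS
| `@import` rules inside stylesheets are not inspected.

```python
from html.parser import HTMLParser

# Illustrative allowlist only; the thread does not define one.
ALLOWED = {
    "html", "head", "title", "meta", "link", "body", "section", "nav",
    "h1", "h2", "h3", "p", "em", "strong", "a", "ul", "ol", "li",
    "img", "figure", "figcaption", "table", "tr", "th", "td", "pre", "code",
}
REMOTE = ("http:", "https:", "//")


class SubsetChecker(HTMLParser):
    """Record problems found while parsing a document."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags via the default handle_startendtag.
        if tag not in ALLOWED:
            self.problems.append(f"disallowed element <{tag}>")
        for name, value in attrs:
            embeds = name == "src" or (tag == "link" and name == "href")
            if embeds and value and value.startswith(REMOTE):
                self.problems.append(f"external resource {value} in <{tag}>")


def check_document(markup):
    """Return a list of problems; an empty list means the document passes."""
    checker = SubsetChecker()
    checker.feed(markup)
    checker.close()
    return checker.problems
```

| A conformance suite would pair documents like these with expected
| render outputs; this only covers the "is it in the subset" half.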
| crabmusket wrote:
| > The ability to render a document that looks the same
| everywhere should be prioritised
|
| IMO actually this is the question the whole effort hinges on.
|
| If the goal is to replace PDF for the uses that require
| pixel-perfect rendering on every client just as the designer
| intended, then this approach is dead-on-arrival.
|
| But if that's not the goal, then that has to be extremely
| well-communicated by the project, so that people who need
| that know they need to stick with PDF. Indeed, the project
| needs to explicitly say that it's _not_ a goal, and that
| clients _should_ be free to make reasonable rendering
| decisions within certain specified bounds.
| jbverschoor wrote:
| You know... HTML used to be hyper _text_. Some links. Add some
| figures/images, tables.
|
| But then we 'needed' magazine-like design/layout. Still, it was
| document based, so actually pretty good.
|
| After that, we tried to shoehorn HTML to an application
| distribution platform. Current css layouts are (finally) more
| like traditional layout engines for applications.
|
| The last 20 years i.m.o. was pretty much a waste of effort
| because there was no proper way to distribute (cross-platform)
| applications, well besides java...
| foofie wrote:
| > You know.. HTML used to be hyper text. Some links. Add some
| figures/images, tables.
|
| It still is.
|
| > But then we 'needed' magazine-like design/layout. Still, it
| was document based, so actually pretty good.
|
| Styling is not handled by HTML. It's a separate concern
| assigned to CSS. For convenience HTML offered default
| styling.
|
| > After that, we tried to shoehorn HTML to an application
| distribution platform.
|
| It's not shoehorned. It's the use case: render documents. A
| document is a tree of ui elements. It's the same with GUI
| frameworks like Qt or WPF.
| mapreduce wrote:
| > "It still is."
|
| It still is but you are missing the point of the thread.
| HTML still is hypertext, some links, some images, some
| tables. No doubt about that. But HTML is also so much more
| than that. The spec is a beast. Anyone who wants to
| implement an HTML based reader has a mammoth task in front
| of them. It's like "go implement a new browser" like
| someone said in this thread above.
|
| > "Styling is not handled by HTML. It's a separate concern
| assigned to CSS."
|
| Missing the point again. We know styling is not handled by
| HTML. The point of the thread was to tell how big of a task
| it is to create your own HTML based reader. If you want to
| create your reader, like it or not, you have to implement
| support for CSS too, and that too is a mammoth spec.
|
| So our only options are: A. Go implement a new browser. B.
| Use something like Webkit. C. Implement a small subset of
| the HTML and CSS specs.
| foofie wrote:
| > The spec is a beast. Anyone who wants to implement an
| HTML based reader has a mammoth task in front of them.
|
| That's true for basically any non-trivial document
| rendering format. For example, take a look at the PDF
| spec. Even basic things like parsing the document format
| is a formidable task. HTML in comparison is a trivial
| format. The same goes for technologies like TeX or even
| Microsoft's own Word format, which Microsoft famously had
| lots of problems supporting. It is a hard problem for all
| formats, not just HTML.
|
| > The point of the thread was to tell how big of a task
| it is to create your own HTML based reader.
|
| You're confusing some things. A document format is one
| thing, but a renderer with specific capabilities is an
| entirely different thing. You're commenting on the
| document format and the styling and layout system, and
| now you're shifting the conversation to what it takes to
| implement a renderer.
|
| Debates on document formats are entirely separate and
| orthogonal to debates on how to implement renderers.
| Renderers for the most trivial things are tremendously
| complex. There are a myriad of good reasons why we're
| seeing GUI frameworks built on top of webviews in spite
| of all the complaints about the formats that webviews
| support, and in spite of the myriad low-level rendering
| frameworks already available.
|
| To understand the point, try to think through the
| requirements list to implement a renderer for Markdown.
| It's a document format with half a dozen features.
| Would you call it trivial?
| mapreduce wrote:
| > You're confusing some things. A document format is one
| thing, but a renderer with specific capabilities is an
| entirely different thing. You're commenting on the
| document format and the styling and layout system, and
| now you're shifting the conversation to what it takes to
| implement a renderer.
|
| If you follow the comment you replied to the discussion
| was about implementing a renderer. So no, I am not
| shifting the conversation to implement a renderer. The
| conversation _is_ about implementing a renderer. That it
| is incredibly difficult to do today with the modern specs
| is the point.
| math_dandy wrote:
| What about option D: use a WebView? This is exactly what
| the author did. The point of the proposal is to identify
| which features of a WebView can be used (and which must
| not) if the goal is to produce nice text layouts in
| multiple form factors. But rendering HTML and CSS and
| executing JavaScript are solved problems.
| wharvle wrote:
| > Styling is not handled by HTML. It's a separate concern
| assigned to CSS. For convenience HTML offered default
| styling.
|
| It in-fact was, and to some degree still is. I assure you
| we achieved a hell of a lot of styling before css existed,
| and for some time after it did but before most of us were
| using it (much), using features of HTML, some of which were
| _explicitly_ there to support styling.
| eviks wrote:
| There is a SingleFile extension that can save a page as a
| single self-extracting zipped HTML file, so you don't need to
| waste space base64-encoding anything, and you can unzip it to a
| folder and view the images as-is without the page.
| foofie wrote:
| > I think another discouraging aspect is HTML CSS are so huge
| and bloated at this point that few people can implement a
| "reader" for EPUB/HTML. It's basically "go implement a new
| browser".
|
| I don't think that's true. In the very least, you can use a
| WebView and feed it regular HTML. If the whole industry uses
| webviews for GUIs, it's hardly a stretch to use one to render
| Epub docs.
| baq wrote:
| > It feels like a bit of doomed a project simply bc browsers
| don't open EPubs.
|
| I guess that's why the article is actually an epub opened with
| a WASM epub viewer :)
| jasomill wrote:
| _It feels like a bit of doomed a project simply bc browsers don
| 't open EPubs._
|
| While browsers don't provide a convenient UI for opening EPUBs,
| they should have no problem rendering the chapter HTML files
| contained within.
|
| In the absence of browser support, writing a server-side EPUB-
| to-browsable site proxy that adds chapter navigation controls
| and simple layout options shouldn't be too difficult.
|
| Incorporating the necessary DRM support required to view the
| majority of commercial ebooks through such a proxy would very
| likely be legally problematic, of course.
|
| Come to think of it, any form of publisher-approved DRM EPUB
| browser support sounds like it'd be about half a technical step
| away from DRM support for web pages in general, which is a
| horrifying prospect.
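| A minimal sketch of the chapter-navigation part of such a proxy, as a
| WSGI app. It assumes the chapters have already been extracted from
| the EPUB in spine order as (href, html) pairs; `make_app` is a
| hypothetical name, and images/CSS referenced by the chapters are not
| served or rewritten here. The navigation bar is simply placed before
| and after the chapter markup rather than injected into its `<body>`.

```python
def make_app(chapters):
    """Return a WSGI app serving chapters[N] at /N with prev/next links."""

    def app(environ, start_response):
        segment = environ.get("PATH_INFO", "/").strip("/") or "0"
        if not segment.isdecimal() or int(segment) >= len(chapters):
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"no such chapter"]
        i = int(segment)
        links = []
        if i > 0:
            links.append(f'<a href="/{i - 1}">previous</a>')
        if i + 1 < len(chapters):
            links.append(f'<a href="/{i + 1}">next</a>')
        nav = "<nav>" + " | ".join(links) + "</nav>"
        body = (nav + chapters[i][1] + nav).encode("utf-8")
        start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
        return [body]

    return app
```

| For local testing it can be served with the standard library, e.g.
| `wsgiref.simple_server.make_server("127.0.0.1", 8000, make_app(chapters)).serve_forever()`.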
| Shorel wrote:
| > It feels like a bit of doomed a project simply bc browsers
| don't open EPubs.
|
| If you read the article, then you just did open an EPUB.
| zvmaz wrote:
| The author is a post-doc advised by Shriram Krishnamurthi [1],
| the author of Programming Languages: Application and
| Interpretation (PLAI), and one of the authors of Data-Centric
| Introduction to Computing (DCIC). I am currently reading both
| PLAI and DCIC and I am truly delighted by the minute care the
| authors have put into making the books pedagogical works of art.
| That's true love!
|
| [1] https://willcrichton.net/
| verisimi wrote:
| > the minute care the authors have put into making the books
| pedagogical works of art. That's true love!
|
| Works of art. True love! That's very high praise.
| zvmaz wrote:
| I mean it. Both books are free, and PLAI has an interactive
| tutorial on the language used in the book called SMoL [1]
| done by one of Shriram's students. The tutorial is _not_ a
| passive one by any means; it forces you to think and
| highlights pitfalls students often fall into when reading the
| material.
|
| This whole ethos of caring about readers' learning is in stark
| contrast to books that feel like the authors are showing off how
| smart, clever and profound they are instead of caring about their
| readers' comprehension. I include some very highly praised books
| in that, even ones praised on HN.
|
| [1] https://www.plai.org/#direct-links-to-the-tutor
|
| N.B. The author of this post, portable EPUBs, also works on
| language learning. A whole ethos...
| https://arxiv.org/abs/2401.01257
| sotix wrote:
| And the author has worked on making Rust easier to learn[0]!
|
| [0] https://rust-book.cs.brown.edu/
| morelisp wrote:
| The bar for epubs is so fucking low I have a hard time believing
| this matters at all. Just last week I bought a book set in the
| late Middle Ages which managed to transcribe all "th" as "p".
| Until publishers care about that stuff, none of these high-
| falutin technical discussions change anything.
| offices wrote:
| I don't see how this relates to the link.
| morelisp wrote:
| The file format doesn't matter one bit when the reading and
| authoring tools are shit and the editors can't/don't fix
| anything. And papers will generally have a lot fewer
| resources to deal with this than major book publishers, who
| have been epub-focused for over a decade now and actually
| make money from it.
| harshreality wrote:
| Any decent publishing or html editing tools fully support
| utf-8 by now. It's not the tools.
|
| Publisher and editor laziness may be a reason to be
| cautious about epubs _currently_ for niche or esoteric
| works, but that's not the same thing.
|
| > I bought a book set in the late Middle Ages which managed
| to transcribe all "th" as "p". Until publishers care...
|
| The book market these days makes it challenging to do high-
| quality editing up front for republishing niche books in a
| new format. Publishers try to cut corners, outsourcing epub
| conversions to people who don't care and don't know what
| they're doing, or they OCR it, have an in-house editor (who
| also doesn't have a personal affinity to the subject) give
| it a once-over (maybe), and release it.
| BlueTemplar wrote:
| As an aside: Unicode support was still an issue in TeX
| last I checked, because most of the LaTeX tools don't
| support it (understandably, since they were made before it
| was expected).
|
| Now, there are some attempts to fix this situation by
| Xe(La)TeX and Lua(La)TeX, but since TeX seems to be so
| much tied to PDF these days, it should probably just be
| abandoned by most scientific publishing in favor of the
| likes of GNU TeXmacs (note : it's NOT TeX in GNU Emacs)
| and HTML with MathML.
| sirsinsalot wrote:
| I agree with most of the content of this post. One of the key
| things for me would be a requirement that figures and diagrams
| which can be expressed as SVG should be.
|
| Raster images should be limited to things that need a
| grid-of-pixels representation, like photographs.
| livrem wrote:
| I use the SinglePage add-on for Firefox (think it is available
| for Chrome as well?) to save the current page DOM to HTML as a
| self-contained file (inlined CSS and data:-URLs for all images)
| with no dependencies and all the scripts removed etc. It is not
| perfect, and I do not trust browsers to always remain backwards
| compatible, but I prefer it to saving pages as PDF or as multiple
| files.
|
| Interestingly one of the few pages I ever saw it fail on was this
| article on portable EPUBs. Guess it has too much magic going on
| to make the formatting work. The saved page is perfectly
| readable, but the style is nothing at all like what the original
| page was for some reason.
|
| I like how fbreader on Android just displays all books exactly
| the same, and as configured in the app rather than using any of
| the styling from the EPUB file. I never noticed that it tried to
| apply CSS or run scripts included in files and I hope it never
| tries to do either of those things. Loading external dependencies
| sounds like an even worse idea and I did not think that was even
| allowed.
| actionfromafar wrote:
| SingleFile, right?
|
| Edit:
|
| On that note, what's up with the Firefox Add-Ons?
|
| Currently, they are all set up so that to do something
| interesting, they need _all the permissions_.
|
| This creates a natural market in which bad actors go shopping
| for add-ons they can take over.
|
| Can something be done about this? For instance for this
| "SingleFile" addon. It needs to access the rendered document in
| the DOM to be able to introspect and save it all to a file.
|
| But why does it need access to _everything_? Can't it have
| just these permissions:
|
| - "snapshot DOM once"
|
| - "write to a single file"
| pbronez wrote:
| Agree. There are many extensions that I want to use on
| arbitrary websites, but rarely. This could be handled by the
| browser locking the extension out from everything, until the
| moment I manually invoke it, at which point it's allowed
| access to the page I'm currently looking at.
|
| Now, maybe that's how it already works, but I have no
| confidence in it.
| gildas wrote:
| Author here, I wish SingleFile used fewer permissions,
| but unfortunately it's not possible from a technical point
| of view. Anyway, if you run some code I've written but are
| suspicious, you have to trust me or review the code which is
| open-source.
| darkteflon wrote:
| SingleFile is amazing - one of my most-used extensions on
| both desktop and mobile by far. It's elegant, unobtrusive
| and it just works. Thanks for making!
| gildas wrote:
| Thank you for trusting me and for your kind words ;)
| vertis wrote:
| I use SingleFile ALL the time. Thank you so much for it.
|
| I Perma-web anything that I find interesting, after
| discovering by going back in my notes that half the
| bookmarks I'd added no longer existed 5 years later.
|
| I don't think I could do half as good a job if it wasn't
| for your extension.
|
| I owe you a coffee/beer -- actually I just found your
| donation page, but still a drink IRL if we ever run into
| each other at a conference/etc.
| actionfromafar wrote:
| I have no qualms about you or SingleFile. I used it today,
| it's great!
|
| I just think market pressure created by the permissions
| systems is unfortunate, in aggregate. With x thousands of
| add-ons, bad stuff has happened, and is going to happen.
| Any improvement to the permissions which could mitigate
| that at least somewhat, would be nice.
| livrem wrote:
| SingleFile, yes. Thanks. Did not see it within the edit-
| window.
|
| I agree about permissions. In this case it looks like it
| needs a bit more, since it has some options like enabling
| auto-save after page-load for tabs for instance. Not a
| feature I have used, but I am sure it can be useful for semi-
| manually scraping sites.
| dotancohen wrote:
| Another comment explains why the page is difficult to parse:
| > It's a very well thought through article by the developer of
| Nota, trying to bring EPUB format up to parity with PDF. It's a
| serious start and they've already written a viewer. In fact,
| the article itself is displayed in a browser-based wasm port of
| the viewer (and looks good!).
| JadeNB wrote:
| > I use the SinglePage add-on for Firefox (think it is
| available for Chrome as well?) to save the current page DOM to
| HTML as a self-contained file (inlined CSS and data:-URLs for
| all images) with no dependencies and all the scripts removed
| etc. It is not perfect, and I do not trust browsers to always
| remain backwards compatible, but I prefer it to save pages as
| PDF or as multiple files.
|
| Ha, I think that one of my first HN comments, 10 years or so
| ago, was how I wanted to be able to save HTML web pages as
| HTML, not as PDF. I'm sure I didn't explain (or understand) my
| reasoning well, but it was roundly regarded as a ludicrous
| thing to want to do. I'm glad to hear that I was just a decade
| out of sync.
| gildas wrote:
| Actually, the first release of SingleFile is 13+ years old
| but it was less popular at the time because Chrome (it didn't
| yet support MHTML) had a negligible market share. People
| generally saved their pages in MAFF or MHTML format in
| Firefox or IE. It was when Firefox abandoned XUL extensions
| that SingleFile was able to rise from the ashes, because
| there was once again a real interest in it.
| dsr_ wrote:
| FBReader uses CSS from the document by default; you can turn it
| off in, IIRC, four stages.
|
| KOreader gives you more control but in a less friendly manner:
| in addition to choosing specific CSS from several supplied
| files, you can write your own.
|
| (Also wins for KOreader: excellent OPDS support, and easy self-
| hosted sync server.)
| Shorel wrote:
| Yes, but this article has an EPUB download button.
|
| This feature makes SinglePage unnecessary for any page using
| this system.
| Agraillo wrote:
| Funny, I sometimes make the same misspelling. I suppose this has
| something to do with probabilities inside our brain; the phrase
| "Single page" might be more probable than "Single file" (Hmm, I
| smell some similarities to LLM probabilities).
|
| On a side note, what is interesting with SingleFile is that
| since the file can contain anything including JS, it's possible
| to create a local "executable" (html) that uses the resources
| inside the file and does not use a single external file. I have
| an actual game-like piece that runs locally with a bunch of files
| and comparatively easily transforms into a single self-running
| "program" that even runs on a mobile file manager with WebView
| support.
| znpy wrote:
| ePub aren't that great either. PDF might not be perfect, but it's
| likely the best we have and the best we'll have in a long time.
|
| ePub renders wildly differently on my eReader (Kobo), my Linux
| laptop (various apps), my iPad and my iPhone. And I'm not talking
| about screen sizes, I'm talking about various elements being
| rendered largely incorrectly (and there's a matrix of
| incorrectness across implementations).
|
| PDF documents on the other hand... they render just right,
| everywhere. I have to zoom and scroll, but I will never have to
| ask myself "will I be able to actually read this document?" when
| dealing with PDF.
|
| Oh and by the way... I still print stuff from time to time. Yeah
| way less than it was needed in the past, but it's still a
| necessity. Can you even print stuff (sections? pages?
| selections?) from an ePub?
|
| > Bene is designed to make opening and reading an EPUB feel fast
| and non-committal. The app is much quicker to open on my Macbook
| (<1sec) than other desktop apps.
|
| This is elitist at best. Claiming something "is fast" on top-
| class hardware is misleading (if not dishonest).
|
| Try running that on low class hardware (stuff like
| chromebooks but also laptops from at least 7-8 years ago) and
| let's see if it's still "fast".
|
| I'm not convinced.
| broscillator wrote:
| > PDF documents on the other hand... they render just right,
| everywhere. I have to zoom and scroll,
|
| To me it feels the other way around, if I have to zoom and
| scroll, they render just _almost_ right.
|
| And that almost is extremely important for actual _reading_. If
| you're quickly skimming a PDF, sure. But to sit down and read
| for 30 minutes? One hour? Fuck zooming and scrolling. My kindle
| just displays a page, and I tap it and it goes to the next
| page. Can't really get a much better reading experience than
| that.
| znpy wrote:
| If I'm reading that long, I don't need zooming and scrolling
| on my iPad.
|
| Epub on ereaders work well but only if you're reading
| fiction. Most images and almost all tables and charts have
| been messed up by epub rendering anyway. And ereaders are
| black and white, so you're losing information anyways.
| broscillator wrote:
| Right, this contradicts what you said about rendering well
| everywhere, given that you mentioned your iPad. In any other
| device you will likely need to zoom and scroll constantly.
|
| Tables and charts are also a specific use case. There's no
| mention of them whatsoever on the website so one can assume
| this is talking about mainly text.
|
| In other words, you described the only time when PDF are
| more comfortable: if you have an iPad and you need to read
| charts/images/tables. Far from the claim of "they render
| just right everywhere".
| arp242 wrote:
| I have a six year old laptop and regularly read ePubs on it
| (also my PocketBook e-reader) - it's fast enough. The initial
| page calculations can take a few seconds, but this can be (and is)
| cached and done in the background. It's not that bad and "time
| to something useful on screen" is more than acceptable. Large
| PDFs also aren't exactly fast by the way.
|
| Most e-readers are "low class hardware". ePubs work fine on
| most of them.
|
| In terms of performance there isn't a clear winner here; both
| can be somewhat slow at times for large documents, but are also
| "fast enough" for the common case, even on older low-spec
| hardware.
|
| I do think the general software ecosystem surrounding ePubs is
| not quite there yet, but that's mostly a matter of UX and
| "software that hasn't been written yet". As a format ePub is
| the clear winner for many (not all) scenarios. I struggle
| reading many PDFs because "zoom and scroll" that you mention is
| a right pain if you have to constantly do it (which you often
| do if you zoom text). Comfortably reading PDFs on my phone or
| e-reader is basically impossible.
| crabmusket wrote:
| > There's two ways to make progress [on document aesthetics]
| here. One is for browsers to provide more typography tools. ...
| The other way is to pre-calculate line breaks, which would only
| work for fixed-layout renditions.
|
| The third way is to develop non-browser clients, like the
| author's own Bene. While it currently uses Tauri and therefore
| the system webview, there's no reason it should always do that,
| or that another client couldn't be developed with a focus on
| typography.
|
| Want your documents to look super nice with all the kerning and
| line breaking your heart desires? Get a proper reader app.
|
| Just want the content? Sure, open it in a browser or a basic
| reader.
| BonoboIO wrote:
| Wow, I'm blown away by how fast this site is. It loads instantly
| on my iPhone. Perfect text sizes ... really nice.
| crabmusket wrote:
| You didn't get a huge multi-second loading spinner?
| BonoboIO wrote:
| Nope. Normally, even with adblockers, I'm used to slow
| loading times from bloated websites with megabytes of
| useless libs. This was refreshingly quick.
| upofadown wrote:
| >You might dislike the idea that document authors can run
| arbitrary Javascript on your personal computer.
|
| How I feel about this depends a lot on how much I trust the
| people who created the document and/or the person who sent it to
| me. I would trust a website I specifically selected with a
| defined TLS web of trust more than I would trust a random spam
| email.
|
| When we think about the risks associated with the complex and
| inherently insecure format known as HTML we tend to assume the
| level of trust available on the web. If we package up a bunch of
| HTML in a standalone document then we lose that assumption.
| BlueTemplar wrote:
| See also : "The decades long quagmire of encapsulated HTML"
| (2022) :
|
| https://www.russellbeattie.com/notes/posts/the-decades-long-...
|
| (Which still hasn't been posted as a "news" it seems... should I
| just submit it myself ??)
| harshreality wrote:
| I think precisely dictating layout is the wrong objective,
| although some areas (legal documents, academic papers) are still
| obsessed with that. Arxiv recently started offering a subset of
| papers in html, which is a short step from epub.
|
| If quick high-resolution referencing (page x, yth paragraph, zth
| line) is necessary, I think the way to handle it is to reference
| a phrase on that line ("the cat ran"...), which the reader can
| search for. If the search interface is lacking, that's an epub
| reader failure, not a format failure. Or, if the search option is
| considered insufficient (it does require typing with a [virtual]
| keyboard), paragraphs can be numbered--as many ancient works or
| works in translation already are, because such works have many
| editions and can't be layout-exact copies of each other.
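A minimal sketch of the phrase-based referencing idea above, assuming the reader app has already split the text into paragraphs. `locate_phrase` is a hypothetical name; it returns 1-based paragraph numbers, so a quoted phrase resolves to a stable location regardless of font size or page layout.

```python
def locate_phrase(paragraphs: list[str], phrase: str) -> list[int]:
    """Return 1-based numbers of every paragraph containing `phrase` (case-insensitive)."""
    needle = " ".join(phrase.split()).casefold()
    hits = []
    for number, text in enumerate(paragraphs, start=1):
        # Normalize whitespace so line wrapping in the source doesn't break matches.
        haystack = " ".join(text.split()).casefold()
        if needle in haystack:
            hits.append(number)
    return hits
```

When a phrase occurs more than once, the list of hits shows the ambiguity, which is where visible paragraph numbers become the better tool.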
|
| If paragraph referencing is necessary, visibly styling the
| paragraphs with numbers helps dramatically. There's no reason it
| has to be exclusive to high-profile ancient or classical works,
| like Plato [1].
|
| Classic poetry and plays, where referencing needs to be most
| exact and fast, already tend to give up any hope of everyone
| using the same edition, and simply avoid flowed text and then
| number the lines.
|
| [1] https://en.wikipedia.org/wiki/Stephanus_pagination
| foofie wrote:
| > Classic poetry and plays, where referencing needs to be most
| exact and fast (...)
|
| I believe that paragraph referencing is, by far and without any
| contest, used primarily by any text subject to reviews and
| revisions. This means technical reports and academic papers.
|
| All academic papers I subjected to review were forced to use
| templates that enforced paragraph numbering. Even though each
| version of those documents was only read by a dozen readers or
| so, all papers submitted to those journals had to use the
| template. This means hundreds of documents (say, half a dozen
| revisions per paper submitted per edition) had that hard
| requirement, and this took place for each edition of a single
| journal.
| xoac wrote:
| I love this unreservedly. It's almost an embarrassment that
| something like this does not exist already. Big thank you to Will
| Crichton for putting all of this together and actually giving
| this _idea_ a chance to take hold.
| foofie wrote:
| > It's almost an embarrassment that something like this does
| not exist already.
|
| To be fair, Epub has been largely ignored and neglected by
| everyone in the world. Virtually no reader supports rendering
| math notation, and virtually no significant publisher on earth,
| including the likes of Arxiv, offers Epub downloads. Commercial
| publishers force DRM onto every format, which excludes Epub,
| and non-commercial either stick with PDF or don't care.
| julielit wrote:
| Sorry, but this is not true. Every significant publisher
| produces EPUB these days. Most reading apps support EPUB
| (including apps from Apple and Google), Readium and EDRLab
| offer open-source SDKs that ease the development of mobile,
| desktop and Web reading software with strong EPUB 3 support,
| including MathML. Readium LCP is a DRM for EPUB, especially
| for public libraries that need an e-lending end date. More,
| EPUB is much more accessible than PDF for blind people and
| other people with disabilities. PDF has no interest for
| ebooks (but for short documents, yes).
| gmuslera wrote:
| Another standard? (https://xkcd.com/927/). It is not like the
| publishing world will switch to it overnight; a lot is tied to
| the devices they sell, so there might not be enough motivation.
| Isn't it simpler to convert EPUBs into EPUBs with all remote
| content embedded and relinked?
|
| Another single-file book format that used to be popular many
| years ago was CHM, which had its own security problems. Maybe
| adding the possibility of executing (js) code is not the best for
| something that should be mostly static, and css be used to enable
| some level of safe interactivity.
| asimpletune wrote:
| This is something I deeply care about, as I'm also very
| interested in the intersection of ebooks, security, and a LowJS
| web.
|
| 1. absolutely we should have a single-file, portable ebook
| format, and since PDF doesn't reflow text then it's not that.
|
| 2. HTML + CSS in 2024 is capable of reproducing virtually any
| kind of printed medium, but it can also reflow text.
|
| 3. I don't personally think JS should be a requirement, but a lot
| of conversations break down at this point, so let me do my best to
| explain and please understand I'm not simply being religious
| about this. In my view, an ebook is a book that can change its
| size in a way that makes sense. Bearing this in mind, I believe
| that if a reader has JS turned off, this book should work as
| intended. In other words JS shouldn't be required for it to
| perform its basic function. However, if there is some necessity
| to add interactivity or to augment the book, then yeah, why not,
| use JS. That's what it's there for. However, from the perspective
| of being a book, if it doesn't work as a book without JS then
| it's a bug in my view. (for standard ereader capabilities, I
| think the browser should offer that, but the book itself
| shouldn't be shipped with an ereader)
|
| 4. I think it's a mistake to embed all the styling, as that may
| violate some people's CSPs. It's safer to specify separate
| styling as resources relative to the HTML, and ebooks should
| forbid loading resources from a separate domain. In this way,
| ebooks would always work offline, but it has the added benefit of
| working online too and would automatically adhere to the
| strictest possible CSP. This achieves the same goal as offline,
| but in a safer and more universally compliant way.
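A minimal sketch of checking point 4's rule (no resources from another domain) on a single content document, using Python's built-in `html.parser`. `find_external_resources` and `is_external` are hypothetical names. It only inspects `src` attributes and `<link href>`; ordinary `<a href>` links are navigation and are skipped, and CSS `@import` or `url()` references (the Google Docs font case from the article) would need a separate CSS scan.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


def is_external(url: str) -> bool:
    parts = urlsplit(url.strip())
    # A host (including protocol-relative //host/...) or a network scheme means a remote fetch.
    return bool(parts.netloc) or parts.scheme.lower() in {"http", "https", "ftp"}


class ExternalResourceFinder(HTMLParser):
    """Collect resource URLs that would be fetched from another host."""

    def __init__(self) -> None:
        super().__init__()
        self.external: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Self-closing XHTML tags also arrive here via the default handle_startendtag.
        for name, value in attrs:
            if value is None:
                continue
            # src loads an asset on any element; href only does so on <link>.
            if name == "src" or (name == "href" and tag == "link"):
                if is_external(value):
                    self.external.append(value)


def find_external_resources(xhtml: str) -> list[str]:
    finder = ExternalResourceFinder()
    finder.feed(xhtml)
    finder.close()
    return finder.external
```

An empty result means the document only refers to relative paths, so it behaves the same offline and when hosted.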
|
| 5. Finally, just distribute it in a zip file. That's how ebooks
| already work right?
| gildas wrote:
| > 5. Finally, just distribute it in a zip file. That's how
| ebooks already work right?
|
| What about self-extracting ZIP files like this page?
| https://gildas-lormeau.github.io/ (note that it includes the
| CSP to make it safe)
| asimpletune wrote:
| I don't think there's any CSP?
| https://observatory.mozilla.org/analyze/gildas-
| lormeau.githu...
|
| In any case, ideally, any ebook solution should have all
| resources loaded as files relative to the current document,
| and nothing inline. Like this, the book would be compatible
| both offline and to be hosted online, without requiring any
| changes to the book.
|
| The one file thing is cool, but again it requires JS to show
| anything, so that's not really in line with what I was talking
| about. In my view, an HTML document should work like a book,
| and any JS is purely to augment and extend that book if
| necessary. Those situations are rare though, and most JS is
| just to give a reader-like experience.
|
| I think if someone is going to go through the trouble to host
| a book, they probably don't mind unzipping a file before
| putting it on their server. The one file thing I said earlier
| was more about sharing the book, similar to an app package.
| But once it's on someone's servers, presumably it's ready to
| be read.
|
| Basically .webarchive
| gildas wrote:
| The index page is protected via CSP. The "bootstrap" page
| which unzips the page isn't, but it could be. I just did not
| include it, for reasons I've forgotten.
|
| The JS is not essential, there's nothing to stop you
| treating the file as a ZIP file and unzipping it beforehand
| to view it.
| asimpletune wrote:
| Ok, well maybe the example provided doesn't communicate
| the intention, because when I try to 'extract' the page,
| following the download links, they don't work, except the
| png. Cool concept though.
| gildas wrote:
| Unfortunately, the file poses a problem for basic
| unzipping software (e.g. a file explorer). However, the
| file is 100% valid with respect to the ZIP specification.
| You just need to use "true" unzipping software like 7zip,
| unzip etc. to read it as such.
| emmanueloga_ wrote:
| Super interesting! Haven't seen this before. Very creative
| (abuse?) of zip files! :-)
|
| https://github.com/gildas-
| lormeau/SingleFile/blob/master/faq...
| Turing_Machine wrote:
| > That's how ebooks already work right?
|
| Sort of, in the sense that any unzip tool will unpack an EPUB
| file (possibly after renaming it to a .zip extension rather
| than .epub).
|
| However, it doesn't necessarily work the other way. You can't
| just zip the files at random and wind up with a valid EPUB,
| even if that exact same set of files was a valid EPUB before it
| was unzipped.
|
| The "mimetype" file in an EPUB zip is special. It has to be the
| very first entry in the archive. It also has to contain the
| string "application/epub+zip" (and nothing else) and must be
| stored _un_ compressed.
|
| A surprising number of zip tools make it hard to do this (e.g.,
| by altering the file order as more files get added to the zip,
| or making it hard or impossible to store one file uncompressed
| while compressing the others). Most of the command line zip
| programs can do this with the proper command line flags, but
| zip libraries often make it a PITA.
|
| Source: have written EPUB generation software.
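A minimal sketch of writing the container correctly with Python's standard `zipfile` module, assuming the other files (container.xml, package document, content) are already prepared as bytes. `write_epub` is a hypothetical helper: it writes `mimetype` first with `ZIP_STORED` and deflates everything else, which is the ordering and compression rule described above.

```python
import zipfile


def write_epub(target, files: dict[str, bytes]) -> None:
    """Write an EPUB container to a path or writable binary file object.

    `files` maps archive paths (e.g. "META-INF/container.xml") to their bytes.
    """
    with zipfile.ZipFile(target, "w") as zf:
        # Must be the very first entry, uncompressed, containing only the media type.
        zf.writestr(
            zipfile.ZipInfo("mimetype"),
            b"application/epub+zip",
            compress_type=zipfile.ZIP_STORED,
        )
        for name, data in files.items():
            if name == "mimetype":
                continue  # already written above
            zf.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)
```

Setting `compress_type` per entry is what makes the mixed stored/deflated layout easy here; many higher-level zip libraries apply one setting to the whole archive.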
| charlieyu1 wrote:
| Not a fan of zipping everything.
|
| One thing I miss about old .doc files, compared with .docx, is
| that it was very fast to search a folder with hundreds or
| thousands of files for a specific word. That's not possible
| with zip formats. (I just saved a file as .doc and opened it
| with a hex editor, and it does contain the contents in
| plain text.) It is a problem to search zip formats
| or pdf files with grep, and I really doubt that the zipping is
| necessary for text with maybe a few thousand characters.
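A minimal sketch of searching inside ZIP-based containers (EPUB, DOCX) with Python's standard library, similar in spirit to the `zipgrep` tool mentioned further down. `zip_grep` is a hypothetical name. It searches the raw markup line by line, so a phrase broken up by tags such as `<em>`, or split across lines, will not match.

```python
import re
import zipfile


def zip_grep(path, pattern: str, suffixes=(".xhtml", ".html", ".xml", ".txt")):
    """Yield (member, line_number, line) for lines matching `pattern` in text members."""
    regex = re.compile(pattern)
    with zipfile.ZipFile(path) as zf:
        for info in zf.infolist():
            # Skip directories and binary members such as images and fonts.
            if info.is_dir() or not info.filename.lower().endswith(suffixes):
                continue
            text = zf.read(info).decode("utf-8", errors="replace")
            for number, line in enumerate(text.splitlines(), start=1):
                if regex.search(line):
                    yield info.filename, number, line
```

Looping this over every file in a folder gives a rough equivalent of grepping a directory of .doc files, with decompression done on the fly.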
| Arelius wrote:
| To be fair, IIRC an epub is stored uncompressed, so you
| should be able to grep a directory of them, no problem.
| kimixa wrote:
| Zip files don't _need_ to be compressed; if there's a need,
| maybe you can promote adding the option upstream to whatever
| apps you use (if it's not already there somewhere). Or a .zip
| equivalent for zgrep.
|
| Though pattern matching in files feels very fragile - simple
| text patterns would still rely on the text being kept in a
| contiguous chunk and no embedded data/markup within the
| section you're searching for...
|
| Though I generally see the prevalence of zip files as a result
| of people assuming that collections of files/streams need to
| be packed into a single object for "user simplicity", but we
| already have this with OS support in the form of directories.
| It just seems people have embedded the assumption that tools
| and UI handle directories as single units poorly, when that
| doesn't really need to be the case.
| lupusreal wrote:
| Browsers don't really have good support for downloading a
| whole directory.
| genevra wrote:
| Have you ever tried to download 100 individual files or
| move 10,000 files to a drive, vs. a single zip? The
| difference is night and day.
| lupusreal wrote:
| Just use zipgrep to search inside zip files? Or just unzip it
| then use regular grep. I don't think it makes sense to gimp a
| file format just to make life slightly easier for people
| using archaic Unix utilities.
| WorldMaker wrote:
| ZIP is a real simple format under the hood.
|
| There are zip-aware grep tools that you could investigate if
| they fit your workflows.
|
| There's also nothing stopping you from storing the contents
| of zips as unzipped directories and rezip when/as necessary.
| (I wrote a tool called musdex years ago to automate that flow
| in the context of source control: source control the contents
| of something like .docx expanded into a full directory
| structure, but preserve the ability to "double click the Word
| file to edit".)
|
| As a wrapper of "collections of deeply related files, some
| which may be text and some may be binary", ZIP is one of the
| better choices that we have (compare to TAR or MIME
| envelopes, for instance).
| chasil wrote:
| JAR and APK files are really ZIP, as is EPUB.
|
| True, ZIPs are hard to grep. However, gzip and progeny come
| bundled with convenience scripts as zgrep, bzgrep, xzgrep,
| etc.
|
| Maybe use a grep-friendly format if that's important to you,
| or otherwise a filesystem with transparent compression
| (btrfs, ZFS, or [gasp] NTFS).
| Shorel wrote:
| About #3: I would say that usability without JS should be a
| requirement. We don't need JS, especially in a low-power device
| like an e-reader.
|
| #5: Why change something that is already working well? The
| extension is .epub, even if it is a zip file.
| ianburrell wrote:
| It is also important to distinguish between normal zip file and
| ebooks. Let's use the extension ".epub".
|
| You described ePub format, where HTML, CSS, images files go in
| zip archive.
| Finnucane wrote:
| I see a couple of roadblocks for this. One is that his suggestion
| of a restrictive subset of HTML for coding seems like a potential
| accessibility problem, which is to say, you'd have to make your
| documents less semantically rich. For instance, he seems to be
| suggesting using. It's already hard enough to get epubs to work
| right when reading systems lag behind what browsers can support,
| saying 'let's have less' is not going to make things easier if
| you have complex content. The problem is not that there is too
| much html or css, the problem is that reading systems don't
| support them properly.
|
| Also, most dedicated reading systems (Kindle, Kobo, etc) don't
| allow javascript, which means your components will not work. That
| might of course change, but I wouldn't hold my breath for it.
| velcrovan wrote:
| > One is that is suggestion of a restrictive subset of HTML for
| coding seems like a potential accessibility problem, which is
| to say, you'd have to make your documents less semantically
| rich.
|
| "less semantically rich" than what? Web pages? Or less rich
| than PDFs, which is what he's actually proposing to replace?
| Finnucane wrote:
| Than the epub standard as it exists. I guess I don't
| understand what the advantage here is, really. If you want to
| make an epub file that works universally, you can do that,
| within the existing standard. If makers of reading systems
| and software would actually support the full standard, which
| they currently don't. If you don't want to use pdfs, don't
| use pdfs. pdfs get used for a lot of things they aren't
| actually very good for.
| velcrovan wrote:
| > If you want to make an epub file that works universally,
| you can do that, within the existing standard.
|
| Yes, that's what the author is proposing we do.
| Finnucane wrote:
| So basically, nothing new.
| ryukafalz wrote:
| I did a double-take when I reached this part:
|
| > Therefore I decided to build a lighter EPUB reading system,
| Bene. You're using it right now. This document is an EPUB -- you
| can download it by clicking the button in the top-right corner.
|
| Because, reading this on a desktop browser, I didn't even notice
| until it was pointed out. It's more obvious on mobile because the
| header takes up more of the viewport, but it otherwise behaves
| pretty much like a normal web page.
|
| This is probably a good thing.
|
| For what it's worth, I didn't see (or at least didn't notice) a
| spinner when loading the doc for the first time like some other
| people in the comments reported. I did notice it on my phone, but
| it went by pretty quickly. I'm not sure if that's the WASM
| program loading and if it only happens the first time you load
| the page.
| edflsafoiewq wrote:
| The main thing the spinner waits on is the .epub file to be
| downloaded. That file is 4.77 MB, which is appreciable for
| anyone without a fast connection. Most of that weight is images
| (99% after decompression). Unlike a normal webpage (or a PDF),
| rendering doesn't appear to start until all the assets, the
| whole ePub, has been downloaded.
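A minimal sketch of measuring where an EPUB's weight goes, as in the "99% after decompression" figure above. `size_by_type` is a hypothetical name; it sums each member's uncompressed `file_size` and buckets it by the top-level MIME type guessed from the extension, so the categories are only as good as that guess (files with no known extension, like `mimetype`, land in "other").

```python
import mimetypes
import zipfile
from collections import Counter


def size_by_type(path) -> dict:
    """Share of total uncompressed bytes per top-level media type (image, text, ...)."""
    totals = Counter()
    with zipfile.ZipFile(path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            mime, _ = mimetypes.guess_type(info.filename)
            kind = mime.split("/")[0] if mime else "other"
            totals[kind] += info.file_size  # uncompressed size of this member
    grand = sum(totals.values())
    return {kind: size / grand for kind, size in totals.items()} if grand else {}
```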
|
| This segues into a point of difference I thought the article
| would mention, but didn't: performance.
|
| A PDF can be optimized so the pages are substantially
| independent of each other, which makes rendering pages
| progressive, random-access, and highly parallelizable.
| aabdulllah wrote:
| Hi, I was wondering if someone could explain what software
| solution I need to create and implement in my calculator for it
| to have a fast response time when calculating. How does this
| work, in other words what enables it to work so fast?
| aabdulllah wrote:
| p.s. sorry I am new to engineering.
| SamBam wrote:
| One thing I'm very interested in, as a grad student who has to
| consume a huge number of PDFs, is whether there are good tools
| for converting existing PDFs to portable EPUBs or HTML documents.
|
| If I use, for instance, CloudConvert [1], I generally get a
| document that gets flowing text roughly right, but still
| interrupts the text with page numbers and book titles (that were
| originally at the top of each page) and includes additional
| bizarre line breaks, etc.
|
| Every so often I wonder if this is an LLM problem ("please
| reformat the following text to...") but I think that one
| shouldn't reach for an LLM for these kinds of things.
|
| 1. https://cloudconvert.com/pdf-to-epub
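A minimal heuristic sketch for the running-header problem described above, assuming the PDF text has already been extracted as one list of lines per page. `strip_running_lines` is a hypothetical name and this is not how CloudConvert works. A line is dropped if it is a bare page number, or if (with digits masked) it appears on at least half the pages. A body line that is just a number, such as a year, would also be dropped, so treat the output as a starting point.

```python
import re
from collections import Counter


def strip_running_lines(pages: list[list[str]], min_share: float = 0.5) -> list[str]:
    """Heuristically drop page numbers and repeated headers/footers from extracted PDF text."""
    def key(line: str) -> str:
        # Mask digits so "Page 3" and "Page 4" count as the same footer.
        return re.sub(r"\d+", "#", line.strip().lower())

    counts = Counter()
    for page in pages:
        counts.update({key(line) for line in page if line.strip()})
    threshold = max(2, int(min_share * len(pages)))
    kept = []
    for page in pages:
        for line in page:
            stripped = line.strip()
            if not stripped or stripped.isdigit():
                continue  # blank line or bare page number
            if counts[key(line)] >= threshold:
                continue  # repeated on many pages: likely a running header/footer
            kept.append(stripped)
    return kept
```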
| nmz wrote:
| > * PDFs cannot easily express interaction. PDFs were primarily
| designed as static documents that cannot react to user input
| beyond filling in forms.
|
| I am glad about this, I do not want to download a document and it
| require any input. a document should be a document, nothing more.
| If I'm getting a book to read (pdf), I expect a book, not a
| webapp.
| sotix wrote:
| I really like this proposal. Just last night, I was reading an
| EPUB of The Hobbit and clicked on a footnote, which instructed me
| to refer to page 24. It turned out that it meant page 24 of the
| printed edition of the book, which was in the first chapter of
| the book. Page 24 in the EPUB was still part of the prologue. So
| I had no idea which page it was referencing. As it stands, my
| Kobo has an increased font size, so I notice that I can flip the
| page a few times and still be on page 4 before it finally turns
| to page 5, which I assume is referencing the pages of the written
| text. This is a nice compromise, but doesn't solve the issue with
| the hard coded footnote being misleading.
|
| I wonder if we could instead look at religious texts such as the
| Bible (e.g. John 3:16) and code editors (e.g. Ln 4, Col 12) for
| referencing locations in reflowable text. The same way you can
| jump to a footnote in a document should allow you to have an
| actionable reference to a specific location anywhere in the text.
| But I don't think the text should be stylized like how the Bible
| has the numbers (e.g. 16) scattered within the text itself. Those
| should probably be hidden within the text and leave the reading
| software to display the first line number of the page down at the
| bottom instead of the page number. That might look like "4" the
| same as it currently does, but this 4 references a section of the
| text rather than a page number. Perhaps it could be togglable for
| greater detail and display word 23 of section 4 as "4:23". Or
| maybe it could consider the chapter too. For example, chapter 2
| section 4, word 23 would look like "2, 4:23". This might get
| funky in a Terry Pratchett novel, but it would hopefully allow
| for easier discussion of exact parts in a document and
| significantly easier linking.
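A minimal sketch of the "chapter, section:word" locator proposed above, assuming the book is already divided into chapters and hidden sections and that words are whitespace-delimited. `Locator`, `parse_locator` and `resolve` are hypothetical names; the point is that the same locator resolves to the same word whatever the font size or screen.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Locator:
    """A reflow-independent position: chapter, section within chapter, word within section."""
    chapter: int
    section: int
    word: int

    def __post_init__(self) -> None:
        if min(self.chapter, self.section, self.word) < 1:
            raise ValueError("locator components are 1-based")

    def __str__(self) -> str:
        return f"{self.chapter}, {self.section}:{self.word}"


_LOCATOR = re.compile(r"^\s*(\d+)\s*,\s*(\d+)\s*:\s*(\d+)\s*$")


def parse_locator(text: str) -> Locator:
    """Parse strings like "2, 4:23"."""
    match = _LOCATOR.match(text)
    if match is None:
        raise ValueError(f"not a locator: {text!r}")
    return Locator(*(int(g) for g in match.groups()))


def resolve(book: list[list[str]], loc: Locator) -> str:
    """Return the word at `loc`; book[c][s] is the text of section s+1 of chapter c+1."""
    words = book[loc.chapter - 1][loc.section - 1].split()
    return words[loc.word - 1]
```

A reader app could show only the section number at the bottom of the screen, as suggested, and reveal the full locator on request.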
|
| I _love_ the interactive code example for marking up The Rust
| Programming Language. That gets at what I was saying above
| although is more targeted at a document than referencing parts of
| a novel.
|
| Kudos to author for creating Bene[0] as part of this proposal.
| That was cool to discover I was using their tool to read the
| proposal itself!
|
| [0] https://github.com/nota-lang/bene/
| Shorel wrote:
| On the one hand, these are just EPUBs, nothing in the article
| makes the generated EPUBs different or incompatible from the ones
| I usually read.
|
| On the other hand: this looks awesome for the Web. Mix it with a
| blog platform like Medium, and it will improve my Web browsing
| experience tenfold.
___________________________________________________________________
(page generated 2024-01-26 23:01 UTC)