[HN Gopher] Deurbanising the Web [pdf]
___________________________________________________________________
Deurbanising the Web [pdf]
Author : ColinWright
Score : 453 points
Date : 2021-07-19 10:22 UTC (12 hours ago)
(HTM) web link (lab6.com)
(TXT) w3m dump (lab6.com)
| everyone wrote:
| "with no external dependencies to manage."
|
| Except for like, the software which reads and renders the pdf,
| which may not be available on current or future OSes.
|
| I don't see how this is any different to writing a static webpage.
| temporallobe wrote:
| Why not just extremely simple, plain HTML? No frameworks, not
| even CSS. In fact, you could make your life even simpler by using
| markdown files and having the browser convert that to HTML in
| real time with a single JS library (there are a few, I am not
| promoting any one in particular), so it doesn't even require a
| "back end"! Plain HTML, while not having all the "portable"
| attributes of PDF, is still pretty darn robust and most browsers
| handle printing (or conversion to PDF) quite well.
| prox wrote:
| I think it is because PDF is a document first, and HTML is often
| hard to save/file.
|
| PDF can also be created with design in mind, in a document
| creation app, which after decades of HTML is still hard to do, I
| think.
| BenjiWiebe wrote:
| HTML isn't hard to save and file on a computer, and on phones
| it seems everything is hard to save and file.
| prox wrote:
| You are right in a technical sense, but if I ask someone
| who is a low level user to save a webpage, most don't know
| how to do that.
|
| It's not front and center or even encouraged! This makes a
| big difference for adoption.
| dredmorbius wrote:
| Some of the listed benefits don't apply. Notably paginated
| (PDF) vs. scrolled navigation, but also features such as
| formulae displays and specific typesetting / layout elements,
| in-page bookmarks, highlighting, and notes.
|
| For shorter documents that's not much of a problem. For
| anything much over ~chapter length (about 20 pages or 10,000
| words), navigation within a single HTML page becomes painful.
| Well below that level on smaller devices.
| aenigma wrote:
| Great article - so much depth and accuracy to this! I see a lot
| of discussion about the semantics of pdfs but I think those are
| missing the overarching theme here.
|
| Feels like this is more about the fact that websites have become
| increasingly dynamic, unstable, unreliable, inconsistent, etc. -
| pdfs offer something like a book, static, stable, reliable and
| consistent.
|
| Think about a book you can turn to a specific page no matter how
| many times you look at it and the print is the same, the
| information is the same, you can do the same action over and over
| again and get the same expected result.
|
| Now imagine opening a book and you could have sworn that the
| chapter you wanted to reference was 11 but now it's 16 and the
| images are different, the examples are different, in fact the
| quote that you wanted to use for reference no longer exists in
| the book.
|
| There's an insanity to this experience but it's exactly what the
| web is like - a book that is constantly changing, upended, changed
| - even disappearing entirely. I could have sworn I had bought
| that book on discrete mathematics - how could it be gone? Oh,
| that's right, the server managing the site is powered off - the
| book no longer even exists.
| stayux wrote:
| Thanks. I am starting a self-hosted blog about design fundamentals,
| best-practices, etc. Using only PDF is not a solution for me.
| Combining minimalistic web-site design with pdf/e-pub will suit
| me well. I like your approach as a statement against web
| "pollution".
| bmn__ wrote:
| It is too early to displace HTML with PDF.
|
| > PDFs used to be inaccessible
|
| My eyes are not very good. I have trouble reading the font in the
| PDF. I am using Firefox. HTML lets me pick a font that I can
| read easily. I cannot do that with PDF.
|
| > PDFs used to be unreadable on small screens, but now you can
| reflow them.
|
| I am using Firefox. I cannot do that.
|
| Realistically, how many years will I have to wait until Firefox
| catches up?
|
| Over twenty years ago, I learnt Web authoring by examining the
| source which had a profound effect on my career. That
| serendipitous opportunity I had with human-readable sources will
| be lost to the next generation with PDF - they have to learn the
| technology deliberately.
| titzer wrote:
| > Over twenty years ago, I learnt Web authoring by examining
| the source
|
| So did I. Now, it is impossible to reverse engineer the metric
| crapton of minified JS and CSS cryptoglyphics that comprise the
| modern web.
| rollcat wrote:
| TBH it's a little bit like complaining you can't open a
| modern binary executable in a hex editor and learn
| programming from that. Days of doing your regular coding by
| writing direct machine code or assembly are (mostly) gone,
| and for the sake of advancing the craft, I'm (mostly) happy
| with it.
|
| But I too wish the modern web was simpler. It took an
| evolutionary path of maintaining just enough backwards
| compatibility to only keep making things worse. Efforts like
| Gemini[1] bring some hope but I'm afraid the medium won't be
| flexible enough for much beyond personal blogs. But maybe
| that's for the better.
|
| [1]: https://gemini.circumlunar.space;
| gemini://gemini.circumlunar.space
| Santosh83 wrote:
| As far as I know, it is nothing specific to Firefox. You can't
| set your own PDF font or reflow a non-reflowable PDF in _any_
| browser.
| chrismorgan wrote:
| Brief investigation suggests reflow is a super-clumsy, ultra-
| coarse-grained view mode that is implemented by few clients,
| is not easy to access, is not well known, and is _vastly_
| inferior to what you can get on the web, especially as it's
| basically text-only.
|
| In Adobe Acrobat (and I'm _guessing_ Adobe Reader): Choose
| View - Zoom - Reflow, and it turns everything into one column
| of nigh-unformatted text.
|
| (Word looks like it _may_ support it, but that could be more
| that it's converted it to a Word document in some way and
| reflow-like functionality falls out of that naturally, though
| I imagine the tagging would help with the conversion; and
| someone in this thread mentions something called "Book
| Reader" supporting it.)
| silon42 wrote:
| >It is too early to displace HTML with PDF.
|
| 'Never' will be too early.
|
| >Realistically, how many years will I have to wait until
| Firefox catches up?
|
| They should improve reflow for HTML on small devices first.
| Focusing on PDF is a waste of resources.
| zinekeller wrote:
| I mean, Firefox just follows the website's command to not
| format it as a mobile webpage, right? But a button to
| forcibly reflow is handy though.
| x86_64Ubuntu wrote:
| Source code for websites hasn't been readable for years.
| Reading a minimized JS document that has mauled the DOM is only
| slightly more readable than the structure of a PDF.
| simias wrote:
| My understanding is that PDF is a monster of a document format,
| and it's clearly not (usually and historically) meant to be
| reflowed. Even copy/pasting from PDFs can be very disconcerting
| because the viewer may not have a good idea of where blocks of
| text start and end (or even what the characters really are).
|
| I can empathize with the feeling that the web is incredibly
| bloated, but that's IMO throwing the baby out with the bathwater.
| Simple HTML with some optional CSS would do the job much better
| IMO (and can be easily downloaded, mirrored or offlined with
| tools like wget).
|
| And if you really don't like writing HTML (I won't blame you)
| then there's always formats like markdown, org-mode and friends
| which can easily be converted to pretty much anything.
| shuntress wrote:
| Dealing with PDFs (as in, coding a system that can
| import/export/display them) is more obnoxious than dealing
| with excel spreadsheets.
|
| Unless your system is a PDF library (as in, you make the
| black-box dependency that other systems use to handle PDF
| exports), everything you do with PDFs will be through some
| annoying black-box dependency that is a pain to use.
|
| Even relatively complex HTML is much more fun to work with
| than PDF.
| marcosdumay wrote:
| The one piece of software that I know that lets you reflow PDFs
| is Calibre. And the results aren't great.
| qznc wrote:
| At least it looks more beautiful than terminal-only Gemini
| sites.
|
| https://en.m.wikipedia.org/wiki/Gemini_%28protocol%29
| II2II wrote:
| Gemini sites are not terminal-only and the renderer can make
| it look beautiful (depending upon one's definition of
| beautiful). One example is Lagrange:
|
| https://github.com/skyjake/lagrange
| majewsky wrote:
| Gemini is as "terminal-only" as Markdown. Just because it's a
| text format first and foremost, does not mean that you can't
| display it nicely formatted. It's more like EPUB in that
| regard.
| tonis2 wrote:
| What a nice website, what framework is it built with ? Maybe
| Vue.js or Angular.js or maybe Nuxt fuking js ?
| leephillips wrote:
| A related idea is making a website entirely from SVG. Here is a
| lovely example: https://ozake.com/
| sammalloy wrote:
| One problem I noticed on mobile, is that if I click on a link in
| the PDF and visit another page, and then try to traverse back, it
| takes me to the first page in the PDF, rather than the page I
| linked from.
| marbu wrote:
| I don't consider using pdf for this purpose a good idea. It would
| be better to have static HTML pages, with a reference to an EPUB
| with the same content. One can have both generated from the same
| source with a static site generator.
| zabzonk wrote:
| Sorry, I'm not a Web developer - what is meant by "churn and
| noise" in this context?
| westcort wrote:
| While I agree with the thesis, I believe it is possible to do
| things like this with vanilla HTML. For example, I created a
| search engine that is just a static HTML page:
| www.locserendipity.com
| jedimastert wrote:
| I find this to be a super interesting response. When I settled
| into my current website design, I ended up basically writing an
| article for the homepage. I'm not a designer by any stretch, and
| it was the most attractive homepage I could make, and I still
| really like it. I used a very similar workflow (and continue to
| for articles) to the papers I wrote in college, and would really
| only take one more step to get that to final pdf state.
|
| I'm torn between leaning into the static nature of the site and
| implementing the wiki I've been thinking about making.
| xvector wrote:
| I think simple HTML + print to PDF (supported by default in most
| browsers) is a much more elegant solution.
| opsecweather wrote:
| Run it through outline.com first to remove all the ad-sidebars.
| cochne wrote:
| As someone who works with PDFs a lot, please don't. PDFs are
| awful in every case except those which require a very precise
| visual layout. From reading the article, I do not see a single
| case in which PDF is superior to vanilla HTML.
| blacktriangle wrote:
| "HTML's semantic capabilities were oversold."
|
| THANK YOU! HTML semantics are a trap, just enough to make you
| think something is there but anemic enough to be a giant
| exercise in bikeshedding. Ask yourself this: If HTML semantics
| were adequate, why do we have ARIA and 90 different microformats?
|
| Other than that, I read the article expecting to be annoyed by
| the PDF presentation but was pleasantly surprised by how it read
| just like I would want a content page to read. My only complaint
| is that browsers (at least Brave) do not preserve scroll position
| in PDFs. If the browsers fix that the author may be onto
| something here.
| X6S1x6Okd1st wrote:
| "I'm mad as hell and I'm not gonna take it any more" but for
| webtech.
|
| It's totally unclear why they don't just use a subset
| sbazerque wrote:
| I like the idea of keeping HTML's document-centric original
| design, but accessing the documents using p2p protocols (instead
| of the client-server model used on the web).
|
| I'm working on an open-source implementation of this idea at
| https://www.hyperhyperspace.org
| Santosh83 wrote:
| Why not just publish static HTML with CSS only? It is, to my
| mind, better and more accessible than either PDF or a Javascript
| SPA.
| TheCoelacanth wrote:
| And if you bundle that HTML and CSS as an EPUB, it's just as
| self-contained as a PDF.
| rado wrote:
| This is terrible for accessibility. Please just use semantic HTML
| and your web will be usable on 10yo devices and unknown devices
| 10 years in the future.
| dvfjsdhgfv wrote:
| I don't agree with the author's choices (yes, I'm disciplined enough
| not to add irrelevant elements to my content), but it's really
| sad that things got to the point where someone actually suggests
| PDF as an alternative to the web.
| KEITH_PETERSON wrote:
| I just opened your website on mobile and it's very user friendly,
| I got to scroll in many directions to read the content.
|
| We built our own website with Gatsby and only use JS where it's
| really needed (when you click interactive links; we're still
| trying to improve that a bit). We customized Gatsby because it
| doesn't support this out of the box. The site gets a score of 100
| on mobile on Google PageSpeed: https://marxcommunications.com/
|
| Proof: https://imgur.com/a/N4IJoEk
|
| Or run it yourself:
| https://developers.google.com/speed/pagespeed/insights/?url=...
|
| It's possible but takes some work.
| saltdoo wrote:
| I'm on mobile and unable to open any links in this pdf after
| opening with three different pdf viewing apps. :/
| failwhaleshark wrote:
| _Cut off your nose to spite your audience._
|
| PDF is meant for viewing and printing books. It's not very good
| for browsing and requires PDF viewers. All of the browser add-
| ons, functionality, and behaviors are lost by forcing people to
| use a PDF viewer.
|
| HTML is meant primarily for browsing but it can also be used for
| print media. CSS can specify paper sizes. If someone were so
| worried about external media, they can host it themselves or roll
| their own CDN. If they were so worried about fonts, they can
| include them themselves.
|
| It's more semantic web-compatible to describe a website with RDF
| and have PDF, EPUB, DJVU, MOBI, TXT, PS, etc. links there and
| also in the webpage. This is how you provide the most
| accessibility. Furthermore, using a meta document language like
| LaTeX or something XML that can transform into other document
| artifact forms mechanically is the way to go.
| maccard wrote:
| I've always wondered why some sites can serve PDFs that my
| browser (firefox) can view inline (my preferred method), rather
| than forcing me to download the file and open in a separate
| application
| [deleted]
| chrismorgan wrote:
| It depends on the Content-Disposition header:
| https://developer.mozilla.org/en-
| US/docs/Web/HTTP/Headers/Co....
|
| There are extensions that let you intercept this header, e.g.
| https://addons.mozilla.org/en-GB/firefox/addon/no-pdf-downlo...
| which per https://github.com/MorbZ/no-pdf-
| download/blob/c924d657f33398... detects the content-type and if
| it's PDFy replaces the content-disposition header with
| "inline".
|
| (Clicking on a link that has the download attribute set also
| affects things: https://developer.mozilla.org/en-
| US/docs/Web/API/HTMLAnchorE....)
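
The header logic described above can be sketched in a few lines of Python. This is only an illustration of the idea (detect a PDF content type, then set Content-Disposition to "inline"), not the extension's actual code. The function name `force_inline` and the use of a plain dict with exactly-cased header names are assumptions made for the example.

```python
def force_inline(headers):
    """Return a copy of the response headers with PDFs marked for inline display.

    Hypothetical helper: real HTTP header names are case-insensitive, which
    this sketch ignores by assuming canonical capitalisation.
    """
    out = dict(headers)
    # Drop parameters such as "; charset=..." and compare the bare media type.
    media_type = headers.get("Content-Type", "").split(";", 1)[0].strip().lower()
    if media_type == "application/pdf":
        # "inline" asks the browser to render the file in the page;
        # "attachment" asks it to offer a download instead.
        out["Content-Disposition"] = "inline"
    return out


example = force_inline({
    "Content-Type": "application/pdf",
    "Content-Disposition": 'attachment; filename="paper.pdf"',
})
print(example["Content-Disposition"])  # prints: inline
```

Whether the browser then actually shows the PDF inline still depends on the user's own settings for handling PDF files.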
| noduerme wrote:
| I read this entire document. If you've ever had to write a PDF-
| to-text parser - and God help you, I have - you will beg for
| Flash to come back as a web standard.
|
| [edit] Generally though, I'm sympathetic with your point and it's
| kind of like why zines regained popularity in the 90s (and
| samizdat in the Soviet Union before that)... controlling your own
| publishing is a powerful idea. Anyone can do that though, without
| resorting to obscure formats, unless obfuscation is the point.
| taftster wrote:
| $> cat file.pdf | strings
|
| Done. /s
| boramalper wrote:
| Stop cat abuse! /s $> strings file.pdf
| shortformblog wrote:
| Even though Jakob Nielsen is very much still alive, he's rolling
| in his grave.
| monkeynotes wrote:
| * PDFs are self-contained and offlineable
|
| HTML can easily be offline-able. Base64 your images or use SVG,
| put your CSS in the HTML page, remove all 2-way data interaction,
| basically reduce HTML to the same performance as PDF and allow it
| to be downloaded.
|
| * PDFs are files
|
| HTML is files
|
| * PDFs are decentralised
|
| This should be "PDFs can be decentralised". PDFs aren't
| inherently any more decentralised than any other kind of file,
| including HTML.
|
| The store is the thing that becomes decentralised, not the
| content.
|
| * PDFs are page-oriented
|
| HTML can be page-oriented. Simply build your website with
| pagination. PDFs can also be abused to have hugely long pages.
| Bad UX can be encapsulated in any medium.
|
| * PDFs used to be large (bla bla bla Javascript weighs a lot)
|
| Nope, PDFs are still objectively larger than the equivalent HTML.
| PDFs don't have any dynamic interaction, rip all that out and
| produce the HTML of yesteryear and your HTML will be tiny in
| comparison to the PDF.
|
| Edit: I'm sorry, the more I think about this the dumber I feel.
| The web is useful because it's 2-way. I am excited by the web
| because I can interact with other people. I come to hacker news
| to engage with thinkers, not to just read a published article
| from one single author. I want to read ad-hoc opinions and user
| submitted content. PDF web, really?
| Tomte wrote:
| > Base64 your images [...], put your CSS in the HTML page
|
| Is there a tool that does those two things (or at least the
| first one) and that can be used by non-programmers (command
| line use is fine, a Python library would not be)?
| gildas wrote:
| You can use SingleFile for this, see
| https://github.com/gildas-lormeau/SingleFile/
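
For readers curious what the "Base64 your images" step involves, here is a rough Python sketch. It is not how SingleFile works internally; `inline_images` is a hypothetical helper, and it assumes double-quoted `src` attributes on `<img>` tags that point at local files. A regex is a naive way to process HTML and is used here only to keep the example short.

```python
import base64
import mimetypes
import re
from pathlib import Path


def inline_images(html, base_dir):
    """Replace local <img src="..."> references with base64 data URIs."""

    def to_data_uri(match):
        prefix, src, closing_quote = match.groups()
        path = Path(base_dir) / src
        # Leave remote URLs, existing data URIs, and missing files untouched.
        has_scheme = re.match(r"[a-zA-Z][a-zA-Z0-9+.-]*:", src)
        if has_scheme or not path.is_file():
            return match.group(0)
        mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        payload = base64.b64encode(path.read_bytes()).decode("ascii")
        return f"{prefix}data:{mime};base64,{payload}{closing_quote}"

    # Naive pattern: only double-quoted src attributes on <img> tags.
    return re.sub(r'(<img\b[^>]*?\bsrc=")([^"]+)(")', to_data_uri, html)
```

Calling `inline_images(page_html, "site/")` on a page would return the same markup with each local image embedded in the file itself. Base64 makes the image data roughly a third larger, which is the price of having a single self-contained file.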
| Frost1x wrote:
| >PDFs don't have any dynamic interaction...
|
| Just a caveat to that statement, you can literally do
| interactive and dynamic 3D graphics rendering in PDFs:
| https://helpx.adobe.com/acrobat/using/enable-3d-content-pdf....
|
| You can also embed JS in PDFs:
| https://helpx.adobe.com/acrobat/using/applying-actions-scrip...
| dathinab wrote:
| Yes, and many of these things are "in general" not well
| supported by anything but Adobe's PDF software.
|
| Even the simplest interactive things can easily fail to work
| correctly, even in the more widespread PDF readers.
|
| IMHO PDF is in many ways worse than HTML, it's just that these
| ways are less commonly used, but if you start a PDF instead
| of HTML trend it's just a matter of time until these "not so
| compatible" aspects of PDF become widely used by some people.
| monkeynotes wrote:
| JS in a PDF? You can do that in HTML, why not use the tools
| you already have that work together by design?
|
| This guy is arguing that removing JS is what makes the web
| better. Having published, static, paper-like content is the
| way forward.
| Frost1x wrote:
| Just caveating a technical statement I knew wasn't quite
| true, not making any sort of assessment either way.
|
| As someone who has had to extract data from large sets of
| PDFs and modern web presentation formats, I'm not a fan of
| either, really. Even verifying that a visibly presented
| string exists in a PDF document programmatically can be a
| non-trivial task, as with a given website as well. That to
| me says a lot.
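
One reason a plain byte search fails is that page content in a PDF is usually stored in compressed streams. The Python sketch below builds a small fragment shaped like a PDF stream object (it is not a complete, valid PDF) and shows the difference between searching the raw bytes and searching the decompressed stream. The helper `flate_streams` is hypothetical and assumes a literal `/Length` value; real files may use indirect lengths, other filters, and font encodings that hide text even after decompression.

```python
import re
import zlib

# Text-showing operators as a PDF writer might emit them, then Flate (zlib)
# compression, which is what the /FlateDecode filter names.
content = b"BT /F1 12 Tf 72 720 Td (Deurbanising the Web) Tj ET"
compressed = zlib.compress(content)
pdf_fragment = (
    b"5 0 obj\n<< /Length %d /Filter /FlateDecode >>\nstream\n" % len(compressed)
    + compressed
    + b"\nendstream\nendobj\n"
)


def flate_streams(data):
    """Yield the decompressed body of each stream that zlib can decode."""
    pattern = rb"/Length (\d+)[^>]*>>\s*stream\r?\n"
    for match in re.finditer(pattern, data):
        start = match.end()
        body = data[start:start + int(match.group(1))]
        try:
            yield zlib.decompress(body)
        except zlib.error:
            continue  # not Flate-encoded, or the length was wrong


needle = b"Deurbanising the Web"
found_raw = needle in pdf_fragment
found_decoded = any(needle in body for body in flate_streams(pdf_fragment))
print(found_raw, found_decoded)
```

The raw search misses the string while the decoded search finds it. A tool like `strings` only ever sees the raw bytes, so it runs into the same problem.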
| chalst wrote:
| monkeynotes seems to take the line that technical defects
| in claims others make fatally undermine their case, but
| technical defects in his/her arguments are irrelevancies.
|
| For what it's worth, the same objection occurred to me.
| The use of scripting I've seen in PDFs has been use-
| supporting and consistent with their book-like feel.
| anigbrowl wrote:
| _HTML can easily be offline-able._
|
| Sure - if the publisher cares. From the user's standpoint, the
| safe assumption is that they don't. Of course PDF is No Good
| for many contexts, but for any sort of long-form document that
| is primarily meant to be read, it's so often better.
|
| Also, if something is available in pdf, I can be moderately
| sure that someone else took the time to make sure it would be
| formatted correctly and print out OK.* If it only exists in
| HTML it's more of a roulette wheel experience.
|
| * Unless some graphic designer thought 'gee this report would
| look so cool if the cover pages were black or some other highly
| saturated block of solid color.'
| kemitche wrote:
| PDFs are also horrible to view on mobile, as the text doesn't
| reflow.
| majkinetor wrote:
| PDF
|
| - does not reflow, major suck
|
| - is binary format, another major suck
|
| So no thx, PDF is outdated tech, while HTML and friends are
| just abused.
| anigbrowl wrote:
| What I like best about pdf files is that I can just give them
| to someone and be almost certain that any questions will be
| about the content rather than the format of the file.
| LeifCarrotson wrote:
| When you find a page - inherently a document-oriented term -
| like an article, blog post, how-to, or project writeup that's
| interesting or useful, and you want to make sure it's available
| to you later, what do you do?
|
| Do you save the HTML, CSS, and Javascript, and hope that it
| works offline? I used to use the "Save page as..." tool back in
| the early 2000s, but it's become less and less useful, with too
| many dysfunctional disappointments.
|
| No, I cut out some junk I don't need with the Printliminator
| [1] bookmarklet, then I do a *print-to-PDF.* This gives me a
| file. I can save the file, back it up to my NAS, search for it
| later, keep it with other files from a project where it was
| useful, and otherwise hang onto it. This is so common, in fact,
| that it's gone from being an obscure thing you could do with a
| PostScript-to-PDF converter or (before the adware/Ask toolbar
| scandal) by installing the CutePDF virtual printer to being a
| built-in feature: modern OSes bundle a PDF printer, and print
| dialogs understand that you want to "Save as PDF". Google Docs
| and Office 365 editors allow downloading a document as a PDF.
|
| I totally agree that a dynamic, interactive page or a comment
| section is not compatible with this model of usage. There's a
| lot of consumption of endless feeds, and a lot of one-time
| video views that also don't make sense to save as offline
| files. However, the web for creators, where people write
| articles that are worth hanging onto, has a definite place for
| PDFs.
|
| [1]: http://css-tricks.github.io/The-Printliminator/
| apotheon wrote:
| I actually dislike HTML per se, but the only two benefits I
| see for PDFs in the general case are:
|
| - In my experience, it's a little harder and rarer to make
| PDFs utterly incompatible with different means of viewing
| them, and it generally requires more overt (if perhaps
| slightly unintentional, at times) sadism to make that happen.
|
| - PDFs can do some things HTML can't (easily, at least) with
| document design -- though those things are generally things
| that would be disallowed in our new "deurbanized" PDF-based
| web replacement.
|
| Everything else that comes to mind goes the other way,
| including the fact that the viewing-mechanism incompatibility
| thing can be even worse with PDFs, even if it's more rare for
| that to happen at present, and if PDFs became the new
| standard for the web I'm pretty sure that relative rarity
| would evaporate anyway. Let's also not forget that HTML can
| also do some things PDFs can't (as easily, at least) do.
| jhgb wrote:
| > Do you save the HTML, CSS, and Javascript, and hope that it
| works offline? I used to use the "Save page as..." tool back
| in the early 2000s, but it's become less and less useful,
| with too many dysfunctional disappointments.
|
| I'm too lazy, so I just tend to use SingleFile these days...
| derefr wrote:
| > When you find a page [...] and you want to make sure it's
| available to you later, what do you do?
|
| Instead of doing a bad and lossy job of archiving the page
| myself, I notify+ our friendly neighbourhood archivists at
| the Internet Archive of the page; and _they_ then do the
| best, most lossless job of preserving the page that they're
| able, given their cumulative experience.
|
| + http://blog.archive.org/2017/01/25/see-something-save-
| someth...
|
| As a side-benefit, they also then take care of keeping the
| archive they've made around and available online in
| perpetuity, with no additional marginal effort on my part.
| The same can't be said for something in my own "private
| collection."
| htek wrote:
| That's suboptimal as well. The site could come out with a
| new robots.txt file which is just "User-agent: *" followed by
| "Disallow: /" and everything already indexed by the
| Internet Archive is now inaccessible to you.
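
To see what that two-line robots.txt means to a crawler that honours it, here is a small check using Python's standard `urllib.robotparser`. The bot name and URL are made up for the example. It only demonstrates how the rule is interpreted; how any particular archive treats already-captured pages is the commenter's claim and is not something this sketch can confirm.

```python
from urllib.robotparser import RobotFileParser

# The blanket rule from the comment: every user agent, every path.
deny_all = RobotFileParser()
deny_all.parse(["User-agent: *", "Disallow: /"])

# For contrast, an empty Disallow value permits everything.
allow_all = RobotFileParser()
allow_all.parse(["User-agent: *", "Disallow:"])

url = "https://example.com/some/article.html"  # placeholder URL
blocked = not deny_all.can_fetch("ExampleArchiveBot", url)
allowed = allow_all.can_fetch("ExampleArchiveBot", url)
print(blocked, allowed)  # prints: True True
```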
| tenebrisalietum wrote:
| > in perpetuity
|
| Hopefully it really is around a very long time, but the
| world is unpredictable and things change. It's great to
| enhance the Internet Archive, but you can bet I'm keeping
| my local copy too. Just in case.
| [deleted]
| turtlebits wrote:
| Do you never get online receipts that you need to keep a
| copy of?
| derefr wrote:
| I don't think I've ever had such a thing that only
| appeared as a web page, without being emailed to me. To
| me, the email is the primary-source document in that
| arrangement.
| Santosh83 wrote:
| There is value in having a personally curated, offline
| collection of documents. You can search, annotate or
| otherwise manipulate it to your heart's content, all
| without having to be connected.
|
| Of course the Internet Archive serves other purposes for
| which it is (currently) irreplaceable.
| admax88q wrote:
| There's also opportunity cost in spending time
| maintaining, indexing, annotating your own archive of
| documents.
| cxr wrote:
| Zotero is much better for this than the too-fiddly print-
| to-PDF workflow described in the earlier comment.
| daggersandscars wrote:
| This may not be well-known, but archive.org can and does
| remove pages / sites from the archive. Authors can request
| this, site owners (separate from the authors) can request
| this. There may be others who can request this.
|
| Just an FYI. If there are critical sites you want copies
| of, I'd recommend making your own copy. I've lost access to
| important pages / sites twice before taking this to heart.
|
| Edited for clarity
| [deleted]
| blooalien wrote:
| Also useful: https://pypi.org/project/html2text/
| gregsadetsky wrote:
| There was an interesting discussion about this a year ago:
|
| https://news.ycombinator.com/item?id=23228098
|
| ----
|
| This is still not as powerful as my one, simple trick to
| handle all bookmarks, ever: Print to PDF. I've been doing it
| since last century, and I have 10's of thousands of PDF's of
| every single web page I've ever found interesting, sitting
| right there in a directory on my computer
|
| ----
|
| Including the suggestion that was brought up to use ripgrep
| to search in the pdf text content.
| anigbrowl wrote:
| Sometimes if I'm researching a topic I'll dig up a big
| number of newspaper articles and want to print them and
| read them away from the screen while scribbling notes etc,
| but on a lot of websites banner ads or footers with
| copyright statements can really mess it up.
| supperburg wrote:
| This reminds me of the guy who said Dropbox was stupid because
| he could set up an FTP server. It's the exact same argument.
|
| People understand PDFs, they are extremely common in the
| academic and business world as "digital paper" standalone
| documents. Hypothetically, anything in memory can be made into
| a file but in this scenario what matters is the practical goal
| of people actually using these files.
|
| I think it makes sense for the web to be made up of discrete
| primitives not only so that the web can be browsed in an
| intuitive and frictionless way but also because it lends itself
| to being backed up and easily re-hosted.
| chowderman wrote:
| > HTML can easily be offline-able. Base64 your images or use
| SVG, put your CSS in the HTML page, remove all 2-way data
| interaction, basically reduce HTML to the same performance as
| PDF and allow it to be downloaded.
|
| I built a tool for this exact purpose[0] since the HTML
| specification and modern browsers have a lot of nice features
| for creating and reading documents compared to PDF (reflow and
| responsive page scaling, accessibility, easily sharable, a lot
| of styling options that are easy to use, ability for the user
| to easily modify the document or change the style, integration
| with existing web technologies, etc.). In general I would
| rather read an HTML document than the PDF document since I like
| to modify the styling in various ways (dark theme extensions in
| the browser for example) which may be hard to do with a PDF,
| but it's more of a personal preference. Some people will prefer
| that the document adjusts to the screen size of the device
| (many HTML pages), and others will prefer the exact same or
| similar rendering regardless of the screen size (PDF).
|
| Either way, kind of a fun idea making a website using just
| PDFs. Not the most practical choice, but fun nonetheless.
|
| [0] https://github.com/chowderman/hyperfiler
| pajko wrote:
| This. Also who hates the huge double margins? The slow
| rendering? The unnatural break-up of text? Meaningless headers
| and footers? And the whole page-based layout? PDF is not meant
| for the web. Period.
| stjohnswarts wrote:
| so because someone chooses to publish their website in an open
| format that they prefer "it's dumb" because they don't agree
| with you.
| baybal2 wrote:
| HTML used to be a very nice format in the age of XHTML 1.1:
| very formally specified, with the tie to the DOM assured by the
| very strictly standardised DOM v3. And Acid3 gave you
| pixel-for-pixel repeatability during rendering.
|
| HTML+JS today... now it's effectively a standard in name only,
| and Chrome is the new IE6. The standard is now "what has worked
| in the last stable release"
|
| Now go to http://acid3.acidtests.org/ and see how the latest
| stable Chrome release can't render a decade old CSS testcase.
| playpause wrote:
| These all seem like technical quibbles that miss the point.
| jedimastert wrote:
| This statement could be for both the comment you're replying
| to and the original article.
| quietbritishjim wrote:
| > These all seem like technical quibbles that miss the point.
|
| If these all "miss the point", what _is_ the point?
|
| It seems to me that the article's point is that PDF as a
| format has attributes that satisfy the author's goal, whereas
| HTML does not. The parent comment says that HTML does have
| those attributes after all (if you choose to use HTML that
| way). That is very directly addressing the article's point,
| as I understand it.
| JohnFen wrote:
| Perhaps I misunderstood, but I believe the author's point
| was to highlight what a steaming mess the modern web is.
| The PDF aspect strikes me as illustrating a point, not a
| seriously proposed solution.
| wlesieutre wrote:
| Unless I'm on a paper-sized tablet I would definitely rather
| have an offline HTML file than a PDF. Nobody likes to pan
| back and forth on lines of text to read something.
| pseingatl wrote:
| PDF is size-agnostic. There's nothing to stop you from
| creating documents the size of a phone screen.
| wlesieutre wrote:
| I'm commenting here as a user reading a PDF. The fact
| that someone else could have laid it out differently
| doesn't change the fixed layout of the PDF that I'm
| trying to read.
|
| There's a reason responsive design has been a big deal
| for the last 10+ years and I don't think the benefits of
| PDF are worth throwing it out.
| JohnFen wrote:
| As someone who really detests responsive design, the lack
| of it in a PDF strikes me as a feature, not a bug.
| Robotbeat wrote:
| I had the exact opposite reaction. I'm reading this on an
| iPhone SE2020, and I MUCH appreciate reading this in pdf
| form. I didn't have to pan back and forth or even put the
| phone in landscape orientation. This is one of the smallest
| smartphones you can still buy, and the experience of PDF is
| WAY better than the user-hostile auto-flow text forced down
| mobile users' throats.
|
| I was skeptical at first, but I think the author made the
| point fantastically well.
| wlesieutre wrote:
| To get equally small text on my desktop I have to turn
| the font size all the way down to 7. God forbid you have
| readers with less than stellar eyesight.
|
| I get what they're going for but the PDF is not exactly
| an accessible reading experience.
| nemetroid wrote:
| I'm using a 2016 iPhone SE, and it's largely unreadable
| without being very up close.
| cunthorpe wrote:
| What.
|
| Your browser has a zoom functionality that lets you make
| the text smaller, essentially replicating the PDF site
| above. Only the opposite of what you say is correct: I
| can't read that PDF's text without turning my phone into
| landscape and picking up my glasses.
| apotheon wrote:
| EPUB would beat the shit out of PDF for that.
|
| (EPUB is basically a subset of HTML with client-oriented
| context.)
| monkeynotes wrote:
| The guy outlines his whole case based on those exact points
| which are, as you have observed, technical quibbles and not a
| basis for abandoning HTML.
|
| Under the hood it seems apparent to me that the real premise
| is an emotional one, not a technical one.
|
| The internet is plastic not because of HTML, but because of
| money and people. When you have teens driving content it's
| going to feel plastic. When Walmart uses the internet to sell
| you crap it's gonna be plastic. Gossip / social platforms are
| trash, no matter the medium.
|
| It could be argued that TV is an incredible learning platform
| ruined by HD. Back in the standard definition days we had
| proper news, documentaries that were substantial, and no
| reality TV. We need to go back to black and white standard
| definition.
|
| Sorry, but the PDF web is not a solution to societal rot.
| tablespoon wrote:
| > The guy outlines his whole case based on those exact
| points which are, as you have observed, technical quibbles
| and not a basis for abandoning HTML.
|
| His is actually more of a _social_ observation: it doesn't
| matter what the technology _can_ do, what matters is how
| the developers of that technology _actually_ use it.
|
| People who use PDF almost _never_ use 3D graphics and heavy
| dynamic JS, so PDFs almost always have many of the
| qualities he's seeking.
|
| Web developers almost _never_ inline anything, and do all
| kinds of things that are arguably deal-breakers except for
| a few lowest-common-denominator use cases.
|
| > Under the hood it seems apparent to me that the real
| premise is an emotional one, not a technical one.
|
| The premise is that the web has failed in important and
| clear ways, it's impossible to fix so we should give up, so
| many use cases should abandon it for something else, and
| PDFs are unexpectedly well suited for that.
|
| On a related note, part of me wishes Java Applets never
| died. Getting rid of them seems to have caused the Web to
| turn into them, and maybe if they'd remained some kind of
| separation could have been maintained.
| apotheon wrote:
| Turning PDFs into the replacement for HTML would change
| the incentives around PDF authoring, and PDFs would then
| acquire the same problems identified with HTML.
|
| The solution to the identified problems is not to switch
| to PDFs. Stop reshuffling the chairs on the deck of your
| sinking ship, and start figuring out how to design,
| implement, and incentivize the use of, some means of
| conveyance other than iceberg-vulnerable ships.
|
| > On a related note, part of me wishes Java Applets never
| died. Getting rid of them seems to have caused the Web to
| turn into them, and maybe if they'd remained some kind of
| separation could have been maintained.
|
| Java Applets were killed by Flash.
| chalst wrote:
| > PDFs are unexpectedly well suited for that.
|
| Not so surprising, really: the PDF standard evolved in
| parallel with Adobe's Flash between 2005 and 2010, which
| was then the key technology in Adobe's effort to keep a
| strategic toehold on the web. If Flash had not been a
| security clusterfuck, it might still be around. The PDF
| standard was always meant to be a complementary standard,
| and Adobe's attempted successor technologies have
| followed an even closer technological path.
|
| The PDF standard has benefited from the fact that, unlike
| the W3C and WHATWG, surveillance capitalists have not
| been in the driving seat of its standardisation effort.
| Adobe's interests are not identical to those of the
| public, but they are not as essentially adversarial to
| them as the web standards bodies have been.
| adolph wrote:
| Is the medium the message? Does style have substance? Is
| form also a function?
| leetcrew wrote:
| I'm not exactly sure what point you're trying to make
| here, but I don't think two different formats for
| encoding formatted text with images constitute different
| "mediums".
| megameter wrote:
| Of course they are, and we run into it constantly in
| computing. You can encode text with images as a bitmap,
| as vector graphics, as symbolic content that references
| bitmaps or vectors, as an algorithm that procedurally
| generates any of the above...
|
| While you can produce identical outputs from the
| different methods, it's not hair-splitting to say that
| the authoring process and hence the nature of the medium
| to shape expression is affected by choosing one. When you
| opt towards maximizing generality your production cycle
| can grow without bound because everything is possible by
| layering different media, even if all of it is
| unnecessary. That's how you end up with creative projects
| that take multiple years to decades to accomplish.
| runawaybottle wrote:
| Well, you seem to get the gist of the hot take the author
| put out. This article is not about PDFs. There is something
| wrong with the world and we can sense it.
|
| This is close to it: _When you have teens driving content
| it's going to feel plastic._
|
| Youth is the ultimate quality destroyer. They just fucking
| suck. I'm quite sick of their drivel honestly, and yet, we
| let them dictate the world (watch my childish cartoons,
| even in old age).
|
| And the little shits complicate code bases. All you little
| rascals under 30, scram, I'm on to you.
|
| And all you little adults acting like children, with your
| stupid motivational posts on LinkedIn, and your garbage
| bragging on there, I see you too.
|
| Stop.
| novok wrote:
| Sounds a lot like epub.
| rexreed wrote:
| Also - how are PDFs exactly "discoverable"? I have petabytes of
| PDFs and making them easily "discoverable" for any mass use,
| such as analytics, search, or data analysis is a massive pain.
| I'd rather have them in a non-PDF format.
| relaxing wrote:
| The author is calling for new content to be authored as PDF,
| which can easily be made discoverable.
|
| I'm guessing your data set is made of scans with poor or no
| OCR.
| rexreed wrote:
| Not a single researcher or data analyst I know of would
| prefer "discoverable" content to be in PDF format,
| regardless of just how awesome the OCR is (which it often
| isn't, especially for tabular data). Even for all-text,
| non-tabular documents, OCR does not provide the metadata
| needed to make sense of the documents. Why PDF is claimed
| to have superior "discoverability" in the OP essay is a
| mystery to me. For the sake of "discoverability", PDF is
| definitely not the way to go.
| relaxing wrote:
| The essay claimed
|
| > PDFs are discoverable. Search engines index them as
| easily as any other format.
|
| What you're talking about has nothing to do with that.
| gunapologist99 wrote:
| agreed.
|
| and, ancient HTML can still be easily read by modern browsers,
| so that's not exactly a special attribute of PDF either.
| camgunz wrote:
| You got nerd sniped by the HTML vs. PDF format thing and missed
| the entire point of TA:
|
| > Isn't it a good thing that we enjoy rapid progress? To the
| extent that we get to enjoy things like YouTube and sandspiel,
| yes! But to the extent that we want the internet to be a place
| where we can work and live and think and communicate free of
| malware, surveillance, dark patterns and the insidious
| influence of advertising, the answer is, empirically, sadly,
| no. The web has become ad-corrupted hand-in-hand with growth in
| technological capability, and the symbiotic relationship
| between web and browser means they feed on each others' churn.
| Ads demand new sources of novelty to put themselves on, so the
| web expands continually, the specs grow in complexity, the
| browsers grow in sophistication, the barrier to entry grows
| ever higher, the vast cost of it all demands more ad revenue to
| fund it... and thus the perpetual motion machine is complete.
| 6510 wrote:
| The classic mistaking the example for the topic.
| prophesi wrote:
| No, the entire point of the article is to convince people to
| use PDF/A. Which I find comical since you have to go out of
| your way to check if a PDF is PDF/A compliant. If the web was
| run by PDF's, there's no reason why any big corporations
| would abide by those rules, and it'd be just as messy as HTML
| is today.
| camgunz wrote:
| You've also been nerd sniped. TA goes on and on about
| surveillance capitalism and the attention economy. Weird,
| for an article that's supposedly convincing engineers of
| the merits of one file format over another.
| monkeynotes wrote:
| I tackled the premise. I think addressing the premise is
| the logical place to dismantle an argument.
| camgunz wrote:
| But, again, the premise is not that "as a file format,
| PDF is better than HTML". The premise is: because HTML is
| two-way, it enables surveillance capitalism and allows
| bad actors to monopolize the attention economy. The
| author wrote it thus:
|
| > Sure, you can write good HTML. I won't argue with that.
| And if you're writing good HTML, good for you. But HTML
| is a dual-use technology, the bad guys are dual-using it
| an awful lot, and I feel that the stone age still has a
| part to play in the progression of the information age.
|
| The part where you engage with this is where you write:
|
| > I'm sorry, the more I think about this the dumber I
| feel. The web is useful because it's 2-way. I am excited
| by the web because I can interact with other people. I
| come to hacker news to engage with thinkers, not to just
| read a published article from one single author. I want
| to read ad-hoc opinions and user submitted content. PDF
| web, really?
|
| Which is interesting! Do you have thoughts on creating
| peer-to-peer systems that don't enable surveillance
| capitalism?
| apotheon wrote:
| > > Sure, you can write good HTML.
|
| A key here is that it's easier to write good HTML docs
| than good PDF docs, and much harder to deal with the
| harmful aspects of PDF docs given present technology.
|
| > Which is interesting! Do you have thoughts on creating
| peer-to-peer systems that don't enable surveillance
| capitalism?
|
| I don't know about the other person's ideas, but
| decentralization plus better anonymization and
| pseudonymization, with always-on strongest-reasonably-
| possible encryption, seems like the direction to go.
| camgunz wrote:
| > A key here is that it's easier to write good HTML docs
| than good PDF docs, and much harder to deal with the
| harmful aspects of PDF docs given present technology.
|
| Oh, yeah I'm not on the PDF train. That's wild. I'm more
| of a Markdown or Gemtext advocate, or even LaTeX.
|
| > I don't know about the other person's ideas, but
| decentralization plus better anonymization and
| pseudonymization, with always-on strongest-reasonably-
| possible encryption, seems like the direction to go.
|
| Yeah, projects like IPFS (which you reference above) are
| working towards this, but JavaScript still works over
| IPFS. Plus, fingerprinting techniques are pretty bonkers.
| Most of it comes down to JS and various state you keep on
| your local machine (cookies, flash cookies, etc.), but I
| think you need that. How do you maintain a session with a
| peer without some kind of token/cookie?
| prophesi wrote:
| Did you read beyond the "How did it come to this?"
| section? TA goes on and on about web standards and the
| need for PDF/A.
|
| Edit: If the article _was_ all about surveillance
| capitalism, then it wouldn't be worth upvoting as
| actionable solutions are much more valuable than
| preaching to the choir.
| camgunz wrote:
| If you don't think it's clear that the author's advocacy
| of PDF is a means to an end, subservient to their desire
| to dismantle surveillance capitalism and the duopoly that
| Google/Apple have on the web, I don't know where to go
| from here.
| prophesi wrote:
| I think you're the one who got nerd-sniped here. 1.5 of
| the 13 pages in this PDF are about surveillance
| capitalism. The rest's about web standards.
| Aeolun wrote:
| What in the nine hells is nerd sniping?
| anigbrowl wrote:
| why don't we have both?
| cxr wrote:
| The author does identify a problem, and so you want to focus
| on that. That's fine. There is the issue of triviality,
| however.
|
| The problem described is widely felt, and also widely
| discussed. We already _know_ this stuff to be a problem. For
| the piece to be worthwhile, then, it should do something that
| is not present in the other instances where the topic has
| been raised. It should articulate (or at the very least
| exhibit, without necessarily articulating) a solution for us.
| It doesn't. A bad remedy to a genuine problem does not yield
| a solved problem.
| slashdot2008 wrote:
| The author brings a solution, it is to publish documents in
| PDF instead of HTML.
| apotheon wrote:
| "A bad remedy to a genuine problem does not yield a
| solved problem."
| camgunz wrote:
| The article is called "Deurbanising the Web", and its
| thesis is:
|
| - Publish in static file formats.
|
| - Date and hash your work.
|
| - Stop spying on your users.
|
| HN is a discussion forum, not project planning software.
| Not everything has to "yield a solved problem". Are you
| really setting the bar at "design a technology stack for
| replacing HTML/CSS/JS"? That's way, way too high.
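|
| The "date and hash your work" item above can be illustrated with
| a minimal Python sketch. The function name and the stamp layout
| (ISO date, then a SHA-256 digest) are assumptions made for this
| example, not something the essay prescribes.

```python
import hashlib
from datetime import date


def date_and_hash(content: bytes, published: date) -> str:
    """Return a one-line stamp: ISO date plus the SHA-256 hex digest of content."""
    digest = hashlib.sha256(content).hexdigest()
    return f"{published.isoformat()} sha256:{digest}"


if __name__ == "__main__":
    # Hypothetical document body; in practice this would be the published
    # file's bytes, e.g. pathlib.Path("essay.pdf").read_bytes().
    body = b"abc"
    print(date_and_hash(body, date(2021, 7, 19)))
```

| A reader who downloads the file can recompute the digest and
| compare it with the published stamp. That works the same way for
| a PDF, a static HTML file, or any other byte sequence.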
| apotheon wrote:
| Those points can be trivially met with static HTML and
| something like IPFS, and you can _still_ download HTML
| for local storage and viewing. You can even print to PDF
| if you really want to do so. Meanwhile, PDFs also allow
| dynamic files, don't require dating and hashing, and can
| be used to spy on users or deliver malware.
|
| EDIT: Oh, yeah, and static file formats doesn't
| necessarily have to mean static document formatting when
| viewing -- unless you're using PDFs, which tends to break
| useful stuff like reflowing for paginated documents (one
| of the worst things about even simple PDFs).
| bccdee wrote:
| You say that its thesis is (in part) to generally publish
| in static file formats, but that's not quite accurate.
| The piece specifically touts PDF/A as the best format and
| makes several arguments against the use of html/css. I
| agree that they're making a broader point than just "use
| pdf," but "use pdf" is definitely a large part of it.
| grishka wrote:
| PDFs aren't really meant to be read off a screen, they're much
| better suited for stuff that's meant to be printed out.
|
| And you can have a single self-contained file with a webpage,
| it's called a "web archive", with .mhtml extension.
| 1vuio0pswjnm7 wrote:
| "I come to hacker news to engage with thinkers, not just read a
| published article from a single author."
|
| And how many websites today are anything like HN, in terms of
| relative simplicity, e.g., no images^1, 3rd party requests or
| ads, only a tiny bit of (gratuitous)^2 JS.
|
| 1. I do not participate in the voting scheme but I could vote
| from the command line if I wanted to. I use a text-only browser
| so the grey, fading text gimmick is irrelevant. I see all
| comments and treat them according to the thinking not the
| voting.
|
| 2. If we exclude the .ico and a .gif
|
| There seems to be a double-standard, for lack of a better term,
| where many HN commenters and voters appear to work for
| companies that make websites with tracking and ads and various
| gimmicks targeted at "non-thinkers" which are nothing at all
| like HN. Whatever these commenters and voters see and
| appreciate in HN they are not working to bring it to the rest
| of the web. I seriously doubt they comment and vote on HN out
| of fear of so-called "power users" or a belief that the HN type
| of simplicity could become more popular and threaten their jobs
| that depend on surveillance, online ads and a non-thinking
| audience of "powerless" users. Rather, a more rational
| explanation might be that they see some value in a website that
| shows no ads and generally uses no gimmicks; that's something
| to think about.
|
| "PDF web" may not make sense to many folks who have invested
| heavily in JS and Big Tech web browsers, but Postscript is
| arguably more elegant than Javascript. "Thinkers" usually like
| FORTH.
|
| https://en.m.wikipedia.org/wiki/Display_PostScript
|
| The tracking section mentions the Abe Vigoda status page.
|
| http://www.abevigoda.com/
| noduerme wrote:
| Honestly, if you're going to put out a manifesto as a PDF, at
| least take some time "layouting" your design. The one advantage
| of that format is that you control the aspect ratio. Every font
| is permissible, everything is absolutely positioned. Using a
| generator to create it is cringey. Show the art that's
| possible. Really sell the format.
|
| FWIW I deliver PDFs daily as an art director; not ideal, but
| they work in most cases. There's certainly nothing rebellious
| or non-commercial about them.
| EugeneOZ wrote:
| ...and difficult to read on the small screens of mobile
| devices.
| noduerme wrote:
| Yeah. That's why they're only used for print.
| goodpoint wrote:
| You seem to miss the point of the post:
|
| ----
|
| Call to action
|
| Publish in static file formats
|
| Date and hash your work
|
| Stop spying on your users
|
| ----
|
| All this cannot be GUARANTEED by HTML/pdf/epub and requires
| active cooperation from the author. This is bad.
| marcosdumay wrote:
| > PDFs don't have any dynamic interaction
|
| Oh, you are set for a world of surprises. Nearly every single
| one bad, but running our current web over PDFs is well within
| the specs.
| ChrisMarshallNY wrote:
| _> Simply build your website with pagination._
|
| My experience is that browsers are _terrible_ with CSS
| pagination support in their display and printing directly.
|
| The only place it seems to actually work is...saving as a
| PDF...
| hyperpape wrote:
| Saying HTML can be offlineable is like saying C can be provably
| terminating. There's a subset of programs where that's true,
| but it's not inherent to the form. A PDF is inherently self-
| contained, standard web technologies are not. When you open the
| page and it's a PDF, it gives you certain guarantees, when you
| open it and it's HTML, you have to do further
| investigation.
| JadeNB wrote:
| > When you open the page and it's a PDF, it gives you certain
| guarantees ....
|
| I think that this is a lot less true than we're used to
| thinking. The PDF spec contains a lot more interactive
| capabilities than I think most people realise. (It supports
| JavaScript!) We're not used to seeing those capabilities
| abused, because there's no point; it is so much easier to
| abuse HTML. But, if people _want_ to abuse PDF--and, if we
| somehow convinced the world to move to it, then they would--
| then they easily can.
|
| (I'm not conversant enough in the spec to know, but I do know
| that Postscript is Turing complete, and I don't know that PDF
| isn't. At least HTML on its own certainly isn't--no
| recursion!--although all bets go out the window once you
| start layering other tech on top of it.)
| monkeynotes wrote:
| I don't buy that the problem with the web is that HTML is not
| inherently offlineable. HTML may not be inherently
| offlineable but it can be. PDF isn't inherently a web
| friendly format, but it can be. There really isn't any good
| argument for PDFing the web.
| pajko wrote:
| Print the page to PDF.
| tablespoon wrote:
| > Print the page to PDF.
|
| Even that usually sucks nowadays, because web developers
| don't care anymore. Probably 75% of the time before I do
| that, I have to go into the dev console to delete overlay
| elements that obscure content and garbage that will waste
| 10 pages (e.g. grossly oversized images, related article
| recommendations, etc.).
|
| There was a time when most websites had a print view that
| gave you a simplified html page that worked well, but I
| think most of those are gone now. Now it's all some print
| "media-type" CSS that no one ever put the time in to do
| properly or keep up to date.
| stjohnswarts wrote:
| I agree, I don't see why anyone can call publishing in PDF
| "dumb". The author of the material gets to choose his medium.
| If "you" don't like it then move along or convert it to your
| preferred format. In other words "why not both?"
| lucideer wrote:
| Firstly, C being provably terminating is a problem dealing
| with the full body of C programs written in the world. The OP
| is dealing with their own self-published content. That's a
| different problem: if your analogy held it would need to be
| limited to proving that a subset of C programs written by the
| author terminate.
|
| Secondly, the level of difficulty in making HTML offlineable
| is many orders of magnitude simpler than your C analogy:
| there's really no comparison. For the OP we only need to make
| HTML documents that _they have authored themselves_
| offlineable and yet people have written general purpose tools
| to do this automatically for most webpages. This is not a
| hard problem.
|
| TL;DR your analogy is absurd.
| hyperpape wrote:
| This is a helpful post because it gets to the heart of the
| difference. Many people are saying "if you do HTML in a
| particular way, you get the same benefits." I'm asking
| "what's inherent to the form?" That's exactly the point
| about C--you can write it in a way that's provably
| terminating, but it's not guaranteed. Consider the
| consumer's perspective.
|
| When I land on a page that's a PDF, I know certain things--
| I can easily save it and read it later. How do I know that?
| Not because I have read the PDF spec, or know that much
| about it, but because of my experience as a consumer of the
| web.
|
| When I land on an arbitrary web-page, do I know the same
| thing? No. I don't know what the page is doing, I don't
| know what my browser will do when I try to save the page.
| When I save this page, I have the option to save HTML only,
| or a complete web page. Will the complete page actually
| work? I go into the source, and there's a link to the
| javascript (which is saved locally). Does rendering the
| page rely on that javascript? Does that javascript do xhr
| or fetch calls? Since it's Hacker News, I suspect the
| answer is no. However that's not inherent to the medium.
|
| There are better ways to archive the content of even
| dynamic JS heavy pages, but they are not things that you
| learn as an average user of the web.
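|
| The question of whether a saved page still depends on outside
| files can be partly checked mechanically. Below is a rough Python
| sketch using only the standard library; the tag/attribute list
| and the names `ExternalRefFinder` and `external_refs` are
| assumptions for illustration. It only sees references written in
| the markup, so it cannot tell whether a script issues xhr or
| fetch calls at run time, and it also reports `<link>` targets
| (such as canonical URLs) that a browser would not actually fetch.

```python
from html.parser import HTMLParser

# Attributes that can make a browser fetch something when the page loads.
# This list is an illustrative assumption, not an exhaustive one.
FETCHING_ATTRS = {
    ("script", "src"),
    ("img", "src"),
    ("iframe", "src"),
    ("link", "href"),
}


class ExternalRefFinder(HTMLParser):
    """Collect URLs that a saved page would still need to load."""

    def __init__(self):
        super().__init__()
        self.external = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if (tag, name) in FETCHING_ATTRS and value:
                # data: URIs carry their payload inline, so they work offline.
                if not value.startswith("data:"):
                    self.external.append(value)


def external_refs(html):
    """Return the external URLs referenced by the given HTML text, in order."""
    finder = ExternalRefFinder()
    finder.feed(html)
    return finder.external


if __name__ == "__main__":
    sample = '<link rel="stylesheet" href="style.css"><img src="data:image/png;base64,AAAA">'
    print(external_refs(sample))
```

| An empty result suggests the markup itself is self-contained; it
| does not prove that inline scripts stay offline.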
| lucideer wrote:
| I don't really follow. How does this author converting
| their entire site to PDF help readers/visitors/users?
|
| The original HTML site[0] was printable as PDF, and save-
| able as both HTML and "Web page, complete", all of which
| result in a well-formatted & readable offline experience.
| (It was also responsive: very readable on mobile, but
| that's an aside).
|
| The new PDF site is not accessible to some, difficult to
| read on mobile, and interacts poorly with all of the
| norms web users are accustomed to (back navigation,
| anchors, etc.)
|
| [0] https://web.archive.org/web/20130127175816/http://www
| .lab6.c...
| hyperpape wrote:
| It's the difference between "this thing has X property"
| (termination or able to save for offline reading) and
| "this thing _obviously_ has X property, in a way that you
| can tell without any expertise, or doing any
| investigation".
|
| How important this is to users, or whether it is worth it
| is something I've not commented on, but it is a
| _difference_.
| apotheon wrote:
| It's possible to write PDFs that don't "work" (for some
| useful definition of "work" similar to the case with
| HTML) offline. Please stop pretending that's not true.
|
| The reason offline utility tends to be true more often
| for PDFs is that PDFs are not generally regarded as the
| preferred online-default format of choice, which is in
| turn a matter of social effects rather than technical
| capacity. Reverse the socially accepted roles of the two
| document formats and watch the same complaints get made
| against PDFs as you're making against HTML. I'd bet money
| the "normal" state of affairs would remain the same in
| terms of the perceived benefit/detriment allocation
| between online/offline formats; only which format was
| considered which would have changed.
|
| . . . but then all the web would be even heavier
| documents, and even less customizable for local viewing,
| thanks in part to that pagination and strict formatting
| situation.
| anigbrowl wrote:
| It's possible, but it takes work. I can't remember the
| last time a pdf did something unreadably weird, usually
| my only gripe is with something that's a scan of an old
| document but whoever turned it into PDF didn't do OCR.
| chalst wrote:
| hyperpape's analogy would work if the property was "avoids
| undefined behaviour", rather than "avoids nontermination".
| When we encounter a webpage, we are being expected to
| execute potentially complex, well-being threatening code
| whose behaviour is about as easy to predict as obfuscated
| C.
| apotheon wrote:
| PDFs are capable of the same issues.
| lucideer wrote:
| True but again only if we're talking about parsing the
| web. This is about HTML files the author is producing
| themselves.
| EugeneOZ wrote:
| > A PDF is inherently self-contained, standard web
| technologies are not
|
| What technologies exactly? You can have absolutely everything
| you need inside the HTML. You can inline css, js, svg and
| images. What technologies you can't inline?
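|
| As a hedged illustration of the inlining described above, this
| Python sketch builds a single HTML file with the stylesheet in a
| `<style>` element and an image embedded as a Base64 `data:` URI.
| The helper names `data_uri` and `self_contained_page` are
| hypothetical, and the sketch does no HTML escaping, so it assumes
| trusted input.

```python
import base64


def data_uri(payload: bytes, mime_type: str) -> str:
    """Encode raw bytes as a data: URI that can stand in for an external URL."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def self_contained_page(title: str, css: str, body_html: str) -> str:
    """Assemble one HTML file with the stylesheet inlined in a <style> element."""
    return (
        "<!DOCTYPE html>\n"
        '<html lang="en">\n'
        f'<head><meta charset="utf-8"><title>{title}</title>\n'
        f"<style>{css}</style></head>\n"
        f"<body>{body_html}</body>\n"
        "</html>\n"
    )


if __name__ == "__main__":
    # A tiny inline SVG stands in for a real image file here.
    svg = b'<svg xmlns="http://www.w3.org/2000/svg" width="10" height="10"/>'
    image = data_uri(svg, "image/svg+xml")
    page = self_contained_page(
        "Example",
        "body { max-width: 40em; }",
        f'<p>Hello</p><img alt="demo" src="{image}">',
    )
    print(page)
```

| The resulting file has no external references, at the cost of a
| larger document, since Base64 adds roughly a third to the size of
| the embedded bytes.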
| aenigma wrote:
| you are correct that you CAN - but who does. That's no
| longer considered best practice. The argument these days is
| that it's a lot easier to manage css if it's in a separate
| file, same with js, etc. So none of the serious web
| developers actually do anything inline anymore. The time it
| would take to convert a "best practice" website with
| separate files for html, css, js, etc. is just not worth
| it. The point he's making is still valid - why not have the
| option for something static.
| EugeneOZ wrote:
| But with the same (and even much bigger) success you can
| declare "I'm switching to self-contained HTML! No more
| external resources!" instead of "I'm switching to PDF,
| saying farewell to interactivity and mobile devices".
|
| It's just the declaration of ONE person, switching ONE
| site.
| apotheon wrote:
| > why not have the option for something static
|
| You have the same option with either HTML or PDF:
|
| - PDF files can be dynamic or static, depending on how
| you write them.
|
| - HTML files can be dynamic or static, depending on how
| you write them.
| tablespoon wrote:
| >> * PDFs are self-contained and offlineable
|
| > HTML can easily be offline-able. Base64 your images or use
| SVG, put your CSS in the HTML page, remove all 2-way data
| interaction, basically reduce HTML to the same performance as
| PDF and allow it to be downloaded.
|
| You're missing the point. Even a relatively computer-illiterate
| person can easily save a PDF to my hard drive, and it's
| _significantly_ more difficult with HTML. At a minimum you're
| probably going to get an HTML file with a sidecar directory (or
| I believe a sometimes browser-specific archive, it's been a
| long time since I tried since it works so poorly), and even
| that may not have the content you want due to dynamic sites.
| apotheon wrote:
| You can write HTML pages to be self-contained and offline-
| friendly.
|
| You can write PDFs to include resources that are not part of
| a single, self-contained file, and to be quite unfriendly
| with offline use.
| enumjorge wrote:
| I guess I don't really understand the point being made. Does
| it matter that much that saving a page creates a single file
| in your hard drive? If you really want a static rendering of
| a site why not just print it to a PDF. Why does that have to
| dictate the file format you use for distribution? With PDFs
| you don't have to worry about conversion but they are also
| comparatively larger over the wire.
|
| > even that may not have the content you want due to
| dynamic sites
|
| But PDFs also don't give you dynamic content. Nothing is
| stopping people from using HTML to serve static, JS-less
| content. In fact that's what it was originally designed to
| do. All this web app stuff was bolted on afterwards, and it's
| optional.
|
| What do we accomplish by having some people switch over to
| PDFs? The people who don't care about bloat will continue to
| not care about it. It's not like thin content will become
| more discoverable or more common. It doesn't really change
| incentives. The author says using PDFs makes it so you're not
| tempted to add cruft to your sites but that's not really a
| compelling argument.
|
| Getting content creators to produce content without bloat is
| not really a technical problem. It's a cultural and economic
| one. I don't see how a file format addresses that.
| spion wrote:
| The file format restricts the possibilities. You know what
| to expect when you see a PDF - static, JS-less content.
| With HTML on the other hand, it depends on what the author
| decided.
| JadeNB wrote:
| > You know what to expect when you see a PDF - static,
| JS-less content.
|
| You know to _expect_ that, but there's no guarantee
| that's what you _get_. PDF supports JavaScript too.
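|
| One crude way to check a particular file is to look for the names
| a PDF uses for JavaScript actions (`/JavaScript` and `/JS`). The
| Python sketch below does only that; the function name is
| hypothetical. It is a heuristic, not a parser: it misses scripts
| inside compressed object streams and names written with `#xx`
| escapes, so a negative result proves nothing.

```python
import re

# Name objects used by PDF JavaScript actions: "/S /JavaScript" and "/JS".
JS_MARKERS = re.compile(rb"/(JavaScript|JS)\b")


def mentions_javascript(pdf_bytes: bytes) -> bool:
    """Return True if the raw bytes contain a JavaScript action marker."""
    return JS_MARKERS.search(pdf_bytes) is not None


if __name__ == "__main__":
    # Hand-written dictionary fragments standing in for real PDF files.
    scripted = b"<< /S /JavaScript /JS (app.alert(1)) >>"
    plain = b"<< /Type /Catalog /Pages 1 0 R >>"
    print(mentions_javascript(scripted), mentions_javascript(plain))
```

| For real files, read the bytes with something like
| `pathlib.Path(name).read_bytes()` and treat a positive result as
| a reason to look closer rather than as proof of anything hostile.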
| fjtktkgnfnr wrote:
| > _Does it matter that much that the artifact of saving a
| page be a single file in your hard drive?_
|
| Yes, it matters a lot. Word/Excel files are actually a zip
| archive containing many files and sub-directories. Can you
| imagine people working with exploded Word files, sending
| over mail and WhatsApp complete directory trees?
| monkeynotes wrote:
| As I explained, if the author wants to make HTML easily
| offlineable then inline CSS and Base64 images. Or, you know,
| make your website printable. If authors actually thought
| about the print to PDF "problem" it could be solved with
| traditional CSS and HTML. As someone else said, we used to do
| this. It used to be part of my every day web design job to
| make sure the page printed nicely.
|
| The idea that the whole web is going to pander to edge case
| archivers is asinine. This whole conversation is about
| supporting the needs of the very, very few and romanticizing
| about the time when only interesting people used the
| internet. It's kind of elitist and self serving.
| naravara wrote:
| > You're missing the point. Even a relatively computer-
| illiterate person can easily save a PDF to my hard drive, and
| it's significantly more difficult with HTML. At a minimum
| you're probably going to get an HTML file with a sidecar
| directory (or I believe a sometimes browser-specific archive,
| it's been a long time since I tried since it works so
| poorly), and even that may not have the content you want to
| due to dynamic sites.
|
| Ctrl+P -> Save as PDF
|
| You don't need the page to be a PDF to save it as a PDF.
| stzups wrote:
| >> it's significantly more difficult with HTML
|
| Right Click > Save as
|
| Try it with this page!
| romwell wrote:
| Yeah, no. Try it with _any other page_, and see why nobody
| would be inclined to even _try_ "Save As.." a web page
| anymore.
| biztos wrote:
| I actually did this pretty recently, in an attempt to get
| some magazine articles onto my Kobo e-book reader since
| Pocket couldn't fetch the paywalled ones (I do pay).
|
| I figured I could just save the page, automate a few
| edits to get around dynamic stuff, and then use it as,
| you know, an HTML _document._
|
| Even with a nice friendly mostly-text literary magazine,
| after about five hours I gave up and just copy-pasted the
| rendered text.
| tablespoon wrote:
| > Right Click > Save as
|
| > Try it with this page!
|
| Say hello to your new sidecar directory (or broken
| CSS/images/God knows what else)!
|
| I tried to save an NY Times article, and it 1) needed JS to
| display anything, 2) even with the sidecar stuff was
| broken, 3) it was so plastered with ads and other junk I
| thought it was incomplete (it wasn't, I just had to scroll
| waaay down past something that looked like a footer and
| some voids after that).
|
| If you save a PDF, you get that exact PDF on your hard
| drive, and when you open it (even in 10 years) it will look
| exactly the same as it did on the site.
|
| With PDF WYSIWYS: What you see is what you _save_.
| trey-jones wrote:
| This is of course the point of the article - that the web
| is a giant steaming pile of shit for the most part,
| plagued by JS and external resource requirements, all of
| which contribute to massive total page size.
|
| I'll preface by saying I have some expertise in HTML, but
| none in PDF (the format).
|
| Most commenters who suggest that HTML is still a better
| alternative than PDF (I agree) are assuming that, if this is an
| important issue to you, you would craft your page in a simpler
| style compared to most of what we see on the web, making Print
| to PDF or Save As... more viable.
|
| > PDFs and a PDF tool ecosystem exist today. No need for another
| ghost town GitHub repo with a promising README and v0.1 in
| progress.
|
| This is news to me. I'm not sure that I buy it. PDFs have
| always been a pain in the ass to work with in my opinion.
| Maybe there are tools, but in my experience they aren't
| very good.
|
| In general, we know that HTML is going to be much more
| compact (and compressible!) than PDF and that's the
| biggest advantage I see on a web where bandwidth still
| matters. Another downside shows itself when you try to copy
| and paste the above quote: PDF formatting seems to be
| weird.
| tablespoon wrote:
| > This is news to me. I'm not sure that I buy it. PDFs
| have always been a pain in the ass to work with in my
| opinion. Maybe there are tools, but in my experience they
| aren't very good.
|
| > In general, we know that HTML is going to be much more
| compact (and compressible!) than PDF and that's the
| biggest advantage I see on a web where bandwidth still
| matters. Another downside shows itself when you try to copy
| and paste the above quote: PDF formatting seems to be
| weird.
|
| PDF is a display format. I once worked on a project
| parallel to a guy who was parsing PDF to extract text
| content. IIRC, text in PDFs is stored in a way that works
| fine for printing/rendering but not so well for
| manipulation (e.g. it's a bunch of commands to render
| line Z at position X,Y with font W). Those commands don't
| have to be in reading order, nor do they have the
| semantic meaning you can get from markup like HTML (e.g.
| superscript can just be nothing more than a different
| line rendered with a smaller font).
|
| IMHO, PDF is actually less optimal than HTML for what
| this guy is advocating, except that it's precisely those
| limitations that have prevented PDF from becoming the mess
| that Web HTML has. Though that's probably in large part
| because the bloaters have been too distracted by the easier
| target that is HTML to bother.
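To make the "render line Z at position X,Y" point concrete, here is a rough sketch of what a text extractor has to do once it has pulled positioned fragments out of a PDF. The `Fragment` type, the `reading_order` function and the `line_tolerance` value are all assumptions made up for this illustration; real extractors use considerably more elaborate heuristics.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """One positioned piece of text (hypothetical type for this sketch)."""
    x: float     # left edge, in points
    y: float     # baseline; PDF origin is bottom-left, so y grows upward
    text: str


def reading_order(fragments, line_tolerance=2.0):
    """Guess reading order from positions alone.

    Fragments are sorted top to bottom, grouped into a line when their
    baselines are within line_tolerance points of the line's first
    fragment, and then ordered left to right within each line.
    """
    lines = []
    for frag in sorted(fragments, key=lambda f: -f.y):
        if lines and abs(lines[-1][0].y - frag.y) <= line_tolerance:
            lines[-1].append(frag)
        else:
            lines.append([frag])
    return "\n".join(
        " ".join(f.text for f in sorted(line, key=lambda f: f.x))
        for line in lines
    )


# Fragments arrive in drawing order, not reading order.
page = [
    Fragment(110, 700, "world"),
    Fragment(72, 686, "second line"),
    Fragment(72, 700, "Hello"),
]
print(reading_order(page))
```

Nothing in the input says which fragment is a heading, a footnote or a superscript; a superscript drawn a few points above the baseline may simply be split off as its own "line", which is the loss of semantics described above.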
| chalst wrote:
| > In general, we know that HTML is going to be much more compact
| (and compressible!) than PDF and that's the biggest
| advantage I see on a web where bandwidth still matters.
|
| PDFs can be tiny if they do not embed fonts. Serving
| fonts is very much a complex technology in HTML world.
|
| Browsing the web is a pain in the ass if you don't use a
| browser compliant with up-to-date standards, but the
| whole "HTML can be lightweight" argument pretty much
| depends on avoiding much of today's standardisation. As
| an objection to the original argument, it is not
| comparing like with like.
| JadeNB wrote:
| > >> it's significantly more difficult with HTML
|
| > Right Click > Save as
|
| > Try it with this page!
|
| HN is not a good site to illustrate the unpleasantnesses of
| navigating the modern web. As you'd hope for a _hacker_
| news site, it is very friendly to this sort of thing. Most
| sites aren't.
| justusthane wrote:
| But if you want a page in PDF, you can print it to PDF. Sure,
| non-computer-savvy users might not know how to do it off-the-
| bat, but browsers make it pretty easy.
| tablespoon wrote:
| > But if you want a page in PDF, you can print it to PDF...
|
| Printing a page to PDF usually _sucks_: see
| https://news.ycombinator.com/item?id=27883028
| MisterBastahrd wrote:
| Or I could just make sure that my page prints reasonably well
| (we used to do this) and use the print-to-pdf functionality
| available in modern browsers.
| Koshkin wrote:
| All true. Incidentally, I do not see pagination as necessary or
| in most cases even desirable; rather, I see it as a vestige of
| the printing technology, while the need for printing has shrunk
| dramatically over the past 20 years.
| eaton wrote:
| The whole post boils down to: "HTML is bad because it has scope
| creep and people use it for bad things, but PDF is good because I
| made this particular document in a way I like for a use case I
| prefer."
|
| You do you, man! Some people run Archie servers, some people
| create a directory full of PDFs.
| ergot_vacation wrote:
| The sad thing is, this is what the web was SUPPOSED to be, more
| or less: a series of static documents, text and images. The only
| interactivity (setting aside the occasional CGI forms) was that
| you could click certain images or text and go to other static
| documents. Documents linked to documents.
|
| Then everyone lost their minds and decided webpages needed to be
| PROGRAMS and we've been paying the price ever since.
| saint-loup wrote:
| This experiment is interesting, but not so bold or novel when you
| consider the culture around making zines (small, DIY, often
| quirky magazines). The creativity there is amazing and medium-
| wise it's often "hybrid" (print-oriented but shared online).
|
| For instance there's this tool to help creating zines.
| https://alienmelon.itch.io/electric-zine-maker
| BaldricksGhost wrote:
| How about plain old HTML? Might not be as pretty but it sure
| beats a bloated PDF.
| npteljes wrote:
| It also wouldn't be upvoted on HN. I agree that a static page
| generator would have been a much more fitting technology (for
| example). But sometimes you gotta sacrifice that for
| visibility.
| SynapsePixels wrote:
| * PDFs used to be inaccessible, but now you can tag them.
|
| This PDF is not accessibility compliant.
| gtirloni wrote:
| I can't read this on my phone. There's no automatic layout and
| the fonts are too small. Zooming in works but it's a nightmare to
| navigate.
|
| This is an accessibility disaster.
| halayli wrote:
| PDF is a major attack surface too.
| Ostrogodsky wrote:
| "And for that reason I am creating a 1 MB behemoth that you need
| to download to read 3000 words or so."
| croes wrote:
| Now fight
|
| https://www.nngroup.com/articles/pdf-unfit-for-human-consump...
| qwerty456127 wrote:
| PDF is very far from an ideal format for today's world of
| different-sized screens. It is a horrible experience on mobile
| and even worse on eInk pocket books. I would rather advocate
| making everything available in ePub. Or even better - FB2, it is
| an easy to grok/implement (designed with manual authoring, simple
| scripted processing and low-end devices in mind) single-xml
| structure decoupling the content from the view even more. I often
| convert ePubs to FB2 (with Pandoc and Calibre) to make PocketBook
| render them in its native fonts (which always are better) rather
| than in the font specified in the ePub.
|
| I would also mention that the text within PDFs often is not
| machine-readable (you copy-paste it and get text without spaces,
| with additional spaces or complete garbage) but I believe this is
| easily avoidable if you bake PDFs the proper way.
|
| I could also suggest publishing everything in Markdown (with
| images embedded in a Base64 section in the bottom) but this
| doesn't seem practical because browsers, book-reading apps and
| eInk devices don't support nice rendering of them directly.
|
| > "But how can I implement shiny whizz-bang features that will
| engage readers and drive conversions?!" You can't. PDF is boring
|
| It's not. It supports JavaScript, embedded video and other kinds
| of active content. Sadly.
| SethMurphy wrote:
| Naming or framing things in a difficult or obtuse way can be a
| good way to limit your audience. However, if it works others will
| follow and it will no longer be effective.
|
| I had a similar experience with a Meetup I once hosted which I
| specifically put in a location that was difficult (but admittedly
| becoming trendy). It worked for a bit but eventually attracted
| the crowd I was trying to alienate.
| wccrawford wrote:
| I find it quite amusing that the author is railing against HTML
| at least in part because it's practically impossible to build a
| new web browser at this point, and then moves to PDF instead.
|
| In my time working with PDFs, I've found that generating them in
| ways that can be read with the most popular PDF readers is
| cryptic and difficult, and even parsing the ones made from the
| most popular creators is hard.
|
| I would definitely not pick PDF over HTML in regards to how easy
| it is to implement a good reader or writer.
|
| And there's plenty of authoring tools for HTML already, so the
| "ecosystem already exists for PDF" doesn't track either.
|
| Even the complaint about churn makes no sense to me, because
| there's no need to upgrade your tools constantly. If you're using
| something that produces good HTML today, it'll produce good HTML
| in a decade, too.
|
| OTOH, if you have a problem that could be automated, you're a lot
| more likely to be able to create that tool for HTML than PDF, and
| it's quite likely that someone else already has for HTML, but not
| PDF.
| TheFreim wrote:
| > In my time working with PDFs, I've found that generating them
| in ways that can be read with the most popular PDF readers is
| cryptic and difficult, and even parsing the ones made from the
| most popular creators is hard.
|
| Both pdf readers on my phone can't read the pdf, so this is
| definitely an issue.
| trhoad wrote:
| I just ran your PDF through an accessibility checker and it
| failed magnificently. For this reason alone, suggesting people
| make more use of PDFs instead of well-formatted HTML is a total
| non-starter for me (and should be for everyone).
| wy35 wrote:
| My thoughts exactly, I feel like it would be easier to write
| accessible webpages (given the wealth of accessibility tools).
| john-doe wrote:
| Even Word documents are more accessible than PDFs.
| zinekeller wrote:
| Heck, even PDFs produced by Word (or comparable FOSS editors)
| are so much better (except if you've done it incorrectly by
| "printing" it) than this particular one.
| Finnucane wrote:
| Making properly accessible PDFs is possible, but it is a pain
| in the ass. Certainly more difficult than with plain HTML.
| robin_reala wrote:
| It's entirely possible to write accessible PDFs. It's just that
| no-one does.
| trhoad wrote:
| It is indeed! And you're right, nobody does, including this
| example.
| jfk13 wrote:
| And even if they did, many of the readers/viewers people
| use wouldn't fully support it.
|
| While it's possible to royally mess up accessibility in
| HTML, too, the chances of getting something usable are at
| least somewhat better.
| cerved wrote:
| this is a joke right?
| jacamera wrote:
| Yes. Though I think the real question is whether or not it was
| intentional.
| Symbiote wrote:
| > PDFs used to be unreadable on small screens, but now you can
| reflowthem.
|
| (Pasted verbatim, retaining the missing space.)
|
| I don't see this feature in Firefox's viewer, or the default
| Android one. Can anyone recommend a FOSS PDF viewer that has it?
| (It must be FOSS, otherwise the point about using PDF to avoid
| tracking is lost.)
| nulbyte wrote:
| Book Reader can reflow PDFs. It is very simple, which I like.
| But it adds any PDF you open to the library when you open the
| app, which I find only slightly annoying for non-books.
| divbzero wrote:
| I like the spirit of this but would prefer text or static HTML
| over PDF as choice of file format.
| [deleted]
| agomez314 wrote:
| The author has a point in that many people want an online presence
| but the way they imagine it is more akin to a pamphlet or poster
| than a hyperlinked website.
|
| If that is the case, then pdf or a resizable image makes sense.
| richardwhiuk wrote:
| You can server side render PDFs and make them dynamic if you
| wanted to.
| GrumpyNl wrote:
| Instead of pdf, why not the most basic HTML?
| yesenadam wrote:
| This was a great read. I'm sympathetic! I've had a website
| (Wordpress) for almost 10 years, but have stopped adding stuff to
| it lately, because I'm sick of the formatting changing on pages!
| I look again at a page that used to look great, now the vertical
| spacing is wrong, or tables have gone out of shape, or the font
| has changed to something awful. Maybe it's wordpress, maybe it's
| my bad css/html skills, maybe something else, not sure. I picked
| up LaTeX skills about 5 years ago and have just been making
| lovely PDF books of everything I'm into. And they stay just the
| way I made them. Kind of a shame though, no-one else gets to see
| them. Yet.
| justanotherguy0 wrote:
| Not optimized for mobile so I didn't read much and bounced.
| PretzelPirate wrote:
| I read it on my phone. I then clicked an external link at the
| end and then hit my browser back button. I had to wait for the
| PDF to re-load and was unhappy when I found myself back at the
| top of the document.
|
| I would get a much better experience with html.
| DocTomoe wrote:
| This sounds like the Creative Director I worked with, ca. 1998,
| who bemoaned that he couldn't have pixel-perfect layouts over a
| wide variety of devices/browsers/operating systems.
| apotheon wrote:
| Why does it seem like almost everyone doesn't realize that PDFs
| can easily be made to support all the horrors we see in HTML? No,
| it's fucking well not impossible -- or even notably difficult --
| to jam some malicious dynamic code into a PDF. The only reason a
| period of widespread fear about PDF viruses hasn't developed as
| it has for websites spreading malicious code is the fact that
| websites got much more widely adopted. PDFs have been used as
| malicious code vectors before, and replacing HTML with PDFs would
| only result in PDFs being the new common vector for the same
| problem, with at least the same scale and intensity.
|
| This only seems like a solution if you don't know what PDFs can
| do -- and, by the way, sometimes pagination is bad, especially
| static (non-reflow) pagination.
|
| EDIT:
|
| Let's make this clearer.
|
| You can actually embed an entire JavaScript application in a PDF.
| Tell me again how PDFs somehow prevent the problem of dynamic
| pages on the web. All using PDFs instead of HTML pages would do
| is wrap the horrors of the web in forms that are generally more
| hostile to various viewing contexts for the less harmful use
| cases (e.g. static pages suddenly being harder to read in some
| contexts with PDFs than with HTML pages).
| millerm wrote:
| Yeah, 10 second load time, tiny text on a mobile device. No
| thanks. Sucks that people went for over-styling every site making
| everything painful to publish. I'd be happy with 90's static
| HTML, and a few images when needed. I seek information, not "an
| experience".
| Robotbeat wrote:
| On the contrary, I much prefer a small text on a mobile device
| to the reflowed text on a mobile device that we're always
| forced to use. The PDF is also the same view as on a desktop,
| so if I look at it on another device, my spatial memory of
| where stuff is remains intact.
| millerm wrote:
| Might as well just generate a PNG. The text is too small for
| me on a mobile device. PDF's main goal was print. The fonts
| are awful for the screen and there's no ability to reflow the text.
|
| I can deal with things moving around, I don't need spatial
| memory for that. Just give good titles, headers, and indexes.
| Again, we can do this with simple HTML, embed images and
| styles. It's all there.
|
| Unfortunately, as I mentioned, people don't really publish
| information anymore. It's mainly for "experience" and for
| "looks". Marketing, and advertising, now drive the
| information era. The "Information Super Highway" is now just
| a crumbling road plastered with billboards. Most content is
| useless, and is there for clicks. Heck, I'd rather someone
| post their site in digests in e-book formats than PDF.
| fbrchps wrote:
| Exactly my reaction to opening the site.
|
| I had no idea what the content of the site was (besides the
| title from HN) and around the 50% download point, I had already
| lost interest. I'm clearly not the only one who loses interest
| this quick [0][1][2].
|
| Also, as others have mentioned in root level comments, the
| design & layout of the content within is also severely lacking,
| which makes waiting for the load to occur even less worth it.
|
| ---
|
| [0]: https://www.pingdom.com/blog/page-load-time-really-affect-
| bo... (2018)
|
| [1]: https://blog.mozilla.org/metrics/2010/03/31/firefox-page-
| loa... (2010)
|
| [2]: https://www.thinkwithgoogle.com/marketing-strategies/app-
| and... (I know it's Google, but to be fair they have more data
| on this than most other companies, despite their obvious desire
| to sell more of their product/services related to it.)
| AlexAffe wrote:
| Exactly this. It is by the way one of the main reasons I
| initially stuck with HN. The lean UI, text based simplicity,
| efficiently conveying information had me instantly. I would
| sacrifice styling for speed anytime, everywhere.
| uncomputation wrote:
| I cannot tell if this is satirical or not. Assuming it is not,
| every single "pro" of PDFs is just plain incorrect except for the
| one about being "self-contained" to which I point to
| https://gwern.net as a good example of self-contained HTML. Gwern
| archives all the pages he references so that they are always
| available.
|
| In the case this is satire, I applaud it because I did get a few
| chuckles.
| bittercynic wrote:
| In the words of the great Ivan Stang: "I'm joking AND I'm
| serious!"
|
| *I'm not the author, just thought the sentiment from that quote
| applied here.
| the_other wrote:
| If you don't want churn, don't churn.
|
| PDF is not a web format and you're wasting effort trying to
| shoehorn print content and a print format for display on the web.
| Just use HTML and don't update it, it's probably easier.
| nonameiguess wrote:
| It's not a browser format (though browsers can render it), but
| that isn't the same as not being a web format. The web is just the
| ability to retrieve files from other people's servers, that may
| themselves reference other files on yet other people's servers.
| As long as a file format supports hyperlinks, then it's
| suitable for the web. If you don't care about being able to
| actually click the hyperlink to activate your desktop system's
| uri schema handler, then even plain text works fine.
| silon42 wrote:
| EPUB?
| jacobmischka wrote:
| Which is just basic HTML and CSS itself.
| guywhocodes wrote:
| Yeah but it's a decent subset. Most of the complaints of
| the author should be significantly better
| jacobmischka wrote:
| It would be better if they just used that subset and just
| published it directly instead of needlessly repackaging
| it, but if that's what was meant then sure. Maybe we need
| a better name for simple, semantic HTML and basic CSS.
| Finnucane wrote:
| The point of it is to be a self-contained package. You
| still need hardware to read it, but not a server. In
| theory at least, once you have it, it's yours. (of course
| the commercial ebook vendors are trying to spoil that.)
| goodpoint wrote:
| No, it still supports plenty of trackers/spyware and so
| on.
| mojuba wrote:
| EPUB is an under-appreciated format that I think can serve as
| a short to mid-term storage for human knowledge. Can
| reasonably re-flow itself when necessary, no language run-
| time required, just a full Unicode support at least at the
| level of the time the file was published.
|
| That's the Internet of knowledge I'd love to see: things
| organized in EPUB's, searchable and downloadable.
| massysett wrote:
| It's pretty amazing that the basic HTML that I learned 20 years
| ago still works - it even displays fine on devices like tablets
| and phones that did not even exist 20 years ago. I understand
| the author's sentiment but PDF is an overreaction. Just write
| static boring HTML.
| cxr wrote:
| Indeed, there's a lot of irony packed into the first page:
|
| Featured is a quote from LWN indicting the "software
| industry" and its "brittle dependencies". What's ironic about
| this? It's squarely about the parts of the software industry
| that deal in things that are _not_ meant to be painted in the
| browser.
|
| If you want a solution to the (perceived) churn, it's funnily
| enough right in the quote from Mark Pilgrim: "I've migrated
| to HTML 4". HTML is almost certainly not going to end up
| drifting in such a way that DJB's qhasm bibliography page[1]
| is ever going to break. HTML and the Web standards in general
| are, with extremely rare exceptions, _cumulative_. It's
| pretty frightening how many technical people don't understand
| this; the Web is intentionally engineered to serve as "the
| infrastructure for handling humanity's publishing needs
| indefinitely"[2]. More frightening is that the biggest threat
| to this are people like the author here who treat the Web as
| if it's like any other thing that the computing industry puts
| out--i.e., already perennially broken. This is dangerous
| because it anachronistically cedes power to folks who'd try
| to argue at some point in the future that the things about
| the Web that they'd like to break (and might be in a position
| to break e.g. due to browser monopoly) are justified and no
| big deal, really.
|
| The author goes on to call out the Web ("of rubbish") as
| "user-hostile". Shortly afterward, he or she writes that "PDF
| makes a stand against the churn". More accurately, PDF makes
| a stand against the user, by prioritizing authors' creative
| whims over the reader's needs. This happens again later in
| their remarks about PDFs being page-oriented: "you are
| fundamentally not in control of the reading experience." The
| "you" here is not you, the actual reader. The control they
| refer to is, once again, the author's.
|
| You get other poor arguments--that PDFs are "offlineable"
| "files" that can be distributed "decentralized", none of
| which are accurate criticisms against what HTML lacks--unless
| those Java documentation zipballs that seemingly every
| university student enrolled in a CS program in the early
| 2000s was made to download are a collective hallucination.
|
| And it gets worse from there. Cute stunt to grab attention
| and all, but the arguments are fundamentally bankrupt.
|
| 1. http://cr.yp.to/qhasm/literature.html
|
| 2. https://news.ycombinator.com/item?id=27368632
| the_other wrote:
| Thank you for this detailed response!!
| account42 wrote:
| > it even displays fine on devices like tablets and phones
| that did not even exist 20 years ago
|
| It would display perfectly if mobile browsers didn't have
| broken defaults (to work around broken websites) that you
| need to disable using <meta name="viewport"
| content="width=device-width, initial-scale=1">.
| austincheney wrote:
| That's a hard sell. The churn exists because people want it,
| not end users, but people who are paid to produce websites.
|
| Most churn comes in two flavors:
|
| * analytics and spyware
|
| * convenience code for insecure developers
| duxup wrote:
| My kids' school used to send links to google docs for their
| announcements, I hated it. I pretty much hate any system like
| that, it's purely extra steps on the web.
|
| In both email, and the browser I'm already in a program that
| displays text and images and cool stuff. So then I'm just sent a
| link to someplace else that does the same thing?
|
| So then what? Is it all just "pdf can do that too", but with
| extra steps...? I can print to PDF in most browsers if I want,
| but in this case it isn't a choice.
|
| The idea that I might save and store the school emails or that
| website and somehow manage those files seems kinda self important
| in a way ... I don't mean that as a personal attack, just that
| this idea that they imagine me taking the time to do that with
| their content? When otherwise it could have just been an
| accessible web page? How many people care to do that?
|
| If I'm visiting a website I'm almost certainly not interested in
| saving your content / managing it... almost never.
|
| I'm a little lost on the whole 'page-oriented' idea too. That's
| just a limitation of paper, and it's a pain / disruptive more
| often than not. Even the 'page oriented' section is broken up by
| the page and some extra text at the bottom of the page that is
| irrelevant to the paragraph...
|
| If folks want it, a 'save to pdf' option might be nice to add, or
| the user can just print to pdf...
| cunthorpe wrote:
| Please somebody bake an icon into the browser that turns green
| when websites are lightweight and content-only and make it affect
| Google rankings.
|
| We don't need PDF sites, we need incentives for publishing
| acceptable websites.
|
| Side note: I'd honestly love for the government to step in and
| outright outlaw some obvious and intentional dark patterns
| (example: California unsubscribe law)
| titzer wrote:
| > make it affect Google rankings.
|
| Google is never going to make a change to its rankings that
| interferes with its real goal of 23% YoY revenue growth.
| blacktriangle wrote:
| Is that actually an internal Google goal? If so, dear god, no
| wonder they are so willing to sacrifice the long term health
| of the internet in return for short term hypergrowth. No
| company Google's size can grow that fast without some serious
| dark patterns and user abuse.
| titzer wrote:
| You don't end up with that level of growth year over year
| for 20 years straight _by accident_. It is an unwritten
| assumption that missing 20% growth is a fail. I worked at
| Google almost 10 years and watched the dog and pony show
| (aka TGIF) from the inside. The real story is on the
| quarterly financial reports.
| janandonly wrote:
| PDF-fing everything on your website is one way to go about it...
|
| I personally use the service at printfriendly [1] and Arc90's
| Readability to make un-crufted and readable PDF files of web
| content that is worth saving for the coming decades. Added bonus:
| by saving these very small files on my system pressing the
| Command + Spacebar on my system I can easily search through my
| multiple decades of interesting files...
|
| [1] https://www.printfriendly.com
|
| [2] https://ejucovy.github.io/readability/
| afavour wrote:
| Didn't expect I'd see a top post on HN _defending_ the page-
| centric nature of PDFs. A paged format is awful for anything
| other than printing out pages.
|
| But hey, it's a big wide web, you do you. But I won't be reading.
| 0xcoffee wrote:
| Excellent! Excited to see the next PDF generator framework.
| msoad wrote:
| Companies' S-1 documents are shared on Hacker News. SEC publishes
| them in both PDF and HTML. Guess which one works better?
|
| It's not the fault of the HTML standard if people are using React
| plus 20 different libraries for simple static content.
| 8note wrote:
| "PDFs are self contained, and can't be broken by an API going
| down"
|
| Is directly broken by "PDFs are part of the web, and part of the
| content can be by reference to a webpage"
|
| If that webpage goes down, that link is broken.
|
| That decentralized bit still needs to conform to broken copyright
| laws too.
|
| You can't just download a pdf then rehost it on your own without
| a license to do so
|
| .... There's also a big difference between a city and the modern
| web. We own the infrastructure in a city, vs rich people own it
| on the web.
|
| Rather than a city, the web is more like a company town. I don't
| think that's any different for pdfs either. The distribution is
| still coming from a web server owned by a company -- the real
| response is self hosting of your stuff, and self hosting by your
| friends for their stuff. The file format doesn't make it self
| hosted
| [deleted]
| greatgib wrote:
| What is the summary?
|
| Same as someone else, to read on mobile I have to download and
| open a pdf so I just cancelled the download and ignored the link
| prox wrote:
| What is the bump you experience that you don't want to download
| and open a pdf? Here it opens in my browser directly (Safari)
| lucideer wrote:
| macos/ios have this built in but not all OSes come with a pdf
| viewer
| TheFreim wrote:
| For me all my readers (I have multiple on my phone) all can't
| open the file for some reason.
| kzrdude wrote:
| It ends up in the downloads folder and needs cleanup later.
| codetrotter wrote:
| In all browsers that I use that is only true if the server
| sends a Content-Disposition header with its value set to
| "attachment" (optionally with a file name), or maybe also
| in the case where the server specifies incorrect or
| unspecific Content-Type (such as simply "application/octet-
| stream" instead of "application/pdf").
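The rule of thumb in the comment above can be written down as a small sketch. This models only the behaviour described there, not any browser's actual algorithm; the function name `likely_handling` and the `INLINE_TYPES` set are assumptions for illustration, and as the replies note, some mobile browsers download PDFs regardless.

```python
from email.message import Message

# Assumption: media types this hypothetical browser can render in a tab.
INLINE_TYPES = {"application/pdf", "text/html", "text/plain"}


def likely_handling(headers: dict) -> str:
    """Return "download" or "inline" for a set of response headers."""
    msg = Message()
    for name, value in headers.items():
        msg[name] = value

    # An explicit "attachment" disposition asks for a download,
    # with or without a filename parameter.
    if msg.get_content_disposition() == "attachment":
        return "download"

    # A missing or unspecific type gives the browser nothing to render.
    if "Content-Type" not in msg:
        return "download"
    if msg.get_content_type() in INLINE_TYPES:
        return "inline"
    return "download"


print(likely_handling({"Content-Type": "application/pdf"}))
print(likely_handling({
    "Content-Type": "application/pdf",
    "Content-Disposition": 'attachment; filename="paper.pdf"',
}))
print(likely_handling({"Content-Type": "application/octet-stream"}))
```

For a site author the practical upshot is to serve PDFs as "application/pdf" and leave out the "attachment" disposition if the file is meant to open in the browser.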
| kzrdude wrote:
| What I said happens for Firefox on android,
| unfortunately. It's a great browser, of course.
| alisonkisk wrote:
| Even on mobile?
| codetrotter wrote:
| Yes. I use Safari on iOS.
| sharken wrote:
| On the Brave browser for Android it also downloads the
| PDF file and stores it locally. Websites should use HTML
| and not PDF in my opinion.
|
| On top of that the end result is not very readable on
| mobile, the font is too small.
| codetrotter wrote:
| > [...] Websites should use HTML and not PDF in my
| opinion.
|
| > On top of that the end result is not very readable on
| mobile, the font is too small.
|
| Agreed on both counts. Was only commenting about browsers
| saving PDFs.
|
| PDF is not a comfortable format for reading on a screen.
| Nor a comfortable format to extract text or data from.
| admax88q wrote:
| Well this sucked to read on mobile. I'll stick to HTML.
| kissgyorgy wrote:
| It's a terrible "implementation", but interesting observations we
| should consider.
| prox wrote:
| I love the basic idea here. Needs polishing if you want to blow
| this up to the masses.
|
| It's like my Pi who just does one thing really well, and allows
| me to tinker on every level if I so choose.
| prox wrote:
| I'd like to add that I think a well-designed PDF is just so much
| better looking than any html based page (and has a lot more
| freedom)
| pasc1878 wrote:
| Definitely less freedom. With HTML I, the reader, can change the
| size of the text or even the font, and the text will reflow so you
| don't need to scroll horizontally to read each line. How do you
| do that to a PDF?
| prox wrote:
| That's not what I mean (your point has merit)
|
| If I ask a designer to design a website, he has to send it
| off for implementation, or is confined by html
| breakpoint/accessibility options.
|
| PDF can go straight from designer to document and do
| everything in a program like designer, indesign and so on.
|
| It's a designer first paradigm.
| atemerev wrote:
| "But stable standards are incredibly important.They allow
| software, at least in theory, to be finished. Why is it
| importantthat software be finished? Because it gives us hope that
| we might end thechurn and fix all the bugs! I want to use
| software whose version number is7 1.0. I want to use software
| whose every line of code has been studied,analysed, optimised and
| punishingly tested. I want every component andsubcomponent and
| every interaction and every configuration to beexquisitely
| documented, and taught in courses, and painstakinglydeconstructed
| and proven sound"
|
| Sorry, not possible. Never, ever. Software does not work like
| that. Bugs will never be fixed (if they could, the software in
| question would have become obsolete long ago). By the way, this
| is what you get when you try to copypaste text from this
| "website".
| fsiefken wrote:
| Very good, I go for project gemini
| https://gemini.circumlunar.space/docs/faq.gmi
| MichalSternik wrote:
| Well, what's wrong with static site (generators)?
|
| I certainly get the argument, but using something like hugo or
| gatsby or jekyll when you want to avoid the "churn" also seems
| like a perfectly valid solution.
| nonameiguess wrote:
| The author addresses this pretty well. Because you can embed
| whatever you want, static site generators aren't really static.
| In particular, Jekyll blogs and what not still pretty commonly
| include comment sections.
|
| Of course, pdfs aren't necessarily static, either, but that is
| why Lab6 is choosing to use pdf/a, an actually static format
| intended specifically for long-term archiving of immutable
| files. This way you can sign the file and guarantee it stays
| the same forever and everyone's copy is identical.
|
| I'm kind of surprised at the response to this. The author seems
| well aware of how terrible pdf is as a format and this isn't
| some treatise of why we should want to use it. It's an
| unfortunate compromise that, given the requirements they're
| aiming to meet, of generating a file that supports rich
| formatting and hyperlink embedding, but which can guarantee
| immutability and long-term archiving directly in the spec,
| pdf/a is all there is, so in spite of being a terrible format
| with a lot of shortcomings, it's what they're using.
| account42 wrote:
| > The author addresses this pretty well. Because you can
| embed whatever you want, static site generators aren't really
| static. In particular, Jekyll blogs and what not still pretty
| commonly include comment sections.
|
| But just like you can choose to use PDF/A, you can also
| choose to have a completely static and self-contained (e.g.
| using data URLs for images) HTML page.
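|
| As a rough sketch of the data-URL approach, the snippet below
| turns image bytes into a string that can be placed directly in
| an img tag, so the page needs no external files. The function
| names are made up for illustration.

```python
import base64
import html
import mimetypes


def to_data_url(data: bytes, filename: str) -> str:
    """Encode raw bytes as a base64 data: URL, guessing the MIME type
    from the file name and falling back to a generic binary type."""
    mime, _ = mimetypes.guess_type(filename)
    mime = mime or "application/octet-stream"
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def img_tag(data: bytes, filename: str, alt: str) -> str:
    """Return an <img> element whose source is embedded in the page."""
    safe_alt = html.escape(alt, quote=True)
    return f'<img src="{to_data_url(data, filename)}" alt="{safe_alt}">'
```

| Base64 makes the embedded data roughly a third larger than the
| original file, which is the price of having a single file.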
| IshKebab wrote:
| Why don't they just use a static subset of HTML? You don't
| _have_ to include comments sections, just like you don't
| _have_ to include 3D CAD models and videos in your PDFs (yes
| you can do both of those, in theory anyway).
| danShumway wrote:
| > pdf/a is all there is
|
| Nobody is requiring you to use PDF/A. No mainline browser
| (that I'm aware of) requires it.
|
| So what is being solved? When I click on a PDF on the web, I
| don't know if it's using PDF/A, I don't know if it's
| embedding or linking its fonts. So it's the same situation,
| nothing has changed.
|
| Telling people to use PDF/A when most clients do not enforce
| it and when there's no indication to users before they click
| on a link whether or not the link is following the spec -- it
| is exactly the same as telling them to use a subset of HTML;
| the author is doing the same thing they complain about.
|
| You can't just say that PDF/A exists. That's not enough, how
| will you get people to restrict themselves to that format
| when 99% of their users will never notice the difference and
| no client is enforcing it?
| float4 wrote:
| The only thing I like about PDF compared to HTML is that with
| PDF, I know for a fact that no web requests are made in the
| background. That means no fingerprinting, no analytics etc.
|
| With HTML, I have to trust that some random entity does what
| they state in their privacy policy, and they regularly don't.
| Sure, I can disable JS, but then 95% of the web doesn't work
| anymore.
|
| Other than that PDF is quite clearly a less accessible format.
| account42 wrote:
| > With HTML, I have to trust that some random entity does
| what they state in their privacy policy, and they regularly
| don't. Sure, I can disable JS, but then 95% of the web
| doesn't work anymore.
|
| If you only allow PDF, then 99.9999% of the web doesn't work
| anymore.
|
| I'm all for getting sites to be static, but PDF doesn't fix
| that because the problem has never been the technology used
| to build the site.
| grncdr wrote:
| Are you sure? I was under the impression that PDFs can
| reference web resources, and this is why there are more
| stringent standards for archiving (PDF/A and friends)
| andrepd wrote:
| You can choose not to use JS on your website.
| jefftk wrote:
| How sure are you that there are no network requests
| happening? I tried to look this up and wasn't able to find
| any clear answer.
|
| (It looks like at least some PDF readers have provided
| support for automatically displaying external images, for
| example)
| foobar33333 wrote:
| The full PDF spec is insane and allows for web requests and
| javascript. Most readers do not implement the anti features
| but adobe's tools will.
| robin_reala wrote:
| How do you know for a fact? PDF has JS in the spec, and it
| supports SOAP and Web Services. Have a look at
| https://www.adobe.com/go/acrobatsdk_jsdevguide
| float4 wrote:
| That's not the PDF spec is it? That is a spec for Adobe
| Acrobat, which is not allowed to make any web requests
| thanks to my application firewall (Little Snitch).
|
| Pretty sure a PDF opened in the browser can't run any JS,
| but not completely sure. So you're right: I don't really
| know it for a fact. Poor choice of words.
| robin_reala wrote:
| The spec is ISO 32000, and it's expensive and closed, so
| difficult to reference. But according to Wikipedia at
| least, JavaScript is normative in it. No idea if SOAP /
| Web Services is part of it though.
| jl6 wrote:
| The spec for PDF 1.7 is here: https://www.adobe.com/conte
| nt/dam/acom/en/devnet/pdf/pdfs/PD...
|
| JavaScript is allowed, but not in PDF/A, which is what I
| use.
|
| The PDF 2.0 spec is damnably not public.
| the8472 wrote:
| But you can't easily tell PDF/A and regular PDF apart, so
| we're back to the same situation as HTML vs. HTML with
| javascript turned off.
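|
| One way to at least guess is sketched below. It assumes the file
| declares PDF/A through the "pdfaid:part" field in its XMP
| metadata and that this metadata is stored uncompressed. It is
| only a heuristic: a file can claim PDF/A without conforming, and
| scripts hidden inside compressed streams would not be seen by
| this scan. A real validator is needed for a firm answer.

```python
import re

# Matches both XMP spellings: <pdfaid:part>1</pdfaid:part>
# and the attribute form pdfaid:part="1".
PDFA_PART = re.compile(rb"pdfaid:part\s*(?:=\s*[\"']|>)\s*(\d)")


def claimed_pdfa_part(pdf_bytes: bytes):
    """Return the PDF/A part number the file claims, or None."""
    match = PDFA_PART.search(pdf_bytes)
    return int(match.group(1)) if match else None


def mentions_javascript(pdf_bytes: bytes) -> bool:
    """True if the raw bytes contain the names PDF uses for script actions.

    Absence proves nothing, since objects may sit in compressed streams.
    """
    return (b"/JavaScript" in pdf_bytes
            or re.search(rb"/JS\b", pdf_bytes) is not None)
```

| Even with this, the check only runs after the file has been
| downloaded, so the point about not knowing before clicking
| still stands.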
| deregulateMed wrote:
| You are fingerprinted when you find the web link.
| float4 wrote:
| When I click a link you mean? Definitely true, but that way
| they only have access to my IP and user agent, which is
| still better than all the WebGL, Font library, display
| calibration settings, mouse movement etc. that they use
| otherwise.
|
| I often use Tor, although I'm pretty sure that even then, a
| good analytics lib can see it's me based on scroll
| behaviour, mouse movement, time of day, and of course what
| I browse.
|
| But yeah, you make a good point.
| deregulateMed wrote:
| Where do you get the link?
| float4 wrote:
| DDG mostly, and they don't track users.
| deregulateMed wrote:
| Your device, your device version, screen size, browser,
| browser version, IP address, etc... Are all tracked
| regardless.
|
| You might not be a unique fingerprint, but at best you
| are part of a group of somewhere between 3 and 1000
| similar users.
|
| Not to be a downer, but when I webscraped I learned that
| big corporations can spend money to fingerprint you.
| andrepd wrote:
| Why?
| throw0101a wrote:
| > _I certainly get the argument, but using something like hugo
| or gatsby or jekyll_ [...]
|
| Or a plug-in to Wordpress so you can keep the GUI/dynamic for
| the less technical employees:
|
| * https://wordpress.org/plugins/simply-static/
| 101008 wrote:
| I've been doing something similar for 4 years now. I converted my
| niche website into a monthly magazine, that is released as a PDF
| (and also uploaded to Issuu).
|
| It has its good sides and bad sides. People will download the PDF
| every month when there is a new issue, but you don't know if they
| read it, how much time they spend on it, etc. You won't appear on
| Google Results as you would do if you posted the articles as
| HTML, etc.
|
| Based on my experience, I just keep doing it as an experiment and
| because I enjoy saying I run a digital magazine, but the truth is
| that there are no real advantages to it.
| jl6 wrote:
| > you don't know if they read it, how much time they spend on
| it, etc.
|
| This is an excellent feature, for the user.
| 101008 wrote:
| Yes, for the user it has some advantages:
|
| - Download it and keep it forever.
| - Read offline.
| - Be able to share it through email, etc.
| - Print it and read it in a nice place! (I encourage this)
|
| Of course, it has some downsides:
|
| - Not responsive, so people who download it from a phone may hate
|   it.
| - No accessibility.
| croes wrote:
| Useless rant. His choice won't change the rest of the internet
| and for his site he could easily write lean html without all the
| stuff he complains about.
| LightG wrote:
| Appreciated the sentiment of it.
|
| It's not ideal, but in a non-ideal world where the big boys have
| ruined the web, I tip my hat to this effort with a large dose of
| empathy.
|
| Cheers,
| eloisius wrote:
| I'm not old enough to remember Gopher being "the internet" but I
| have browsed a few retro sites that still run it. I wouldn't mind
| seeing some slightly upgraded gopher-like protocol that allowed
| for embedding images and maybe form submissions (without any
| scripting). Most of what I want to do online is read, and I'd be
| more than happy for everything to come with a standardized look
| and feel rather than whatever scroll jacking weirdo design every
| website feels like having.
| JorgeGT wrote:
| While this may be extreme, I do notice that it is becoming harder
| and harder to print webpages to PDF/paper. Is there a good
| approach for this besides the standard print dialog?
| kuu wrote:
| Maybe use the read mode of Firefox and then print it?
| bigyikes wrote:
| For sites without print-specific media queries (so basically
| all websites) I use dev tools to delete all the DOM nodes I
| don't want to appear in print.
| pharke wrote:
| Isn't this what IPFS is for?
| knownjorbist wrote:
| I'm surprised that IPFS and others aren't mentioned more here.
| The solution is staring us in the face, it's related to
| cryptography.
| throwawayswede wrote:
| While I appreciate the sentiment, I don't think PDF is the way,
| at least in the way you're currently doing it. PDF may be
| supported by browsers, but they're not intended for it; it's a
| secondary feature. Same for search engines. Same for mobile.
|
| Most browsers have Print to PDF. If you want people to be able to
| download an immutable version of your content, then just have a
| simple static version of your page with a valid print css, better
| yet, leave everything default.
|
| If you want to fight churn with PDF, just have a simple HTML
| website with a link to download a versioned PDF of your issue.
| Your website can be as simple as
| https://motherfuckingwebsite.com/ or
| https://bettermotherfuckingwebsite.com.
| grumblenum wrote:
| There are also other lightweight alternatives. The Gopher
| protocol has a small, but disturbed following :
| http://gopher.muffinlabs.com/gopher.floodgap.com (you can
| actually use netcat as your gopher client). Gemini is a more
| modern gopher-inspired protocol
| https://gemini.circumlunar.space/. Personally, I'd be pleased
| to see a text-first approach gain adoption. I don't think
| anyone looks at the thick-client model browsers have evolved
| into and sees an optimal solution.
|
| I think evangelistic energy should probably be directed at
| complaining to organizations that share content through JS-
| framework monstrosities. Getting rank-and-file web-devs excited
| about lean websites doesn't hurt, but clients and CTOs have
| real decision making power.
| csomar wrote:
| It's ironic that the author is pitching for PDF, and yet he is
| using a plethora of hyper-links.
|
| The big "invention" of the Web was linking pages together. That's
| what made it great. That's what created "Google" in the first
| place. Links in a PDF are supposed to take you to a browser or
| open a different PDF file?
|
| PDF is a step back. If you are angry about the overblown size of
| JavaScript and resources consumption, use a simple static
| website. It doesn't get easier than that.
| alisonkisk wrote:
| You're conflating browsers with markup language. Clicking an
| HTML link opens a different HTML file.
| PaulHoule wrote:
| PDF has quite the attack surface. It supports Javascript, 3D
| models, JBIG2 compression that turns 8's into 6's and all sorts
| of strange things.
| api wrote:
| The point about the size of the W3C spec is hilarious, but I
| wonder how much of that hundred million plus words is actually
| necessary to implement the parts of the spec that people use?
|
| Surely it would be possible to create a spec that captured the
| most useful subset of HTML and CSS functionality.
|
| In any case if the spec really is that huge the W3C should be
| written off. Any organization that produces a spec like that is
| worthless.
| dalbasal wrote:
| I found _"PDFs are files"_ kind of compelling. Perhaps this was
| a flaw of the original www concept. Web pages were always
| technically files & documents, but this was always abstracted
| away from userland. "Save webpage" was never a core feature. This
| did disempower users.
|
| PDFs are downloaded, saved, emailed around. They can also be
| linked to. Userland maintains a closer relationship with what's
| going on. A typical user knows that you can have a copy of a file,
| which may or may not be identical to the online one. WWW, from
| its initial version, was mysterious. The transition between the
| model of requesting files from a server by clicking a link to a
| programmatically generated stream of code executed on your
| browser happened below typical users' perspective.
|
| The web has obviously gained a lot, but has also lost something.
| BenjiWiebe wrote:
| I've definitely used saved webpages a lot. When we had dialup
| email only, my dad would drive to the library with a flash
| drive and download Web pages to bring home and read. It was
| great. Of course, it's even greater now that I can load it
| fresh even faster.
| Aeolun wrote:
| All of the stuff he says PDF is, is the same for HTML.
| zeusk wrote:
| Well, sort of. Can't HTML contain script tags with external
| references (xmlHttpRequest or any async fetch) that a simple
| crawler/browser may not save to disk?
| wffurr wrote:
| It can, but you don't have to. It's absolutely possible to
| write self contained html files.
| wccrawford wrote:
| They could, but if he's the one creating the file, he can
| choose. And if he's just hosting the file, I'm sure there are
| tools that will inline all the external resources.
| vimy wrote:
| Reading PDFs on a phone isn't an enjoyable experience.
| deregulateMed wrote:
| For books, I prefer it to Libby and Google Books.
|
| There are tons of pdf viewers to choose from, so if you don't
| like an App, there are more available.
|
| I like that mine remembers the last opened doc and page. I can
| copy text from pdf too.
|
| Although this isn't a comparison of ebook to pdf, it's html to
| pdf.
| ColinWright wrote:
| For reference, the original title was: We are
| drowning in churn and noise. I am fighting by switching
| this site to PDF
|
| I find the "actual" title unhelpful, unenlightening,
| uninformative, and uninviting, which is why I originally chose
| text taken directly from the page, so people would know what it
| was about before taking the time to click and read.
|
| I know _why_ the HN mods have changed it to "Deurbanising the
| Web", but I wish they'd keep more informative titles, especially
| when taken from the article in question.
| failwhaleshark wrote:
| I didn't understand what it meant. I thought it was a euphemism
| for gentrification. None of their knee-jerk, dilettante, low-
| effort rant clearly identifies what they're really mad about,
| or if they're just mad to be mad.
|
| It feels like a waste of everyone's time.
| dahfizz wrote:
| If this catches on, there will be "JS in PDF" in no time.
| MawKKe wrote:
| as in "it exists already"?
| [deleted]
| pseingatl wrote:
| Most people think that pdf's have to be letter or A4 size, but
| you can make them at A7 or A8 for a phone screen, or for that
| matter, any size you want.
|
| PDF is size-agnostic. There's nothing to stop you from creating
| documents the size of a phone screen. So you could put the phone
| screen-sized pdf at m.mysite.com and this small screen illegible
| complaint is solved.
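|
| For a sense of the numbers: PDF page sizes are given in points
| (1/72 inch), so a phone-sized page is just a smaller MediaBox. A
| small sketch of the conversion, using the ISO dimensions for A4,
| A7 and A8 (the helper names are made up for illustration):

```python
MM_PER_INCH = 25.4
POINTS_PER_INCH = 72  # default PDF user-space unit is 1/72 inch

# ISO 216 paper sizes in millimetres (width, height), portrait.
ISO_SIZES_MM = {
    "A4": (210, 297),
    "A7": (74, 105),
    "A8": (52, 74),
}


def mm_to_points(mm):
    """Convert millimetres to PDF points."""
    return mm * POINTS_PER_INCH / MM_PER_INCH


def media_box(name):
    """Return (x0, y0, x1, y1) in points for a named ISO size."""
    width_mm, height_mm = ISO_SIZES_MM[name]
    return (0, 0,
            round(mm_to_points(width_mm), 2),
            round(mm_to_points(height_mm), 2))


if __name__ == "__main__":
    for size in ISO_SIZES_MM:
        print(size, media_box(size))
```

| An A7 page comes out at roughly 210 x 298 points, about a third
| of the width of A4.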
| dredmorbius wrote:
| The site would be inspired to automatically detect device sizes
| (JS or CSS media queries) and offer an appropriately-scaled PDF
| download option.
|
| Unfortunately it didn't opt for that.
| SMAAART wrote:
| Well, that's innovative.
|
| but, why not HTML 2?
| emptyfile wrote:
| Instead of writing text let me make some more noise by shoving
| PDFs for no reason.
| schipplock wrote:
| The text is too small to read on my phone. I can zoom in, but
| then I have to scroll horizontally. I'm afraid this website isn't
| targeting me.
| rerx wrote:
| When I click on the submitted link with Chrome on Android, it
| asks me if I want to redownload "0.pdf". Such a confusing
| question. If I pick the wrong answer, I end up with some
| restaurant menu I must have looked at months ago, not what the
| global poster intended.
|
| So for non-confusing real-world UX I'd recommend extra care with
| file names if you want to go PDF only.
| pornel wrote:
| Maybe the author doesn't realize how difficult PDF is to work
| with. In PDF it's ambiguous whether any two spans of text belong
| together in the same sentence or paragraph. It can even be
| unclear where the _spaces_ between words are. PDF also allows
| "optimizing" font usage that makes text unreadable without OCR-
| ing the custom font. The messy hacks go on and on:
|
| https://filingdb.com/b/pdf-text-extraction
|
| OTOH it's totally possible to make a self-contained HTML page
| without using a JS framework of the day. It's going to be way
| easier to consume than a PDF.
| Recursing wrote:
| From the PDF
|
| > "But it's just as easy to write self-contained HTML pages!"
|
| > Sure, but if you're going to hide CTF forensics challenges in
| your publication, a coverdisk allows you to do it in style!
|
| I think it's not meant to be taken extremely seriously
| jl6 wrote:
| Hello. Original author here.
|
| I do realize how ugly PDFs are to work with (I wrote my own
| PDF/A generator for issue 2[2]). This is a Tagged PDF though,
| so you can extract text using standard tools.
|
| To understand the mindset, have a read of the Gemini FAQ[0],
| specifically the answer to why not use a subset of HTML - and
| then read Issue 2[2] which is a hybrid Gemini+PDF polyglot, for
| people who don't like reading PDFs, which is apparently
| everyone on this thread :)
|
| Issue 1[1] also moves beyond PDF, to try addressing some of the
| accessibility shortcomings by (a) prepending the content as
| plain text, and (b) _recording myself reading the whole thing
| out_ and arranging the file as a polyglot MP3 and PDF file that
| can be played in an audio player as well as viewed in a PDF
| reader as well as a text editor.
|
| A mini-FAQ to address some points elsewhere in the thread:
|
| * No, it's not going to replace your blog or the web in
| general.
|
| * Yes, it's an experimental art project / longitudinal CTF
| forensics tournament / weirdo personal blog.
|
| * Yes, I'm serious anyway.
|
| [0] https://gemini.circumlunar.space/docs/faq.gmi
|
| [1] https://lab6.com/1
|
| [2] https://lab6.com/2
| ReactiveJelly wrote:
| > The problem is that deciding upon a strictly limited subset
| of HTTP and HTML, slapping a label on it and calling it a day
| would do almost nothing to create a clearly demarcated space
| where people can go to consume _only_ that kind of content in
| _only_ that kind of way. It's impossible to know in advance
| whether what's on the other side of a https:// URL will be
| within the subset or outside it. It's very tedious to verify
| that a website claiming to use only the subset actually does,
| as many of the features we want to avoid are invisible (but
| not harmless!) to the user
|
| But I don't really know that your PDF website doesn't use
| some evil invisible PDF feature.
|
| And I have to use a special Gemini browser to access Gemini
| pages. (Since an HTTPS bridge misses the point)
|
| So why not use Dillo as my "Sane subset of HTML"? It is not
| hard to hand-write HTML that looks great in Lynx, Dillo, and
| Firefox.
| prox wrote:
| I think the idea of PDFs opens up many new possibilities, and
| your work is quite an eye opener. Design is largely missing
| from websites - it's the same design over and over when it
| comes to optimizing for clicks.
|
| Designers would thrive in a PDF environment instead of
| handing their designs over to implementation as it is now.
|
| Maybe PDF is just the beginning and maybe a similar format
| can be thought up that addresses some of the concerns
| expressed here, and move over in time.
| sosuke wrote:
| I've spent entirely too much time "printing" sites and
| articles to PDF to save them to read or reference later. Your
| PDF style was perfect! No need to fuss with anything just
| save it!
| slashdot2008 wrote:
| This thread might be helpful to you
| https://news.ycombinator.com/item?id=27817659
| jrochkind1 wrote:
| I don't like reading PDFs and probably wouldn't read much of
| your website like that... but I appreciate the intervention
| drawing our attention to the advantages of PDFs in the
| disadvantaged present environment, which I think are real and
| worth thinking about. It seems almost like an artistic
| project. I'm not mad at you, and am not sure what makes some
| people seem to be so mad here (probably means you were
| successful at something)... but I'm still not gonna read it,
| PDFs are a mess to read!
| morsch wrote:
| Case in point: copy-pasting a paragraph from his PDF-website
| adds line breaks everywhere. It also loses formatting
| (bold/italics) and the footnote superscript doesn't translate.
| PDF is an open standard, which is freely available2, and
| stable. It has a version number and many interoperable
| implementations including free and open source readers
| and editors.
|
| I think ease of copy-pasting is one of the coolest things about
| the document-centric roots of the web (along with the back
| button and hyperlinks; in other words, hypertext rules),
| although the modern web does break it (along with the back
| button and hyperlinks) in many places, so I can see where he is
| coming from. PDFs aren't the answer, though.
| signal11 wrote:
| > it's totally possible to make a self-contained HTML page
| without using a JS framework of the day. It's going to be way
| easier to consume than a PDF.
|
| Completely agree. For instance, NASA's APOD site[1] is a good
| example of something that'd be nontrivial using both an offline
| PDF and modern lightweight alternatives like Gemini, but works
| really well even without fancy modern design. Under 300kB
| including the image (HTML's under 6 kB) _before_ gzipping.
|
| [1] https://apod.nasa.gov/apod/astropix.html
| lifthrasiir wrote:
| > OTOH it's totally possible to make a self-contained HTML page
| without using a JS framework of the day.
|
| I'm basically in agreement, but the author has a good point
| that PDF is obviously self-contained and self-contained HTML
| pages are not necessarily distinguishable from those that
| aren't. Perhaps we might have to revisit MHTML or embrace Web
| bundles as an alternative to PDF.
| IshKebab wrote:
| Or... AMP? But no, Google made that so it must be a bad idea.
| dtech wrote:
| It's not even JS. I'd argue a HTML + inline JS page is a lot
| more self-contained than one with external images, videos and
| fonts.
|
| Note that PDFs can contain JS too.
| Digit-Al wrote:
| > Note that PDFs can contain JS too.
|
| That's why he says to use PDF/A, which can't contain JS.
| imglorp wrote:
| > Note that PDFs can contain JS too.
|
| Wait, why?!? When does it render? Who's supposed to have a
| js engine to do that? What version? How does it load
| dependencies? Is HTML and DOM carried along with it? So
| many questions.
| maskros wrote:
| Why? To validate form fields.
|
| Who? The PDF viewer.
|
| When? Since about 2000 in PDF format version 1.3.
|
| Dependencies? Hah, no such luck. You're stuck with ES5
| and Adobe's crufty JS library. There is no HTML and DOM,
| there are however some pretty thorough PDF document
| bindings.
| native_samples wrote:
| Why - because scripting is useful. A big use of PDFs is
| translating paper forms into digital forms without
| needing to make a web app out of them. JS is used for
| client side validation, same reason it was put into
| browsers. Acrobat can handle this along with _many_ other
| features that most PDF readers can't handle properly.
|
| Basically in the PDF world, Acrobat Reader is Chrome and
| everything else is, like, Konqueror or something. Don't
| be fooled into thinking PDF is a small spec. It's not.
| cxr wrote:
| You want PWP <https://blog.jonudell.net/2016/10/15/from-pdf-
| to-pwp-a-visio...> (Later aborted, and the group's work was
| rolled into EPUB3. As you note, there remains a genuine need
| for it.)
|
| On the other hand, there's nothing stopping you from using a
| double-barrelled file extension for denoting this sort of
| thing, e.g. "memex-opus.pub.html"; so long as it ends with
| something recognizable, double-clicking should still open it
| in the browser across all the usual platforms, AFAIK.
|
| (I'm fond of using "xyzzy.app.htm" myself to take advantage
| of this trick for distributing simple, self-contained
| programs that are designed to run in the browser.)
| 7sidedmarble wrote:
| This is what PWAs are kind of for.
| dalbasal wrote:
| The author addresses this: " _We choose to switch to PDF in
| this decade, not because it easy, but because it is hard" -
| John F. Warnock, September 12th 1962_ "
|
| The author is obviously making a statement, exploring ideas...
| not searching for an actual solution to his use case.
| jl6 wrote:
| Yeah, it's kinda embarrassing that the one quote that gets
| pulled out in the HN commentary is the one that contains a
| typo. It's OK: Issue 1[0] contains a patch to fix the issue.
|
| [0] https://www.lab6.com/1
| rkachowski wrote:
| Is this a comical misquote or is the PDF format actually 60
| years old?
| wccrawford wrote:
| It's a comical, deliberate misquote.
| SethMurphy wrote:
| Comical misquote, "Switch to PDF" replaced "Go to the
| Moon".
| dalbasal wrote:
| It's comical, but links to the founder of Adobe. IDK what
| the date alludes to.
| fmajid wrote:
| JFK announcing the US would put a man on the Moon before
| the decade was over.
| dalbasal wrote:
| oh... yeah
| coldtea wrote:
| It's about 30 years old - its creator however is said
| person.
|
| The actual quote was from JFK iirc regarding the Apollo
| missions...
| Ajedi32 wrote:
| This is an awful idea and I love it.
|
| As others have pointed out it's strictly worse than a static HTML
| site in many, many ways. At the same time though, it's a
| brilliant criticism of many of the worst aspects of the modern
| web.
|
| This is art.
| ussrlongbow wrote:
| Very surprised to see just a few comments mentioning EPUB, which
| IMO is much more suitable for a document-centric approach. An open
| standard with a freely available[1] specification, and I have never
| had any problems with EPUBs on PC, tablets and phones.
|
| [1] - https://www.w3.org/publishing/epub32/epub-spec.html#sec-
| intr...
| kccqzy wrote:
| But can you open an epub in a browser? That's the main point
| here.
| spinax wrote:
| Not only simple browser plugins per the other reply (and a
| plethora of non-crashing mobile apps, whereas mobile PDF
| reading apps crash on me all the time) - the ePub format is
| just a zip file in disguise with plain text (HTML) inside and
| maybe some images/etc.
|
| In a manner of speaking, ePub as a design has an inherent
| built-in fallback mechanism to manually obtain the internal
| content in case of failure - including ability to try and
| repair a broken zip format (zip -F/-FF) and grep it in place
| (zipgrep).
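|
| The same fallback can be sketched in a few lines of Python with
| the standard zipfile module, roughly what zipgrep does. The
| function name is hypothetical, and the sketch assumes the
| content files inside the archive are UTF-8 (X)HTML.

```python
import zipfile

CONTENT_SUFFIXES = (".xhtml", ".html", ".htm")


def grep_epub(path_or_file, needle):
    """Search the (X)HTML files inside an EPUB for a plain substring.

    Returns a list of (member name, line number, stripped line).
    """
    hits = []
    with zipfile.ZipFile(path_or_file) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(CONTENT_SUFFIXES):
                continue  # skip images, stylesheets, the mimetype file
            text = archive.read(name).decode("utf-8", errors="replace")
            for number, line in enumerate(text.splitlines(), 1):
                if needle in line:
                    hits.append((name, number, line.strip()))
    return hits
```

| Usage would be along the lines of grep_epub("book.epub", "PDF"),
| where "book.epub" is any EPUB file on disk.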
| ussrlongbow wrote:
| Yes, but with the plugin
| shuntress wrote:
| Also worth pointing out, EPUBs are (or, at least, can be. I'm
| not sure how much flexibility is in the specifications)
| basically just bundled HTML.
| DerDangDerDang wrote:
| There's a fixed layout version of the ePub standard too,
| allowing PDF quality if that's what you're after.
| bambax wrote:
| He should offer PDF _in addition to_ basic HTML, not as a
| replacement.
| keiferski wrote:
| I think if one designed a "crisis-proof" version of the web, it
| might end up being a network of PDFs. My reasoning being:
|
| - PDFs are universally understood by most people and can be read
| on phones, desktops, laptops, and eBook readers.
|
| - Once you've downloaded a local PDF version of the site, there
| is no risk that it can be changed or removed by the host.
|
| - File size is predictable ahead of time, which is useful if your
| connection is limited or slow.
|
| - PDFs are designed for printing (moreso than most sites) which
| may be useful in situations where electricity is in low supply.
| lmm wrote:
| I set up my blog so that the page source would consist of the
| original markdown and as little markup as possible to make that
| render. You can read it with telnet and the experience isn't so
| much worse than using a browser.
|
| (The actual part that makes this work is a pile of opaque
| javascript doing all sorts of nasty things at runtime, but such
| is the way of web pages in today's browsers, I don't worry too
| much about it).
| Tajnymag wrote:
| Except the printing part, all can be said about a standalone
| html file.
|
| External content, like images, can be inlined, thus you would
| only have to distribute one single .html file.
|
| I'm not sure how file2file linking would work in the realm of
| pdf files. With html files, it's easy even without any web
| server.
|
| Plus, html can be even digested through a terminal interface.
| That cannot be said about the binary nature of pdf documents.
| nulbyte wrote:
| I use a terminal pager with PDFs quite frequently. It works
| surprisingly well. Even something you wouldn't expect, like a
| pay stub, renders fine in the terminal.
| bashinator wrote:
| What do you use to display the pdf? Pandoc?
| keiferski wrote:
| This is true, but I do think a PDF is just conceptually
| simpler and requires less technical knowledge. Especially in
| a situation where technical users are scarce.
|
| IMO most people have a mental model of a PDF as being a
| digital document, whereas a HTML file is somewhat more
| amorphous.
| npteljes wrote:
| Not sure about your points. Contrast it with a static HTML+CSS
| website:
|
| - PDFs require a reader, HTML a browser. I wouldn't argue that
| there are more PDF readers installed than browsers.
|
| - Downloaded static HTML works the same
|
| - File size can be included in the HTTP response: in the
| Content-Length header
|
| - Printing is nice, but reflowable text is even nicer, since we
| target a multitude of rendering targets.
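|
| To illustrate the Content-Length point in the list above, here is a
| rough sketch that asks a server for a resource's size with a HEAD
| request before downloading anything. The helper names remote_size
| and human_size are made up for this example, and a server is free to
| omit the header, in which case the sketch returns None.

```python
import urllib.request
from typing import Optional


def remote_size(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the Content-Length reported for url, or None if it is absent."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        value = response.headers.get("Content-Length")
    return int(value) if value is not None else None


def human_size(num_bytes: int) -> str:
    """Format a byte count using binary units."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} GiB"
```

| So the "predictable size" benefit is available for any static file
| served over HTTP, whether it is a PDF or an HTML page.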
| ChrisMarshallNY wrote:
| PDFs are...not so easy to generate dynamically.
|
| I have done it with a couple of PHP libraries (fpdf and mpdf),
| but they are primitive, compared to desktop PDF generators. I
| know that you can use Java (never done that), or
| even...ugh...XSL (also never done that).
| dredmorbius wrote:
| Most desktop operating systems offer a print-to-PDF
| functionality. It's long been an add-on for Microsoft, but
| that's really a historical accident / deliberate choice of
| that platform.
|
| PDFs can be trivially created from Markdown or using LaTeX
| templates if you're looking for a programmatic solution.
| Pandoc and XeLaTeX are helpful, the poppler libraries as
| well. Again, these are generally and widely available at no
| charge.
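|
| As a hedged sketch of the programmatic route mentioned above: the
| snippet below shells out to Pandoc with XeLaTeX as the PDF engine.
| It assumes both are installed and on PATH; the function names and
| the file names in the usage note are placeholders.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def pandoc_pdf_command(source: Path, target: Path, engine: str = "xelatex") -> List[str]:
    """Build the pandoc command line for a Markdown-to-PDF conversion."""
    return ["pandoc", str(source), "-o", str(target), f"--pdf-engine={engine}"]


def markdown_to_pdf(source: Path, target: Path) -> None:
    """Convert a Markdown file to PDF, failing loudly if pandoc is missing."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not installed or not on PATH")
    subprocess.run(pandoc_pdf_command(source, target), check=True)
```

| Calling markdown_to_pdf(Path("post.md"), Path("post.pdf")) would then
| regenerate the PDF each time the Markdown source changes.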
| lucideer wrote:
| > _PDFs are universally understood by most people and can be
| read on phones, desktops, laptops, and eBook readers._
|
| PDFs need a proprietary app to use, most of which are loaded
| with spyware & trackers. I may be mistaken in this but
| MacOS/iOS are the only OSes I know of that read them natively?
| There's absolutely nothing universal about the format.
|
| HTML is truly universal: not only does every OS come with a
| built in HTML viewer, but it's a plain text file. You can read
| the source using anything.
|
| > _Once you've downloaded a local PDF version of the site,
| there is no risk that it can be changed or removed by the
| host._
|
| Once you've downloaded a local HTML version of the page there's
| no risk that it can be changed or removed by the host. Yes,
| there are caveats to both: people can create PDFs with remote
| embeds or HTML sites with ajax content but both of these are
| the fault/responsibility of the individual author. It's as easy
| to make good downloadable HTML as downloadable PDF.
|
| The so called "churn" is the responsibility of the individual
| HTML author. If you're making bad HTML, the fix is to start
| making good HTML. Not to switch to a closed inaccessible
| format.
| npteljes wrote:
| PDF is an open format, with multiple FOSS reader
| implementations. You could argue that a subset of niche
| features can only be used in Acrobat Reader, but AR is far
| from the only PDF reader out there.
|
| And the churn is part of the zeitgeist, not really a
| responsibility of anyone in particular. Individuals are
| suckered into it, companies are supplying it, and governments
| are allowing it. We're all part of it. Not new either: I've
| been hearing since the 90s how modern life is rushed, and
| that's just my limited experience.
| lucideer wrote:
| I said it wasn't universal, which is somewhat different to
| the vague idea of being "open", and yes, PDF is technically
| an "open format" depending on how you define "open". The
| ISO 32000 spec. costs in the region of ~200 USD/EUR.
|
| What that "openness" translates into in the real world is
| that there are zero non-Adobe _viewers_ that support all of
| PDF's features, and even fewer PDF editors. The standard
| PDF editor costs ~200 USD/EUR (annual subscription).
|
| This is before we even get into the nightmarish world of
| PDF _parsing_. Or PDF accessibility.
|
| PDF is a great format if you're sending a document to
| someone for them to print immediately. It has no other
| valid uses imo.
| npteljes wrote:
| I see your point, thanks for the elaboration. I'm not a
| fan of the format either.
| foxes wrote:
| How is a complicated binary better than a literal text file?
|
| Truly absurd, this whole thread is churn.
| mark_l_watson wrote:
| I also enjoyed the sentiment of the article. I used to blog a lot
| but in the last decade I have preferred more long form writing.
| Now I use the leanpub.com [1] service so when I write, I get
| generated PDF/ePub/Kindle formats, and material is readable
| online as HTML/CSS. For me leanpub is a way to make content free
| and accessible, but people can pay if they want. The relatively
| few people who pay for my material have a large effect on what I
| decide to write about in the future or which writing projects to
| drop.
|
| I consume the web mostly by following a few very interesting
| people on social media and following their links. As an author,
| my goal is to keep producing interesting enough material to be
| worth people's time reading.
|
| [1] https://leanpub.com/u/markwatson
| leephillips wrote:
| We already have a wildly popular website where all the main
| content is in the form of PDFs. It's https://arxiv.org/. PDF is
| what you use when your document needs to have a predictable
| layout. This is especially important if it contains math, complex
| tables, or any elements where meaning is carried by positioning
| on the page. This can include aesthetic meaning, as in some forms
| of poetry that need to be laid out in a particular way.
| dredmorbius wrote:
| There are several which at least strongly resemble that remark.
|
| Project Gutenberg and the Internet Archive's text archives
| (along with numerous other document-oriented sites, several of
| the samizdat variety) offer content in PDF and other document-
| oriented offline downloadable formats.
|
| Wikipedia has a "save to PDF" link on each article (that seems
| to work through the browser's capabilities, if any; not all
| browsers support this). The sister Mediawiki site Wikisource
| offers ePub downloads.
|
| For longer-form content, PDF, DJVU, and a handful of other
| formats (arguably ePub) are at least reasonably popular.
| specialist wrote:
| OC page three:
|
| > _PDFs used to be unreadable on small screens, but now you can
| reflow them._
|
| Off hand, which PDF viewers do reflow?
| dredmorbius wrote:
| Foxit, PocketBook Reader, FBReader. Presumably Adobe Acrobat
| though I've not touched that in a decade or more.
|
| There are also utilities such as the poppler library's pdftotext
| which will dump ASCII / bare text from at least some PDFs.
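|
| For the pdftotext route just mentioned, a small sketch, assuming
| poppler's pdftotext binary is installed. The wrapper names are
| hypothetical; "-layout" asks pdftotext to preserve the page layout
| and "-" sends the text to standard output.

```python
import shutil
import subprocess
from typing import List


def pdftotext_command(pdf_path: str, keep_layout: bool = True) -> List[str]:
    """Build a pdftotext command that writes the extracted text to stdout."""
    command = ["pdftotext"]
    if keep_layout:
        command.append("-layout")
    command += [pdf_path, "-"]
    return command


def pdf_to_text(pdf_path: str) -> str:
    """Return the text pdftotext extracts from pdf_path."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext (poppler) is not installed or not on PATH")
    result = subprocess.run(
        pdftotext_command(pdf_path), check=True, capture_output=True, text=True
    )
    return result.stdout
```

| As noted, this only works for PDFs that carry a usable text layer.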
| snksnk wrote:
| Why not use TeX/LaTeX instead and also include a link to the
| code?
| jimhefferon wrote:
| The LaTeX below will leave a push-pin symbol in the text, and
| clicking on it shows the code.
|     \documentclass{article}
|     \usepackage{attachfile}
|     \usepackage{lipsum}
|     \begin{document}
|     \expandafter\attachfile\expandafter{\jobname.tex}
|     \lipsum[1-150]
|     \end{document}
| leephillips wrote:
| Using xelatex, I got only the text, no pushbutton. Using
| pdflatex, I got a pushbutton, but it was not a hyperlink,
| just an image. What engine do you use to get this to work?
| jimhefferon wrote:
| I ran pdflatex from a 2017 TeX Live install under Ubuntu,
| and viewed in Acrobat Reader.
| leephillips wrote:
| Ahh, I think it is a viewer problem. Sadly, most viewers
| cannot handle PDF attachments properly or at all.
| clearing wrote:
| I honestly can't believe all the praise for HTML and web on HN in
| the face of this awesome critique. I hugely appreciate the love
| for actual files.
|
| >* PDFs are decentralised. You may have obtained this PDF from a
| website, or maybe not! Self-contained static files are
| liberating! They stand alone and are not dependent on being
| hosted by any particular web server or under any particular
| domain. You can publish to one host or to a thousand hosts or to
| none, and it maintains its identity through content-addressing,
| not through the blessing of a distributor.
|
| This seems to have gotten lost in the offense everyone has taken
| over the choice to not use 'simple HTML', despite the document's
| clear reasoning that to do even that would embed the content deep
| in the 'urban web'. All of these simple-complex propositions
| about making some subset language or automating document flows
| are missing the point entirely.
| danShumway wrote:
| > You can publish to one host or to a thousand hosts or to
| none, and it maintains its identity through content-addressing,
| not through the blessing of a distributor.
|
| It kind of seems like you're describing IPFS, except with worse
| content addressing guarantees. The vast majority of your users
| will never check whether a PDF's content actually matches its
| content address.
|
| > All of these simple-complex propositions about making some
| subset language or automating document flows are missing the
| point entirely.
|
| Are they? It's really not that hard to build a self-contained
| HTML file, and to re-emphasize, signed PDFs and signed HTML
| files are about the same level of accessibility to most users.
| Web browsers don't really handle either, if you want those
| guarantees you need to use a protocol/technology with better
| support right from the start.
|
| Also to be clear, despite the author's argument that PDFs _can_
| be self-contained, no browser guarantees that, and there's no
| way for me to tell if the PDF is self contained when I click on
| it in Firefox unless I download it and check it myself offline
| or in a viewer that guarantees it won't make network requests.
|
| Nothing online that I'm aware of forces authors to use PDF/A,
| so when I download a PDF, I _don't_ know what I'm getting.
| It's not actually the magical, re-hostable world that the
| author claims.
|
| I'm not sure that people are missing the author's point so much
| as they're saying the author is making claims about the
| portability of PDFs that aren't necessarily accurate. Yes, it
| would be good to have better self-contained guarantees about
| some web-content, but I'm not sure PDFs actually supply any of
| those guarantees.
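|
| On the content-addressing point raised earlier in this comment, a
| minimal sketch of the check most readers never perform. It assumes
| the "address" is simply the hex SHA-256 digest of the file, which is
| an illustration only (IPFS content identifiers, for instance, are
| built differently). The function names are invented for the example.

```python
import hashlib
from pathlib import Path


def content_address(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_address(path: Path, expected_hex: str) -> bool:
    """Check a downloaded file against the digest it was published under."""
    return content_address(path) == expected_hex.strip().lower()
```

| The same check works for a self-contained HTML file, which is the
| point: the guarantee comes from the hash, not from the file format.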
| chrismorgan wrote:
| One previous discussion in comments:
| https://news.ycombinator.com/item?id=24257982
|
| For my part, I expressed bafflement because the end result seems
| worse than the starting point in almost every way, _including_
| those that the author was complaining about the web for.
|
| (There are a couple of others to be found in
| https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...,
| but not so substantial.)
| [deleted]
| danShumway wrote:
| I guess by modern standards this load time is acceptable, but
| when you argue that PDFs are a way to move forward, you're
| competing with HTML 4/5. And by that standard:
|
| - Crud this website is so _slow_. Unacceptably slow. If your
| technology stack is spending 10 seconds just to fetch and render
| 13 pages of large-screen text, then either you're doing
| something wrong or it's a bad technology stack. That load time
| alone should kill this idea.
|
| - There's no way for me to turn off images. This is the opposite
| of a client-respecting webpage, the only way you could make it
| worse is by rendering to Canvas or shipping me a PNG. My mobile
| browser doesn't fetch fonts by default. You're overriding my
| choice to do that.
|
| - Mobile? Reflow? Responsive design? Adjustable font sizes? The
| author kind of offhandedly says that PDFs can do reflow right
| now, but how many clients actually support that? Does the PDF
| format handle this by default?
|
| - Saying "you can technically make PDF accessible" is exactly the
| same as saying "you can technically use just a subset of HTML."
| It's the same argument. Nobody does it, PDFs are generally
| hostile to accessibility, and there's no way to signal that a PDF
| is accessible or enforce it as a community standard.
|
| So, the much bigger question: what's wrong with Gemini[0]? I've
| been critical of Gemini in the past on multiple fronts, but if
| you are in this space where you want to burn everything down and
| make your blog static, Gemini really does seem to solve every
| problem that the author has, except better. It's also trivial to
| proxy Gemini documents or statically re-render them to HTML,
| which makes them accessible to people outside the community. And
| by default, they're both pretty accessible to screen readers, and
| much more efficient than what the author is proposing.
|
| The author argues that using static HTML wouldn't be good enough
| because there's no standard that forces you to exclude
| Javascript. Then they point to PDF/A, which is not a standard
| that is enforced by most browser PDF viewers. To me, this
| argument isn't any different from telling website authors to
| choose not to use Javascript: what is going to force anyone to
| use PDF/A? Every web browser PDF reader supports Javascript.
| NoScript support in Firefox is better than the
| controls/extensions for disabling PDF scripting.
|
| And Gemini is _right there_: for the most part it's actually
| working today. So I just don't get it. Why pick a technology
| that's tangibly worse than the web on (and I mean this quite
| literally) almost every single axis and every single metric, when
| you could instead switch to a markup language that actually does
| have use-cases, that does simplify deployment and blogging in
| some instances, that does have a real community, that does have
| some real advantages over HTML, that does have some real momentum
| behind it, and that doesn't disrespect my choices about what
| fonts/images I want to download?
|
| [0]: https://gemini.circumlunar.space/
| lucian1900 wrote:
| Sounds to me like ePub would fit better. It's designed for reflow
| and it's built out of a subset of HTML. Worst case, the contents
| of the file can be expanded.
| leephillips wrote:
| There are good points here, but I think the author slightly
| undermines his message because the layout and typography of this
| particular PDF is so poor. Probably because it "was written in
| the world's greatest web authoring tool: LibreOffice Writer".
|
| In other words, one advantage of PDF is that free authoring tools
| such as the TeX family can create typographically beautiful
| results that are nearly impossible to achieve with HTML, but he
| leaves that on the table.
| petercooper wrote:
| In a sea of cynicism, I gotta say.. bravo. This genuinely put a
| smile on my face. It has a lot of problems, sure, but it's a
| creative use of the Web and would surely work for _some_ use
| cases. It's certainly no worse than using Flash ever was.
|
| It reminds me a bit of a "newsletter" I'm subscribed to called,
| ironically, "Not a Newsletter" (http://notanewsletter.com/). You
| get an email from the author each month and it just points to a
| Google Doc where he puts the actual content. Why's this good? The
| content can't set off any spam filters, he can edit the issue
| after it's "sent" if there are mistakes or broken links..
| sneak wrote:
| The content can be censored arbitrarily by google, and when you
| click on mobile web with the docs app installed, it logs your
| logged in google account identity (maybe for work?) with the
| view when it switches to the app.
|
| Files have none of these problems.
| indigochill wrote:
| If the author was concerned about getting censored by Google
| or feeding their data empire, they could set up a self-hosted
| Google Docs alternative, like NextCloud.
|
| The readers would still need to trust the author's not doing
| anything nefarious with their IP addresses, but I guess
| there's a degree of implicit trust when subscribing to a
| newsletter.
| noduerme wrote:
| I would just put it on my own server. Are people really
| worried about clicking a private link and having their IP
| address logged? Just opening an email with a tracking pixel
| triggers that already, and you have to assume clicking a
| link will log your IP whether with Google or Constant
| Contact or any other mass email provider.
| nonameiguess wrote:
| Google Docs are still files. It's just up to the author (or
| even the readers) to keep copies outside of Google's servers.
| Unless Lab6 owns their own servers, whoever is hosting these
| pdfs can delete them as well. At least, in both cases, static
| files are much easier to backup and copy than entire three-
| tier dynamic applications. And readers can keep their own
| copies separate from the original, which isn't possible with
| an application at all.
| lmm wrote:
| > Google Docs are still files. It's just up to the author
| (or even the readers) to keep copies outside of Google's
| servers.
|
| No they're not? You literally can't have a google doc as a
| file in a first-class way - you can export it to a file,
| but that's a lossy process.
| noduerme wrote:
| Yup. Another way to say it is Google will release a file
| format the day offline computing drops dead. It should
| probably amount to an antitrust case or at least a major
| class action claim at this point. That said, even _with_
| PDF specs it's freakin impossible to read/write that
| format in an intelligible way, if the person creating the
| document used even the barest amount of block alignment.
| Adobe started with an innovative notion about layout, but
| ended up making content extremely hard to parse, and
| actually tried to open source the engine. Google started
| with an idea of trapping everyone's data in a format
| they'd never make fully available, and then charging for
| the privilege of storing it.
| petercooper wrote:
| You're not wrong! It's always a trade-off of one set of
| problems for another with these sorts of things, I guess.
| midrus wrote:
| LOL
| ok123456 wrote:
| Jekyll plugin that produces a pdf version of each page?
| mattnewton wrote:
| I can't tell if this is satire or not, because reading it on my
| phone hurt my eyes after the first couple pages.
|
| Please use ePub if you are after an open format, or freeze web
| pages into an offline-able format, and don't use PDF.
| ccorcos wrote:
| "Files are a basic human freedom" - that definitely resonates
| with me.
|
| There's an assortment of trade-offs though. In particular,
| linking between files breaks if you ever want to move or rename a
| file. Also, by self-encapsulating every file, you end up using
| space less efficiently.
___________________________________________________________________
(page generated 2021-07-19 23:01 UTC)