[HN Gopher] Deurbanising the Web [pdf]
       ___________________________________________________________________
        
       Deurbanising the Web [pdf]
        
       Author : ColinWright
       Score  : 453 points
       Date   : 2021-07-19 10:22 UTC (12 hours ago)
        
 (HTM) web link (lab6.com)
 (TXT) w3m dump (lab6.com)
        
       | everyone wrote:
       | "with no external dependencies to manage."
       | 
       | Except for like, the software which reads and renders the pdf,
       | which may not be available on current or future OSes.
       | 
       | I don't see how this is any different to writing a static webpage.
        
       | temporallobe wrote:
       | Why not just extremely simple, plain HTML? No frameworks, not
       | even CSS. In fact, you could make your life even simpler by using
       | markdown files and having the browser convert that to HTML in
       | real time with a single JS library (there are a few, I am not
       | promoting any one in particular), so it doesn't even require a
       | "back end"! Plain HTML, while not having all the "portable"
       | attributes of PDF, is still pretty darn robust and most browsers
       | handle printing (or conversion to PDF) quite well.
        
         | prox wrote:
         | I think it is because PDF is a document first, and HTML is often
         | hard to save/file.
         | 
         | PDFs can also be created with design in mind, in a document
         | creation app, which after decades of HTML is still hard to do, I
         | think.
        
           | BenjiWiebe wrote:
           | HTML isn't hard to save and file on a computer, and on phones
           | it seems everything is hard to save and file.
        
             | prox wrote:
             | You are right in a technical sense, but if I ask someone
             | who is a low level user to save a webpage, most don't know
             | how to do that.
             | 
             | It's not front and center or even encouraged! This makes a
             | big difference for adoption.
        
         | dredmorbius wrote:
         | Some of the listed benefits don't apply. Notably paginated
         | (PDF) vs. scrolled navigation, but also features such as
         | formulae displays and specific typesetting / layout elements,
         | in-page bookmarks, highlighting, and notes.
         | 
         | For shorter documents that's not much of a problem. For
         | anything much over ~chapter length (about 20 pages or 10,000
         | words), navigation within a single HTML page becomes painful.
         | Well below that level on smaller devices.
        
       | aenigma wrote:
       | Great article - so much depth and accuracy to this! I see a lot
       | of discussion about the semantics of pdfs but I think those are
       | missing the overarching theme here.
       | 
       | Feels like this is more about the fact that websites have become
       | increasingly dynamic, unstable, unreliable, inconsistent, etc. -
       | pdfs offer something like a book, static, stable, reliable and
       | consistent.
       | 
       | Think about a book you can turn to a specific page no matter how
       | many times you look at it and the print is the same, the
       | information is the same, you can do the same action over and over
       | again and get the same expected result.
       | 
       | Now imagine opening a book and you could have sworn that the
       | chapter you wanted to reference was 11 but now it's 16 and the
       | images are different, the examples are different, in fact the
       | quote that you wanted to use for reference no longer exists in
       | the book.
       | 
       | There's an insanity to this experience but it's exactly what the
       | web is like - a book that is constantly changing, upended,
       | changed - even disappearing entirely. I could have sworn I had
       | bought that book on discrete mathematics - how could it be gone?
       | Oh, that's right, the server managing the site is powered off -
       | the book no longer even exists.
        
       | stayux wrote:
       | Thanks. I am starting a self-hosted blog about design
       | fundamentals, best practices, etc. Using only PDF is not a
       | solution for me. Combining minimalistic website design with
       | PDF/EPUB will suit me well. I like your approach as a statement
       | against web "pollution".
        
       | bmn__ wrote:
       | It is too early to displace HTML with PDF.
       | 
       | > PDFs used to be inaccessible
       | 
       | My eyes are not very good. I have trouble reading the font in the
       | PDF. I am using Firefox. HTML lets me pick a font that I can
       | read easily. I cannot do that with PDF.
       | 
       | > PDFs used to be unreadable on small screens, but now you can
       | reflow them.
       | 
       | I am using Firefox. I cannot do that.
       | 
       | Realistically, how many years will I have to wait until Firefox
       | catches up?
       | 
       | Over twenty years ago, I learnt Web authoring by examining the
       | source which had a profound effect on my career. That
       | serendipitous opportunity I had with human-readable sources will
       | be lost to the next generation with PDF - they have to learn the
       | technology deliberately.
        
         | titzer wrote:
         | > Over twenty years ago, I learnt Web authoring by examining
         | the source
         | 
         | So did I. Now, it is impossible to reverse engineer the metric
         | crapton of minified JS and CSS cryptoglyphics that comprise the
         | modern web.
        
           | rollcat wrote:
           | TBH it's a little bit like complaining you can't open a
           | modern binary executable in a hex editor and learn
           | programming from that. Days of doing your regular coding by
           | writing direct machine code or assembly are (mostly) gone,
           | and for the sake of advancing the craft, I'm (mostly) happy
           | with it.
           | 
           | But I too wish the modern web was simpler. It took an
           | evolutionary path of maintaining just enough backwards
           | compatibility to only keep making things worse. Efforts like
           | Gemini[1] bring some hope but I'm afraid the medium won't be
           | flexible enough for much beyond personal blogs. But maybe
           | that's for the better.
           | 
           | [1]: https://gemini.circumlunar.space;
           | gemini://gemini.circumlunar.space
        
         | Santosh83 wrote:
         | As far as I know, it is nothing specific to Firefox. You can't
         | set your own PDF font or reflow a non-reflowable PDF in _any_
         | browser.
        
           | chrismorgan wrote:
           | Brief investigation suggests reflow is a super-clumsy, ultra-
           | coarse-grained view mode that is implemented by few clients,
           | is not easy to access, is not well known, and is _vastly_
           | inferior to what you can get on the web, especially as it's
           | basically text-only.
           | 
           | In Adobe Acrobat (and I'm _guessing_ Adobe Reader): Choose
           | View - Zoom - Reflow, and it turns everything into one column
           | of nigh-unformatted text.
           | 
           | (Word looks like it _may_ support it, but that could be more
           | that it's converted it to a Word document in some way and
           | reflow-like functionality falls out of that naturally, though
           | I imagine the tagging would help with the conversion; and
           | someone in this thread mentions something called "Book
           | Reader" supporting it.)
        
         | silon42 wrote:
         | > It is too early to displace HTML with PDF.
         | 
         | 'Never' will be too early.
         | 
         | >Realistically, how many years will I have to wait until
         | Firefox catches up?
         | 
         | They should improve reflow for HTML on small devices first.
         | Focusing on PDF is a waste of resources.
        
           | zinekeller wrote:
           | I mean, Firefox just follows the website's command to not
           | format it as a mobile webpage, right? But a button to
           | forcibly reflow is handy though.
        
         | x86_64Ubuntu wrote:
         | Source code for websites hasn't been readable for years.
         | Reading a minimized JS document that has mauled the DOM is only
         | slightly more readable than the structure of a PDF.
        
         | simias wrote:
         | My understanding is that PDF is a monster of a document format,
         | and it's clearly not (usually and historically) meant to be
         | reflowed. Even copy/pasting from PDFs can be very disconcerting
         | because the viewer may not have a good idea of where blocks of
         | text start and end (or even what the characters really are).
         | 
         | I can empathize with the feeling that the web is incredibly
         | bloated, but that's IMO throwing the baby out with the bath
         | water. Simple HTML with some optional CSS would do the job much
         | better IMO (and can be easily downloaded, mirrored or offlined
         | with tools like wget).
         | 
         | And if you really don't like writing HTML (I won't blame you)
         | then there's always formats like markdown, org-mode and friends
         | which can easily be converted to pretty much anything.
        
           | shuntress wrote:
           | Dealing with PDFs (as in, coding a system that can
           | import/export/display them) is more obnoxious than dealing
           | with excel spreadsheets.
           | 
           | Unless your system is a PDF library (as in, you make the
           | black-box dependency that other systems use to handle PDF
           | exports), everything you do with PDFs will be through some
           | annoying black-box dependency that is a pain to use.
           | 
           | Even relatively complex HTML is much more fun to work with
           | than PDF.
        
         | marcosdumay wrote:
         | The one piece of software that I know that lets you reflow PDFs
         | is Calibre. And the results aren't great.
        
         | qznc wrote:
         | At least it looks more beautiful than terminal-only Gemini
         | sites.
         | 
         | https://en.m.wikipedia.org/wiki/Gemini_%28protocol%29
        
           | II2II wrote:
           | Gemini sites are not terminal-only and the renderer can make
           | it look beautiful (depending upon one's definition of
           | beautiful). One example is Lagrange:
           | 
           | https://github.com/skyjake/lagrange
        
           | majewsky wrote:
           | Gemini is as "terminal-only" as Markdown. Just because it's a
           | text format first and foremost, does not mean that you can't
           | display it nicely formatted. It's more like EPUB in that
           | regard.
        
       | tonis2 wrote:
       | What a nice website, what framework is it built with ? Maybe
       | Vue.js or Angular.js or maybe Nuxt fuking js ?
        
       | leephillips wrote:
       | A related idea is making a website entirely from SVG. Here is a
       | lovely example: https://ozake.com/
        
       | sammalloy wrote:
       | One problem I noticed on mobile, is that if I click on a link in
       | the PDF and visit another page, and then try to traverse back, it
       | takes me to the first page in the PDF, rather than the page I
       | linked from.
        
       | marbu wrote:
       | I don't consider using PDF for this purpose a good idea. It would
       | be better to have static HTML pages, with a reference to an EPUB
       | with the same content. One can have both generated from the same
       | source with a static site generator.
        
       | zabzonk wrote:
       | Sorry, I'm not a Web developer - what is meant by "churn and
       | noise" in this context?
        
       | westcort wrote:
       | While I agree with the thesis, I believe it is possible to do
       | things like this with vanilla HTML. For example, I created a
       | search engine that is just a static HTML page:
       | www.locserendipity.com
        
       | jedimastert wrote:
       | I find this to be a super interesting response. When I settled
       | into my current website design, I ended up basically writing an
       | article for the homepage. I'm not a designer by any stretch, and
       | it was the most attractive homepage I could make, and I still
       | really like it. I used a very similar workflow (and continue to
       | for articles) to the papers I wrote in college, and would really
       | only take one more step to get that to final pdf state.
       | 
       | I'm torn between leaning into the static nature of the site and
       | implementing the wiki I've been thinking about making
        
       | xvector wrote:
       | I think simple HTML + print to PDF (supported by default in most
       | browsers) is a much more elegant solution.
        
         | opsecweather wrote:
         | Run it through outline.com first to remove all the ad-sidebars.
        
       | cochne wrote:
       | As someone who works with PDFs a lot, please don't. PDFs are
       | awful in every case except those which require a very precise
       | visual layout. From reading the article, I do not see a single
       | case in which PDF is superior to vanilla HTML.
        
       | blacktriangle wrote:
       | "HTML's semantic capabilities were oversold."
       | 
       | THANK YOU! HTML semantics are a trap, just enough to make you
       | think something is there but anemic enough to be a giant
       | exercise in bikeshedding. Ask yourself this: If HTML semantics
       | were adequate, why do we have ARIA and 90 different microformats?
       | 
       | Other than that, I read the article expecting to be annoyed by
       | the PDF presentation but was pleasantly surprised by how it read
       | just like I would want a content page to read. My only complaint
       | is that browsers (at least Brave) do not preserve scroll position
       | in PDFs. If the browsers fix that the author may be onto
       | something here.
        
       | X6S1x6Okd1st wrote:
       | "I'm mad as hell and I'm not gonna take it any more" but for
       | webtech.
       | 
       | It's totally unclear why they don't just use a subset
        
       | sbazerque wrote:
       | I like the idea of keeping HTML's document-centric original
       | design, but accessing the documents using p2p protocols (instead
       | of the client-server model used on the web).
       | 
       | I'm working on an open-source implementation of this idea at
       | https://www.hyperhyperspace.org
        
       | Santosh83 wrote:
       | Why not just publish static HTML with CSS only? It is, to my
       | mind, better and more accessible than either PDF or a Javascript
       | SPA.
        
         | TheCoelacanth wrote:
         | And if you bundle that HTML and CSS as an EPUB, it's just as
         | self-contained as a PDF.
        
       | rado wrote:
       | This is terrible for accessibility. Please just use semantic HTML
       | and your web will be usable on 10yo devices and unknown devices
       | 10 years in the future.
        
       | dvfjsdhgfv wrote:
       | I don't agree with the author's choices (yes, I'm disciplined
       | enough not to add irrelevant elements to my content), but it's
       | really sad that things got to the point where someone actually
       | suggests PDF as an alternative to the web.
        
       | KEITH_PETERSON wrote:
       | I just opened your website on mobile and it's very user friendly,
       | I got to scroll in many directions to read the content.
       | 
       | We build our own website with Gatsby and only use JS when it's
       | really needed (when you click interactive links; we're still
       | trying to improve this a bit). We customized Gatsby because it
       | doesn't support this out of the box. The site gets a 100 score on
       | mobile in Google PageSpeed: https://marxcommunications.com/
       | 
       | Proof: https://imgur.com/a/N4IJoEk
       | 
       | Or run it yourself:
       | https://developers.google.com/speed/pagespeed/insights/?url=...
       | 
       | It's possible but takes some work.
        
       | saltdoo wrote:
       | I'm on mobile and unable to open any links in this pdf after
       | opening with three different pdf viewing apps. :/
        
       | failwhaleshark wrote:
       | _Cut off your nose to spite your audience._
       | 
       | PDF is meant for viewing and printing books. It's not very good
       | for browsing and requires PDF viewers. All of the browser add-
       | ons, functionality, and behaviors are lost by forcing people to
       | use a PDF viewer.
       | 
       | HTML is meant primarily for browsing but it can also be used for
       | print media. CSS can specify paper sizes. If someone were so
       | worried about external media, they can host it themselves or roll
       | their own CDN. If they were so worried about fonts, they can
       | include them themselves.
       | 
       | It's more semantic web-compatible to describe a website with RDF
       | and have PDF, EPUB, DJVU, MOBI, TXT, PS, etc. links there and
       | also in the webpage. This is how you provide the most
       | accessibility. Furthermore, using a meta document language like
       | LaTeX or something XML that can transform into other document
       | artifact forms mechanically is the way to go.
        
       | maccard wrote:
       | I've always wondered why some sites can serve PDFs that my
       | browser (firefox) can view inline (my preferred method), rather
       | than forcing me to download the file and open in a separate
       | application
        
         | [deleted]
        
         | chrismorgan wrote:
         | It depends on the Content-Disposition header:
         | https://developer.mozilla.org/en-
         | US/docs/Web/HTTP/Headers/Co....
         | 
         | There are extensions that let you intercept this header, e.g.
         | https://addons.mozilla.org/en-GB/firefox/addon/no-pdf-downlo...
         | which per https://github.com/MorbZ/no-pdf-
         | download/blob/c924d657f33398... detects the content-type and if
         | it's PDFy replaces the content-disposition header with
         | "inline".
         | 
         | (Clicking on a link that has the download attribute set also
         | affects things: https://developer.mozilla.org/en-
         | US/docs/Web/API/HTMLAnchorE....)
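
The header rewrite described above can be sketched in a few lines. This is
only an illustration of the idea (check the content type, then force the
disposition to "inline"); it is not the linked extension's code, and the
function name `force_inline_pdf` is made up for this example.

```python
def force_inline_pdf(headers: dict) -> dict:
    """Return a copy of `headers` with Content-Disposition forced to
    "inline" when the response is a PDF; other responses are untouched."""
    result = dict(headers)

    # Header names are case-insensitive, so compare them in lower case.
    content_type = ""
    for name, value in headers.items():
        if name.lower() == "content-type":
            # Drop parameters such as "; charset=..." before comparing.
            content_type = value.split(";")[0].strip().lower()

    if content_type == "application/pdf":
        # Remove any existing disposition header, whatever its casing.
        for name in list(result):
            if name.lower() == "content-disposition":
                del result[name]
        result["Content-Disposition"] = "inline"
    return result


response_headers = {
    "Content-Type": "application/pdf",
    "Content-Disposition": 'attachment; filename="paper.pdf"',
}
print(force_inline_pdf(response_headers))
```

A server that wants its PDFs shown in the tab can simply send the "inline"
value itself; the rewrite is only needed when you do not control the server.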
        
       | noduerme wrote:
       | I read this entire document. If you've ever had to write a PDF-
       | to-text parser - and God help you, I have - you will beg for
       | Flash to come back as a web standard.
       | 
       | [edit] Generally though, I'm sympathetic with your point and it's
       | kind of like why zines regained popularity in the 90s (and
       | samizdat in the Soviet Union before that)... controlling your own
       | publishing is a powerful idea. Anyone can do that though, without
       | resorting to obscure formats, unless obfuscation is the point.
        
         | taftster wrote:
         | $> cat file.pdf | strings
         | 
         | Done. /s
        
           | boramalper wrote:
           | Stop cat abuse! /s
           | 
           | $> strings file.pdf
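
For readers who have not met `strings`: a rough Python stand-in is below,
assuming the simplest behaviour (runs of at least four printable ASCII
bytes). The name `printable_strings` is invented for this sketch. It also
hints at why the joke is a joke: PDFs commonly store page content in
compressed streams, so a byte-level scan like this tends to surface object
structure and metadata rather than the readable text.

```python
import re


def printable_strings(data: bytes, min_len: int = 4) -> list:
    """Return runs of printable ASCII bytes of at least `min_len` length,
    roughly what the Unix `strings` tool does by default."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [run.decode("ascii") for run in re.findall(pattern, data)]


sample = b"\x00\x01Hello PDF\x00ab\x00"
print(printable_strings(sample))  # prints ['Hello PDF']
```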
        
       | shortformblog wrote:
       | Even though Jakob Nielsen is very much still alive, he's rolling
       | in his grave.
        
       | monkeynotes wrote:
       | * PDFs are self-contained and offlineable
       | 
       | HTML can easily be offline-able. Base64 your images or use SVG,
       | put your CSS in the HTML page, remove all 2-way data interaction,
       | basically reduce HTML to the same performance as PDF and allow it
       | to be downloaded.
       | 
       | * PDFs are files
       | 
       | HTML is files
       | 
       | * PDFs are decentralised
       | 
       | This should be "PDFs can be decentralised". PDFs aren't
       | inherently any more decentralised than any other kind of file,
       | including HTML.
       | 
       | The store is the thing that becomes decentralised, not the
       | content.
       | 
       | * PDFs are page-oriented
       | 
       | HTML can be page-oriented. Simply build your website with
       | pagination. PDFs can also be abused to have hugely long pages.
       | Bad UX can be encapsulated in any medium.
       | 
       | * PDFs used to be large (bla bla bla Javascript weighs a lot)
       | 
       | Nope, PDFs are still objectively larger than the equivalent HTML.
       | PDFs don't have any dynamic interaction, rip all that out and
       | produce the HTML of yesteryear and your HTML will be tiny in
       | comparison to the PDF.
       | 
       | Edit: I'm sorry, the more I think about this the dumber I feel.
       | The web is useful because it's 2-way. I am excited by the web
       | because I can interact with other people. I come to hacker news
       | to engage with thinkers, not to just read a published article
       | from one single author. I want to read ad-hoc opinions and user
       | submitted content. PDF web, really?
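
The "Base64 your images, put your CSS in the HTML page" step can be
sketched as below. This is a minimal illustration under loud assumptions:
attributes are double-quoted, assets are local files relative to
`base_dir`, and a regular expression is good enough for the markup at hand
(it is not a real HTML parser). The function name `inline_assets` is made
up here; dedicated tools handle far more cases.

```python
import base64
import mimetypes
import re
from pathlib import Path

REMOTE_PREFIXES = ("http:", "https:", "//", "data:")


def inline_assets(html: str, base_dir: Path) -> str:
    """Embed local images as base64 data URIs and local stylesheets as
    <style> blocks, so the page becomes a single self-contained file."""

    def embed_image(match):
        src = match.group(2)
        path = base_dir / src
        if src.startswith(REMOTE_PREFIXES) or not path.is_file():
            return match.group(0)  # leave remote or missing images alone
        mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        data = base64.b64encode(path.read_bytes()).decode("ascii")
        return f"{match.group(1)}data:{mime};base64,{data}{match.group(3)}"

    def embed_css(match):
        tag, href = match.group(0), match.group(1)
        path = base_dir / href
        if ("stylesheet" not in tag.lower()
                or href.startswith(REMOTE_PREFIXES)
                or not path.is_file()):
            return tag  # not a local stylesheet link: keep as-is
        css = path.read_text(encoding="utf-8")
        return f"<style>\n{css}\n</style>"

    html = re.sub(r'(<img\b[^>]*?\bsrc=")([^"]+)(")', embed_image, html)
    html = re.sub(r'<link\b[^>]*?\bhref="([^"]+)"[^>]*>', embed_css, html)
    return html
```

Usage would be along the lines of
`Path("out.html").write_text(inline_assets(Path("index.html").read_text(), Path(".")))`,
with both file names being placeholders.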
        
         | Tomte wrote:
         | > Base64 your images [...], put your CSS in the HTML page
         | 
         | Is there a tool that does those two things (or at least the
         | first one) and that can be used by non-programmers (command
         | line use is fine, a Python library would not be)?
        
           | gildas wrote:
           | You can use SingleFile for this, see
           | https://github.com/gildas-lormeau/SingleFile/
        
         | Frost1x wrote:
         | >PDFs don't have any dynamic interaction...
         | 
         | Just a caveat to that statement, you can literally do
         | interactive and dynamic 3D graphics rendering in PDFs:
         | https://helpx.adobe.com/acrobat/using/enable-3d-content-pdf....
         | 
         | You can also embed JS in PDFs:
         | https://helpx.adobe.com/acrobat/using/applying-actions-scrip...
        
           | dathinab wrote:
           | Yes, and many of these things are "in general" not well
           | supported by anything but Adobe's own PDF software.
           | 
           | Even the simplest interactive things can easily fail to work
           | correctly even in more widely used PDF readers.
           | 
           | IMHO PDF is in many ways worse than HTML, it's just that these
           | ways are less commonly used, but if you start a PDF instead
           | of HTML trend it's just a matter of time until these "not so
           | compatible" aspects of PDF become widely used by some people.
        
           | monkeynotes wrote:
           | JS in a PDF? You can do that in HTML, why not use the tools
           | you already have that work together by design?
           | 
           | This guy is arguing that removing JS is what makes the web
           | better. Having published, static, paper-like content is the
           | way forward.
        
             | Frost1x wrote:
             | Just caveating a technical statement I knew wasn't quite
             | true, not making any sort of assessment either way.
             | 
             | As someone who has had to extract data from large sets of
             | PDFs and modern web presentation formats, I'm not a fan of
             | either, really. Even verifying that a visibly presented
             | string exists in a PDF document programmatically can be a
             | non-trivial task, as with a given website as well. That to
             | me says a lot.
        
               | chalst wrote:
               | monkeynotes seems to take the line that technical defects
               | in claims others make fatally undermine their case, but
               | technical defects in his/her arguments are irrelevancies.
               | 
               | For what it's worth, the same objection occurred to me.
               | The use of scripting I've seen in PDFs has been use-
               | supporting and consistent with their book-like feel.
        
         | anigbrowl wrote:
         | _HTML can easily be offline-able._
         | 
         | Sure - if the publisher cares. From the user's standpoint, the
         | safe assumption is that they don't. Of course PDF is No Good
         | for many contexts, but for any sort of long-form document that
         | is primarily meant to be read, it's so often better.
         | 
         | Also, if something is available in pdf, I can be moderately
         | sure that someone else took the time to make sure it would be
         | formatted correctly and print out OK.* If it only exists in
         | HTML it's more of a roulette wheel experience.
         | 
         | * Unless some graphic designer thought 'gee this report would
         | look so cool if the cover pages were black or some other highly
         | saturated block of solid color.'
        
         | kemitche wrote:
         | PDFs are also horrible to view on mobile, as the text doesn't
         | reflow.
        
         | majkinetor wrote:
         | PDF
         | 
         | - does not reflow, major suck
         | 
         | - is binary format, another major suck
         | 
         | So no thx, PDF is outdated tech, while HTML and friends are
         | just abused.
        
           | anigbrowl wrote:
           | What I like best about pdf files is that I can just give them
           | to someone and be almost certain that any questions will be
           | about the content rather than the format of the file.
        
         | LeifCarrotson wrote:
         | When you find a page - inherently a document-oriented term -
         | like an article, blog post, how-to, or project writeup that's
         | interesting or useful, and you want to make sure it's available
         | to you later, what do you do?
         | 
         | Do you save the HTML, CSS, and Javascript, and hope that it
         | works offline? I used to use the "Save page as..." tool back in
         | the early 2000s, but it's become less and less useful, with too
         | many dysfunctional disappointments.
         | 
         | No, I cut out some junk I don't need with the Printliminator
         | [1] bookmarklet, then I do a *print-to-PDF.* This gives me a
         | file. I can save the file, back it up to my NAS, search for it
         | later, keep it with other files from a project where it was
         | useful, and otherwise hang onto it. This is so common, in fact,
         | that it's no longer an obscure thing you could only do with a
         | Postscript-to-PDF converter or (before the adware/Ask toolbar
         | scandal) by installing the CutePDF virtual printer. Modern
         | OSes bundle a PDF printer, and print dialogs understand that
         | you want to "Save as PDF". Google Docs and Office 365 editors
         | allow downloading a document as a PDF.
         | 
         | I totally agree that a dynamic, interactive page or a comment
         | section is not compatible with this model of usage. There's a
         | lot of consumption of endless feeds, and a lot of one-time
         | video views that also don't make sense to save as offline
         | files. However, the web for creators, where people write
         | articles that are worth hanging onto, has a definite place for
         | PDFs.
         | 
         | [1]: http://css-tricks.github.io/The-Printliminator/
        
           | apotheon wrote:
           | I actually dislike HTML per se, but the only two benefits I
           | see for PDFs in the general case are:
           | 
           | - In my experience, it's a little harder and rarer to make
           | PDFs utterly incompatible with different means of viewing
           | them, and it generally requires more overt (if perhaps
           | slightly unintentional, at times) sadism to make that happen.
           | 
           | - PDFs can do some things HTML can't (easily, at least) with
           | document design -- though those things are generally things
           | that would be disallowed in our new "deurbanized" PDF-based
           | web replacement.
           | 
           | Everything else that comes to mind goes the other way,
           | including the fact that the viewing-mechanism incompatibility
           | thing can be even worse with PDFs, even if it's more rare for
           | that to happen at present, and if PDFs became the new
           | standard for the web I'm pretty sure that relative rarity
           | would evaporate anyway. Let's also not forget that HTML can
           | also do some things PDFs can't (as easily, at least) do.
        
           | jhgb wrote:
           | > Do you save the HTML, CSS, and Javascript, and hope that it
           | works offline? I used to use the "Save page as..." tool back
           | in the early 2000s, but it's become less and less useful,
           | with too many dysfunctional disappointments.
           | 
           | I'm too lazy, so I just tend to use SingleFile these days...
        
           | derefr wrote:
           | > When you find a page [...] and you want to make sure it's
           | available to you later, what do you do?
           | 
           | Instead of doing a bad and lossy job of archiving the page
           | myself, I notify+ our friendly neighbourhood archivists at
           | the Internet Archive of the page; and _they_ then do the
           | best, most lossless job of preserving the page that they're
           | able, given their cumulative experience.
           | 
           | + http://blog.archive.org/2017/01/25/see-something-save-
           | someth...
           | 
           | As a side-benefit, they also then take care of keeping the
           | archive they've made around and available online in
           | perpetuity, with no additional marginal effort on my part.
           | The same can't be said for something in my own "private
           | collection."
        
             | htek wrote:
             | That's suboptimal as well. The site could come out with a
             | new robots.txt file which is just
             | 
             |     User-agent: *
             |     Disallow: /
             | 
             | and everything already indexed by the Internet Archive is
             | now inaccessible to you.
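
What that two-line robots.txt means to a crawler that chooses to obey it
can be checked with Python's standard `urllib.robotparser`. This is a
sketch only: the user-agent string and URL are placeholders, and whether
or how any particular archive applies robots.txt to already-captured
pages is outside what the code shows.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `robots_txt` permits `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


BLOCK_ALL = "User-agent: *\nDisallow: /"

# "example-archiver" and the URL are hypothetical values for illustration.
print(is_allowed(BLOCK_ALL, "example-archiver", "https://example.com/post.html"))
# prints False
```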
        
             | tenebrisalietum wrote:
             | > in perpetuity
             | 
             | Hopefully it really is around a very long time, but the
             | world is unpredictable and things change. It's great to
             | enhance the Internet Archive, but you can bet I'm keeping
             | my local copy too. Just in case.
        
               | [deleted]
        
             | turtlebits wrote:
             | Do you never get online receipts that you need to keep a
             | copy of?
        
               | derefr wrote:
               | I don't think I've ever had such a thing that only
               | appeared as a web page, without being emailed to me. To
               | me, the email is the primary-source document in that
               | arrangement.
        
             | Santosh83 wrote:
             | There is value in having a personally curated, offline
             | collection of documents. You can search, annotate or
             | otherwise manipulate it to your heart's content, all
             | without having to be connected.
             | 
             | Of course the Internet Archive serves other purposes for
             | which it is (currently) irreplaceable.
        
               | admax88q wrote:
               | There's also opportunity cost in spending time
               | maintaining, indexing, annotating your own archive of
               | documents.
        
               | cxr wrote:
               | Zotero is much better for this than the too-fiddly print-
               | to-PDF workflow described in the earlier comment.
        
             | daggersandscars wrote:
             | This may not be well-known, but archive.org can and does
             | remove pages / sites from the archive. Authors can request
             | this, site owners (separate from the authors) can request
             | this. There may be others who can request this.
             | 
             | Just an FYI. If there are critical sites you want copies
             | of, I'd recommend making your own copy. I've lost access to
             | important pages / sites twice before taking this to heart.
             | 
             | Edited for clarity
        
             | [deleted]
        
           | blooalien wrote:
           | Also useful: https://pypi.org/project/html2text/
        
           | gregsadetsky wrote:
           | There was an interesting discussion about this a year ago:
           | 
           | https://news.ycombinator.com/item?id=23228098
           | 
           | ----
           | 
           | This is still not as powerful as my one, simple trick to
           | handle all bookmarks, ever: Print to PDF. I've been doing it
           | since last century, and I have 10's of thousands of PDF's of
           | every single web page I've ever found interesting, sitting
           | right there in a directory on my computer
           | 
           | ----
           | 
           | Including the suggestion that was brought up to use ripgrep
           | to search in the pdf text content.
        
             | anigbrowl wrote:
             | Sometimes if I'm researching a topic I'll dig up a big
             | number of newspaper articles and want to print them and
             | read them away from the screen while scribbling notes etc,
             | but on a lot of websites banner ads or footers with
             | copyright statements can really mess it up.
        
         | supperburg wrote:
         | This reminds me of the guy who said Dropbox was stupid because
         | he could set up an FTP server. It's the exact same argument.
         | 
         | People understand PDFs, they are extremely common in the
         | academic and business world as "digital paper" standalone
         | documents. Hypothetically, anything in memory can be made into
         | a file but in this scenario what matters is the practical goal
         | of people actually using these files.
         | 
         | I think it makes sense for the web to be made up of discrete
         | primitives not only so that the web can be browsed in an
         | intuitive and frictionless way but also because it lends itself
         | to being backed up and easily re-hosted.
        
         | chowderman wrote:
         | > HTML can easily be offline-able. Base64 your images or use
         | SVG, put your CSS in the HTML page, remove all 2-way data
         | interaction, basically reduce HTML to the same performance as
         | PDF and allow it to be downloaded.
         | 
         | I built a tool for this exact purpose[0] since the HTML
         | specification and modern browsers have a lot of nice features
         | for creating and reading documents compared to PDF (reflow and
         | responsive page scaling, accessibility, easily sharable, a lot
         | of styling options that are easy to use, ability for the user
         | to easily modify the document or change the style, integration
         | with existing web technologies, etc.). In general I would
         | rather read an HTML document than the PDF document since I like
         | to modify the styling in various ways (dark theme extensions in
         | the browser for example) which may be hard to do with a PDF,
         | but it's more of a personal preference. Some people will prefer
         | that the document adjusts to the screen size of the device
         | (many HTML pages), and others will prefer the exact same or
         | similar rendering regardless of the screen size (PDF).
         | 
         | Either way, kind of a fun idea making a website using just
         | PDFs. Not the most practical choice, but fun nonetheless.
         | 
         | [0] https://github.com/chowderman/hyperfiler
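
The quoted recipe (Base64 the images, put the CSS in the page) can be sketched in a few lines of standard-library Python. This is only an illustration of the idea, not how the linked tool works; the helper names `data_uri` and `standalone_page` are made up for this example.

```python
import base64
import html
import mimetypes


def data_uri(data: bytes, filename: str) -> str:
    """Encode raw bytes as a data: URI, guessing the MIME type from the file name."""
    mime, _ = mimetypes.guess_type(filename)
    mime = mime or "application/octet-stream"
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def standalone_page(title: str, css: str, body_html: str) -> str:
    """Build one HTML document with the stylesheet inlined and no external requests."""
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">'
        f"<title>{html.escape(title)}</title>"
        f"<style>{css}</style></head>\n"
        f"<body>{body_html}</body></html>\n"
    )


if __name__ == "__main__":
    # Placeholder bytes stand in for a real image file read from disk.
    img = data_uri(b"hi", "figure.png")
    page = standalone_page("Demo", "body { max-width: 40em; }", f'<img src="{img}" alt="demo">')
    print(page)
```

The resulting file carries its images and styling with it, which is the property being compared against PDF here; scripts and fonts would need the same treatment.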
        
         | pajko wrote:
         | This. Also who hates the huge double margins? The slow
         | rendering? The unnatural break-up of text? Meaningless headers
         | and footers? And the whole page-based layout? PDF is not meant
         | for the web. Period.
        
         | stjohnswarts wrote:
         | so because someone chooses to publish their website in an open
         | format that they prefer "it's dumb" because they don't agree
         | with you.
        
         | baybal2 wrote:
         | HTML used to be a very nice format in the age of XHTML 1.1,
         | very formally specified, and a tie with DOM was assured by very
         | strictly standardised DOM v3. And ACID3 was giving you a pixel
         | for pixel repeatability during rendering.
         | 
         | HTML+JS today... now it's effectively a standard in name only,
         | and Chrome is the new IE6. The standard is now "what has worked
         | in the last stable release"
         | 
         | Now go to http://acid3.acidtests.org/ and see how the latest
         | stable Chrome release can't render a decade old CSS testcase.
        
         | playpause wrote:
         | These all seem like technical quibbles that miss the point.
        
           | jedimastert wrote:
           | This statement could be for both the comment you're replying
           | to and the original article.
        
           | quietbritishjim wrote:
           | > These all seem like technical quibbles that miss the point.
           | 
           | If these all "miss the point", what _is_ the point?
           | 
           | It seems to me that the article's point is that PDF as a
           | format has attributes that satisfy the author's goal, whereas
           | HTML does not. The parent comment says that HTML does have
           | those attributes after all (if you choose to use HTML that
           | way). That is very directly addressing the article's point,
           | as I understand it.
        
             | JohnFen wrote:
             | Perhaps I misunderstood, but I believe the author's point
             | was to highlight what a steaming mess the modern web is.
             | The PDF aspect strikes me as illustrating a point, not a
             | seriously proposed solution.
        
           | wlesieutre wrote:
           | Unless I'm on a paper-sized tablet I would definitely rather
           | have an offline HTML file than a PDF. Nobody likes to pan
           | back and forth on lines of text to read something.
        
             | pseingatl wrote:
             | PDF is size-agnostic. There's nothing to stop you from
             | creating documents the size of a phone screen.
        
               | wlesieutre wrote:
               | I'm commenting here as a user reading a PDF. The fact
               | that someone else could have laid it out differently
               | doesn't change the fixed layout of the PDF that I'm
               | trying to read.
               | 
               | There's a reason responsive design has been a big deal
               | for the last 10+ years and I don't think the benefits of
               | PDF are worth throwing it out.
        
               | JohnFen wrote:
               | As someone who really detests responsive design, the lack
               | of it in a PDF strikes me as a feature, not a bug.
        
             | Robotbeat wrote:
             | I had the exact opposite reaction. I'm reading this on an
             | iPhone SE2020, and I MUCH appreciate reading this in pdf
             | form. I didn't have to pan back and forth or even put the
             | phone in landscape orientation. This is one of the smallest
             | smartphones you can still buy, and the experience of PDF is
             | WAY better than the user-hostile auto-flow text forced down
             | mobile users' throats.
             | 
             | I was skeptical at first, but I think the author made the
             | point fantastically well.
        
               | wlesieutre wrote:
               | To get equally small text on my desktop I have to turn
               | the font size all the way down to 7. God forbid you have
               | readers with less than stellar eyesight.
               | 
               | I get what they're going for but the PDF is not exactly
               | an accessible reading experience.
        
               | nemetroid wrote:
               | I'm using a 2016 iPhone SE, and it's largely unreadable
               | without being very up close.
        
               | cunthorpe wrote:
               | What.
               | 
               | Your browser has a zoom functionality that lets you make
               | the text smaller, essentially replicating the PDF site
               | above. Only the opposite of what you say is correct: I
               | can't read that PDF's text without turning my phone into
               | landscape and picking up my glasses.
        
               | apotheon wrote:
               | EPUB would beat the shit out of PDF for that.
               | 
               | (EPUB is basically a subset of HTML with client-oriented
               | context.)
        
           | monkeynotes wrote:
           | The guy outlines his whole case based on those exact points
           | which are, as you have observed, technical quibbles and not a
           | basis for abandoning HTML.
           | 
           | Under the hood it seems apparent to me that the real premise
           | is an emotional one, not a technical one.
           | 
           | The internet is plastic not because of HTML, but because of
           | money and people. When you have teens driving content it's
           | going to feel plastic. When Walmart uses the internet to sell
           | you crap it's gonna be plastic. Gossip / social platforms are
           | trash, no matter the medium.
           | 
           | It could be argued that TV is an incredible learning platform
           | ruined by HD. Back in the standard definition days we had
           | proper news, documentaries that were substantial, and no
           | reality TV. We need to go back to black and white standard
           | definition.
           | 
           | Sorry, but the PDF web is not a solution to societal rot.
        
             | tablespoon wrote:
             | > The guy outlines his whole case based on those exact
             | points which are, as you have observed, technical quibbles
             | and not a basis for abandoning HTML.
             | 
             | His is actually more of a _social_ observation: it doesn't
             | matter what the technology _can_ do, what matters is how
             | the developers of that technology _actually_ use it.
             | 
             | People who use PDF almost _never_ use 3D graphics and heavy
             | dynamic JS, so PDFs almost always have many of the
             | qualities he's seeking.
             | 
             | Web developers almost _never_ inline anything, and do all
             | kinds of things that are arguably deal-breakers except for
             | a few lowest-common-denominator use cases.
             | 
             | > Under the hood it seems apparent to me that the real
             | premise is an emotional one, not a technical one.
             | 
             | The premise is that the web has failed in important and
             | clear ways, it's impossible to fix so we should give up, so
             | many use cases should abandon it for something else, and
             | PDFs are unexpectedly well suited for that.
             | 
             | On a related note, part of me wishes Java Applets never
             | died. Getting rid of them seems to have caused the Web to
             | turn into them, and maybe if they'd remained some kind of
             | separation could have been maintained.
        
               | apotheon wrote:
               | Turning PDFs into the replacement for HTML would change
               | the incentives around PDF authoring, and PDFs would then
               | acquire the same problems identified with HTML.
               | 
               | The solution to the identified problems is not to switch
               | to PDFs. Stop reshuffling the chairs on the deck of your
               | sinking ship, and start figuring out how to design,
               | implement, and incentivize the use of, some means of
               | conveyance other than iceberg-vulnerable ships.
               | 
               | > On a related note, part of me wishes Java Applets never
               | died. Getting rid of them seems to have caused the Web to
               | turn into them, and maybe if they'd remained some kind of
               | separation could have been maintained.
               | 
               | Java Applets were killed by Flash.
        
               | chalst wrote:
               | > PDFs are unexpectedly well suited for that.
               | 
               | Not so surprising, really: the PDF standard evolved in
               | parallel with Adobe's Flash between 2005 and 2010, which
               | was then the key technology in Adobe's effort to keep a
               | strategic toehold on the web. If Flash had not been a
               | security clusterfuck, it might still be around. The PDF
               | standard was always meant to be a complementary standard,
               | and Adobe's attempted successor technologies have
               | followed an even closer technological path.
               | 
               | The PDF standard has benefited from the fact that, unlike
               | the W3C and WHATWG, surveillance capitalists have not
               | been in the driving seat of its standardisation effort.
               | Adobe's interests are not identical to those of the
               | public, but they are not as essentially adversarial to
               | them as the web standards bodies have been.
        
             | adolph wrote:
             | Is the medium the message? Does style have substance? Is
             | form also a function?
        
               | leetcrew wrote:
               | I'm not exactly sure what point you're trying to make
               | here, but I don't think two different formats for
               | encoding formatted text with images constitute different
               | "mediums".
        
               | megameter wrote:
               | Of course they are, and we run into it constantly in
               | computing. You can encode text with images as a bitmap,
               | as vector graphics, as symbolic content that references
               | bitmaps or vectors, as an algorithm that procedurally
               | generates any of the above...
               | 
               | While you can produce identical outputs from the
               | different methods, it's not hair-splitting to say that
               | the authoring process and hence the nature of the medium
               | to shape expression is affected by choosing one. When you
               | opt towards maximizing generality your production cycle
               | can grow without bound because everything is possible by
               | layering different media, even if all of it is
               | unnecessary. That's how you end up with creative projects
               | that take multiple years to decades to accomplish.
        
             | runawaybottle wrote:
             | Well, you seem to get the gist of the hot take the author
             | put out. This article is not about PDFs. There is something
             | wrong with the world and we can sense it.
             | 
             | This is close to it: _When you have teens driving content
             | it's going to feel plastic._
             | 
             | Youth is the ultimate quality destroyer. They just fucking
             | suck. I'm quite sick of their drivel honestly, and yet, we
             | let them dictate the world (watch my childish cartoons,
             | even in old age).
             | 
             | And the little shits complicate code bases. All you little
             | rascals under 30, scram, I'm on to you.
             | 
             | And all you little adults acting like children, with your
             | stupid motivational posts on LinkedIn, and your garbage
             | bragging on there, I see you too.
             | 
             | Stop.
        
         | novok wrote:
         | Sounds a lot like epub.
        
         | rexreed wrote:
         | Also - how are PDFs exactly "discoverable"? I have petabytes of
         | PDFs and making them easily "discoverable" for any mass use,
         | such as analytics, search, or data analysis is a massive pain.
         | I'd rather have them in a non-PDF format.
        
           | relaxing wrote:
           | The author is calling for new content to be authored as PDF,
           | which can easily be made discoverable.
           | 
           | I'm guessing your data set is made of scans with poor or no
           | OCR.
        
             | rexreed wrote:
             | Not a single researcher or data analyst I know of would
             | prefer "discoverable" content to be in PDF format,
             | regardless of just how awesome the OCR is (which it often
             | isn't, especially for tabular data). Even for all-text,
             | non-tabular documents, OCR does not provide the metadata
             | needed to make sense of the documents. Why PDF is claimed
             | to have superior "discoverability" in the OP essay is a
             | mystery to me. For the sake of "discoverability", PDF is
             | definitely not the way to go.
        
               | relaxing wrote:
               | The essay claimed
               | 
               | > PDFs are discoverable. Search engines index them as
               | easily as any other format.
               | 
               | What you're talking about has nothing to do with that.
        
         | gunapologist99 wrote:
         | agreed.
         | 
         | and, ancient HTML can still be easily read by modern browsers,
         | so that's not exactly a special attribute of PDF either.
        
         | camgunz wrote:
         | You got nerd sniped by the HTML vs. PDF format thing and missed
         | the entire point of TA:
         | 
         | > Isn't it a good thing that we enjoy rapid progress? To the
         | extent that we get to enjoy things like YouTube and sandspiel,
         | yes! But to the extent that we want the internet to be a place
         | where we can work and live and think and communicate free of
         | malware, surveillance, dark patterns and the insidious
         | influence of advertising, the answer is, empirically, sadly,
         | no. The web has become ad-corrupted hand-in-hand with growth in
         | technological capability, and the symbiotic relationship
         | between web and browser means they feed on each others' churn.
         | Ads demand new sources of novelty to put themselves on, so the
         | web expands continually, the specs grow in complexity, the
         | browsers grow in sophistication, the barrier to entry grows
         | ever higher, the vast cost of it all demands more ad revenue to
         | fund it... and thus the perpetual motion machine is complete.
        
           | 6510 wrote:
           | The classic mistaking the example for the topic.
        
           | prophesi wrote:
           | No, the entire point of the article is to convince people to
           | use PDF/A. Which I find comical since you have to go out of
           | your way to check if a PDF is PDF/A compliant. If the web was
           | run by PDFs, there's no reason why any big corporations
           | would abide by those rules, and it'd be just as messy as HTML
           | is today.
        
             | camgunz wrote:
             | You've also been nerd sniped. TA goes on and on about
             | surveillance capitalism and the attention economy. Weird,
             | for an article that's supposedly convincing engineers of
             | the merits of one file format over another.
        
               | monkeynotes wrote:
               | I tackled the premise. I think addressing the premise is
               | the logical place to dismantle an argument.
        
               | camgunz wrote:
               | But, again, the premise is not that "as a file format,
               | PDF is better than HTML". The premise is: because HTML is
               | two-way, it enables surveillance capitalism and allows
               | bad actors to monopolize the attention economy. The
               | author wrote it thus:
               | 
               | > Sure, you can write good HTML. I won't argue with that.
               | And if you're writing good HTML, good for you. But HTML
               | is a dual-use technology, the bad guys are dual-using it
               | an awful lot, and I feel that the stone age still has a
               | part to play in the progression of the information age.
               | 
               | The part where you engage with this is where you write:
               | 
               | > I'm sorry, the more I think about this the dumber I
               | feel. The web is useful because it's 2-way. I am excited
               | by the web because I can interact with other people. I
               | come to hacker news to engage with thinkers, not to just
               | read a published article from one single author. I want
               | to read ad-hoc opinions and user submitted content. PDF
               | web, really?
               | 
               | Which is interesting! Do you have thoughts on creating
               | peer-to-peer systems that don't enable surveillance
               | capitalism?
        
               | apotheon wrote:
               | > > Sure, you can write good HTML.
               | 
               | A key here is that it's easier to write good HTML docs
               | than good PDF docs, and much harder to deal with the
               | harmful aspects of PDF docs given present technology.
               | 
               | > Which is interesting! Do you have thoughts on creating
               | peer-to-peer systems that don't enable surveillance
               | capitalism?
               | 
               | I don't know about the other person's ideas, but
               | decentralization plus better anonymization and
               | pseudonymization, with always-on strongest-reasonably-
               | possible encryption, seems like the direction to go.
        
               | camgunz wrote:
               | > A key here is that it's easier to write good HTML docs
               | than good PDF docs, and much harder to deal with the
               | harmful aspects of PDF docs given present technology.
               | 
               | Oh, yeah I'm not on the PDF train. That's wild. I'm more
               | of a Markdown or Gemtext advocate, or even LaTeX.
               | 
               | > I don't know about the other person's ideas, but
               | decentralization plus better anonymization and
               | pseudonymization, with always-on strongest-reasonably-
               | possible encryption, seems like the direction to go.
               | 
               | Yeah, projects like IPFS (which you reference above) are
               | working towards this, but JavaScript still works over
               | IPFS. Plus, fingerprinting techniques are pretty bonkers.
               | Most of it comes down to JS and various state you keep on
               | your local machine (cookies, flash cookies, etc.), but I
               | think you need that. How do you maintain a session with a
               | peer without some kind of token/cookie?
        
               | prophesi wrote:
               | Did you read beyond the "How did it come to this?"
               | section? TA goes on and on about web standards and the
               | need for PDF/A.
               | 
               | Edit: If the article _was_ all about surveillance
               | capitalism, then it wouldn't be worth upvoting as
               | actionable solutions are much more valuable than
               | preaching to the choir.
        
               | camgunz wrote:
               | If you don't think it's clear that the author's advocacy
               | of PDF is a means to an end, subservient to their desire
               | to dismantle surveillance capitalism and the duopoly that
               | Google/Apple have on the web, I don't know where to go
               | from here.
        
               | prophesi wrote:
               | I think you're the one who got nerd-sniped here. 1.5 of
               | the 13 pages in this PDF are about surveillance
               | capitalism. The rest's about web standards.
        
               | Aeolun wrote:
               | What in the nine hells is nerd sniping?
        
               | anigbrowl wrote:
               | why don't we have both?
        
           | cxr wrote:
           | The author does identify a problem, and so you want to focus
           | on that. That's fine. There is the issue of triviality,
           | however.
           | 
           | The problem described is widely felt, and also widely
           | discussed. We already _know_ this stuff to be a problem. For
           | the piece to be worthwhile, then, it should do something that
           | is not present in the other instances where the topic has
           | been raised. It should articulate (or at the very least
           | exhibit, without necessarily articulating) a solution for us.
           | It doesn't. A bad remedy to a genuine problem does not yield
           | a solved problem.
        
             | slashdot2008 wrote:
             | The author brings a solution, it is to publish documents in
             | PDF instead of HTML.
        
               | apotheon wrote:
               | "A bad remedy to a genuine problem does not yield a
               | solved problem."
        
             | camgunz wrote:
             | The article is called "Deurbanising the Web", and its
             | thesis is:
             | 
             | - Publish in static file formats.
             | 
             | - Date and hash your work.
             | 
             | - Stop spying on your users.
             | 
             | HN is a discussion forum, not project planning software.
             | Not everything has to "yield a solved problem". Are you
             | really setting the bar at "design a technology stack for
             | replacing HTML/CSS/JS"? That's way, way too high.
        
               | apotheon wrote:
               | Those points can be trivially met with static HTML and
               | something like IPFS, and you can _still_ download HTML
               | for local storage and viewing. You can even print to PDF
               | if you really want to do so. Meanwhile, PDFs also allow
               | dynamic files, don't require dating and hashing, and can
               | be used to spy on users or deliver malware.
               | 
               | EDIT: Oh, yeah, and static file formats doesn't
               | necessarily have to mean static document formatting when
               | viewing -- unless you're using PDFs, which tends to break
               | useful stuff like reflowing for paginated documents (one
               | of the worst things about even simple PDFs).
        
               | bccdee wrote:
               | You say that its thesis is (in part) to generally publish
               | in static file formats, but that's not quite accurate.
               | The piece specifically touts PDF/A as the best format and
               | makes several arguments against the use of html/css. I
               | agree that they're making a broader point than just "use
               | pdf," but "use pdf" is definitely a large part of it.
        
         | grishka wrote:
         | PDFs aren't really meant to be read off a screen, they're much
         | better suited for stuff that's meant to be printed out.
         | 
         | And you can have a single self-contained file with a webpage,
         | it's called a "web archive", with .mhtml extension.
        
         | 1vuio0pswjnm7 wrote:
         | "I come to hacker news to engage with thinkers, not just read a
         | published article from a single author."
         | 
         | And how many websites today are anything like HN, in terms of
         | relative simplicity, e.g., no images^1, 3rd party requests or
         | ads, only a tiny bit of (gratuitous)^2 JS.
         | 
         | 1. I do not participate in the voting scheme but I could vote
         | from the command line if I wanted to. I use a text-only browser
         | so the grey, fading text gimmick is irrelevant. I see all
         | comments and treat them according to the thinking not the
         | voting.
         | 
         | 2. If we exclude the .ico and a .gif
         | 
         | There seems to be a double-standard, for lack of a better term,
         | where many HN commenters and voters appear to work for
         | companies that make websites with tracking and ads and various
         | gimmicks targeted at "non-thinkers" which are nothing at all
         | like HN. Whatever these commenters and voters see and
         | appreciate in HN they are not working to bring it to the rest
         | of the web. I seriously doubt they comment and vote on HN out
         | of fear of so-called "power users" or a belief that the HN type
         | of simplicity could become more popular and threaten their jobs
         | that depend on surveillance, online ads and a non-thinking
         | audience of "powerless" users. Rather, a more rational
         | explanation might be that they see some value in a website that
         | shows no ads and generally uses no gimmicks; that's something
         | to think about.
         | 
         | "PDF web" may not make sense to many folks who have invested
         | heavily in JS and Big Tech web browsers, but PostScript is
         | arguably more elegant than JavaScript. "Thinkers" usually like
         | FORTH.
         | 
         | https://en.m.wikipedia.org/wiki/Display_PostScript
         | 
         | The tracking section mentions the Abe Vigoda status page.
         | 
         | http://www.abevigoda.com/
        
         | noduerme wrote:
         | Honestly, if you're going to put out a manifesto as a PDF, at
         | least take some time "layouting" your design. The one advantage
         | of that format is that you control the aspect ratio. Every font
         | is permissible, everything is absolutely positioned. Using a
         | generator to create it is cringey. Show the art that's
         | possible. Really sell the format.
         | 
         | FWIW I deliver PDFs daily as an art director; not ideal, but
         | they work in most cases. There's certainly nothing rebellious
         | or non-commercial about them.
        
           | EugeneOZ wrote:
           | ...and difficult to read on the small screens of mobile
           | devices.
        
             | noduerme wrote:
             | Yeah. That's why they're only used for print.
        
         | goodpoint wrote:
         | You seem to miss the point of the post:
         | 
         | ----
         | 
         | Call to action
         | 
         | Publish in static file formats
         | 
         | Date and hash your work
         | 
         | Stop spying on your users
         | 
         | ----
         | 
         | All this cannot be GUARANTEED by HTML/pdf/epub and requires
         | active cooperation from the author. This is bad.
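
The "date and hash your work" step can be sketched in a few lines of Python. This is only an illustration, assuming SHA-256 as the hash and an ISO date as the stamp; the article is not quoted here as prescribing either, and the `stamp` helper is a hypothetical name.

```python
import hashlib
from datetime import date


def stamp(content: bytes, published: date) -> str:
    """Return a 'YYYY-MM-DD  <sha256 hex>' line for a published file.

    Assumption: SHA-256 and ISO dates are illustrative choices only.
    """
    digest = hashlib.sha256(content).hexdigest()
    return f"{published.isoformat()}  {digest}"


if __name__ == "__main__":
    # In practice `content` would be the bytes of the published file.
    print(stamp(b"example document body\n", date(2021, 7, 19)))
```

As the comment above notes, a stamp like this only helps if the author actually publishes it alongside the file; no format enforces it.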
        
         | marcosdumay wrote:
         | > PDFs don't have any dynamic interaction
         | 
         | Oh, you are set for a world of surprises. Nearly every single
         | one bad, but running our current web over PDFs is well within
         | the specs.
        
         | ChrisMarshallNY wrote:
         | _> Simply build your website with pagination._
         | 
         | My experience is that browsers are _terrible_ with CSS
         | pagination support in their display and printing directly.
         | 
         | The only place it seems to actually work is...saving as a
         | PDF...
        
         | hyperpape wrote:
         | Saying HTML can be offlineable is like saying C can be provably
         | terminating. There's a subset of programs where that's true,
         | but it's not inherent to the form. A PDF is inherently self-
         | contained, standard web technologies are not. When you open the
         | page and it's a PDF, it gives you certain guarantees, when you
         | open it and it's HTML, you have to do further
         | investigation.
        
           | JadeNB wrote:
           | > When you open the page and it's a PDF, it gives you certain
           | guarantees ....
           | 
           | I think that this is a lot less true than we're used to
           | thinking. The PDF spec contains a lot more interactive
           | capabilities than I think most people realise. (It supports
           | JavaScript!) We're not used to seeing those capabilities
           | abused, because there's no point; it is so much easier to
           | abuse HTML. But, if people _want_ to abuse PDF--and, if we
           | somehow convinced the world to move to it, then they would--
           | then they easily can.
           | 
           | (I'm not conversant enough in the spec to know, but I do know
           | that Postscript is Turing complete, and I don't know that PDF
           | isn't. At least HTML on its own certainly isn't--no
           | recursion!--although all bets go out the window once you
           | start layering other tech on top of it.)
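
One rough way to check whether a given PDF might carry active content is to scan its raw bytes for the name tokens that PDF actions use. The sketch below is a heuristic only: names can be escaped or hidden inside compressed object streams, so an empty result proves nothing. The `find_active_names` helper and the token list are assumptions made for illustration.

```python
import re

# Name tokens that commonly accompany active content in a PDF.
# Assumption: this list is illustrative, not exhaustive.
ACTIVE_NAMES = (b"/JavaScript", b"/JS", b"/OpenAction", b"/AA", b"/Launch")


def find_active_names(pdf_bytes: bytes) -> list[str]:
    """Return the active-content name tokens present in the raw bytes."""
    found = []
    for name in ACTIVE_NAMES:
        # A PDF name ends at whitespace or a delimiter, so require that the
        # token is not followed by another name character (avoids /JSFoo).
        pattern = re.escape(name) + rb"(?![A-Za-z0-9])"
        if re.search(pattern, pdf_bytes):
            found.append(name.decode("ascii"))
    return found


if __name__ == "__main__":
    sample = b"<< /Type /Action /S /JavaScript /JS (app.alert('hi')) >>"
    print(find_active_names(sample))
```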
        
           | monkeynotes wrote:
           | I don't buy that the problem with the web is that HTML is not
           | inherently offlineable. HTML may not be inherently
           | offlineable but it can be. PDF isn't inherently a web
           | friendly format, but it can be. There really isn't any good
           | argument for PDFing the web.
        
             | pajko wrote:
             | Print the page to PDF.
        
               | tablespoon wrote:
               | > Print the page to PDF.
               | 
               | Even that usually sucks nowadays, because web developers
               | don't care anymore. Probably 75% of the time before I do
               | that, I have to go into the dev console to delete overlay
               | elements that obscure content and garbage that will waste
               | 10 pages (e.g. grossly oversized images, related article
               | recommendations, etc.).
               | 
               | There was a time when most websites had a print view that
               | gave you a simplified html page that worked well, but I
               | think most of those are gone now. Now it's all some print
               | "media-type" CSS that no one ever put the time in to do
               | properly or keep up to date.
        
           | stjohnswarts wrote:
           | I agree, I don't see why anyone would call publishing in PDF
           | "dumb". The author of the material gets to choose his medium.
           | If "you" don't like it then move along or convert it to your
           | preferred format. In other words "why not both?"
        
           | lucideer wrote:
           | Firstly, C being provably terminating is a problem dealing
           | with the full body of C programs written in the world. The OP
           | is dealing with their own self-published content. That's a
           | different problem: if your analogy held it would need to be
           | limited to proving that a subset of C programs written by the
           | author terminate.
           | 
           | Secondly, making HTML offlineable is many orders of magnitude
           | simpler than the problem in your C analogy: there's really no
           | comparison. For the OP we only need to make
           | HTML documents that _they have authored themselves_
           | offlineable and yet people have written general purpose tools
           | to do this automatically for most webpages. This is not a
           | hard problem.
           | 
           | TL;DR your analogy is absurd.
        
             | hyperpape wrote:
             | This is a helpful post because it gets to the heart of the
             | difference. Many people are saying "if you do HTML in a
             | particular way, you get the same benefits." I'm asking
             | "what's inherent to the form?" That's exactly the point
             | about C--you can write it in a way that's provably
             | terminating, but it's not guaranteed. Consider the
             | consumer's perspective.
             | 
             | When I land on a page that's a PDF, I know certain things--
             | I can easily save it and read it later. How do I know that?
             | Not because I have read the PDF spec, or know that much
             | about it, but because of my experience as a consumer of the
             | web.
             | 
             | When I land on an arbitrary web-page, do I know the same
             | thing? No. I don't know what the page is doing, I don't
             | know what my browser will do when I try to save the page.
             | When I save this page, I have the option to save HTML only,
             | or a complete web page. Will the complete page actually
             | work? I go into the source, and there's a link to the
             | javascript (which is saved locally). Does rendering the
             | page rely on that javascript? Does that javascript do xhr
             | or fetch calls? Since it's Hacker News, I suspect the
             | answer is no. However that's not inherent to the medium.
             | 
             | There are better ways to archive the content of even
             | dynamic JS heavy pages, but they are not things that you
             | learn as an average user of the web.
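
The "does this saved page depend on anything else?" question can be partly answered mechanically. The sketch below uses Python's standard `html.parser` to list `src`/`href` values on resource-loading tags that are not inline `data:` URLs. It is an approximation: it cannot see what a script fetches at run time, which is exactly the uncertainty described above, and `ExternalRefFinder` / `external_refs` are hypothetical names.

```python
from html.parser import HTMLParser


class ExternalRefFinder(HTMLParser):
    """Collect src/href values on tags that load resources."""

    # Assumption: an illustrative set of resource-loading tags.
    LOADING_TAGS = {"script", "img", "link", "iframe", "video", "audio", "source"}

    def __init__(self) -> None:
        super().__init__()
        self.refs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.LOADING_TAGS:
            return
        for name, value in attrs:
            # data: URLs are already embedded in the document itself.
            if name in ("src", "href") and value and not value.startswith("data:"):
                self.refs.append(value)


def external_refs(html: str) -> list[str]:
    """Return references the page would need besides its own HTML."""
    finder = ExternalRefFinder()
    finder.feed(html)
    return finder.refs


if __name__ == "__main__":
    page = '<script src="hn.js"></script><img src="data:image/gif;base64,AAAA">'
    print(external_refs(page))
```

An empty list suggests, but does not guarantee, that the HTML file alone is enough to render the page offline.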
        
               | lucideer wrote:
               | I don't really follow. How does this author converting
               | their entire site to PDF help readers/visitors/users?
               | 
               | The original HTML site[0] was printable as PDF, and save-
               | able as both HTML and "Web page, complete", all of which
               | result in a well-formatted & readable offline experience.
               | (It was also responsive: very readable on mobile, but
               | that's an aside).
               | 
               | The new PDF site is not accessible to some, difficult to
               | read on mobile, and interacts poorly with all of the
               | norms web users are accustomed to (back navigation,
               | anchors, etc.)
               | 
               | [0] https://web.archive.org/web/20130127175816/http://www
               | .lab6.c...
        
               | hyperpape wrote:
               | It's the difference between "this thing has X property"
               | (termination or able to save for offline reading) and
               | "this thing _obviously_ has X property, in a way that you
               | can tell without any expertise, or doing any
               | investigation".
               | 
               | How important this is to users, or whether it is worth it
               | is something I've not commented on, but it is a
               | _difference_.
        
               | apotheon wrote:
               | It's possible to write PDFs that don't "work" (for some
               | useful definition of "work" similar to the case with
               | HTML) offline. Please stop pretending that's not true.
               | 
               | The reason offline utility tends to be true more often
               | for PDFs is that PDFs are not generally regarded as the
               | preferred online-default format of choice, which is in
               | turn a matter of social effects rather than technical
               | capacity. Reverse the socially accepted roles of the two
               | document formats and watch the same complaints get made
               | against PDFs as you're making against HTML. I'd bet money
               | the "normal" state of affairs would remain the same in
               | terms of the perceived benefit/detriment allocation
               | between online/offline formats; only which format was
               | considered which would have changed.
               | 
               | . . . but then all the web would be even heavier
               | documents, and even less customizable for local viewing,
               | thanks in part to that pagination and strict formatting
               | situation.
        
               | anigbrowl wrote:
               | It's possible, but it takes work. I can't remember the
               | last time a pdf did something unreadably weird, usually
               | my only gripe is with something that's a scan of an old
               | document but whoever turned it into PDF didn't do OCR.
        
             | chalst wrote:
             | hyperpape's analogy would work if the property was "avoids
             | undefined behaviour", rather than "avoids nontermination".
             | When we encounter a webpage, we are being expected to
             | execute potentially complex, well-being threatening code
             | whose behaviour is about as easy to predict as obfuscated
             | C.
        
               | apotheon wrote:
               | PDFs are capable of the same issues.
        
               | lucideer wrote:
               | True but again only if we're talking about parsing the
               | web. This is about HTML files the author is producing
               | themselves.
        
           | EugeneOZ wrote:
           | > A PDF is inherently self-contained, standard web
           | technologies are not
           | 
           | What technologies exactly? You can have absolutely everything
           | you need inside the HTML. You can inline css, js, svg and
           | images. What technologies can't you inline?
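
A minimal sketch of the inlining described above, assuming the stylesheet and image bytes are already available in a dictionary keyed by the paths used in the page. The regular expressions only match the exact attribute layout shown in the usage example; a real tool would use an HTML parser. `inline_assets` is a hypothetical helper, not an existing library function.

```python
import base64
import mimetypes
import re


def inline_assets(html: str, assets: dict[str, bytes]) -> str:
    """Replace stylesheet links and image sources with inline data."""

    def inline_css(match: re.Match) -> str:
        href = match.group(1)
        if href not in assets:
            return match.group(0)  # leave unknown references untouched
        return "<style>" + assets[href].decode("utf-8") + "</style>"

    def inline_img(match: re.Match) -> str:
        src = match.group(2)
        if src not in assets:
            return match.group(0)
        mime = mimetypes.guess_type(src)[0] or "application/octet-stream"
        data = base64.b64encode(assets[src]).decode("ascii")
        # Rebuild the attribute as a data: URL carrying the image bytes.
        return f"{match.group(1)}data:{mime};base64,{data}{match.group(3)}"

    html = re.sub(r'<link rel="stylesheet" href="([^"]+)">', inline_css, html)
    html = re.sub(r'(<img src=")([^"]+)(")', inline_img, html)
    return html


if __name__ == "__main__":
    page = '<link rel="stylesheet" href="style.css"><img src="dot.png">'
    files = {"style.css": b"p{margin:0}", "dot.png": b"\x89PNG"}
    print(inline_assets(page, files))
```

The result is a single HTML file, at the cost of roughly a third more bytes for each Base64-encoded image.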
        
             | aenigma wrote:
             | you are correct that you CAN - but who does? That's no
             | longer considered best practice. The argument these days is
             | that it's a lot easier to manage css if it's in a separate
             | file, same with js, etc. So none of the serious web
             | developers actually do anything inline anymore. The time it
             | would take to convert a "best practice" website with
             | separate files for html, css, js, etc. is just not worth
             | it. The point he's making is still valid - why not have the
             | option for something static.
        
               | EugeneOZ wrote:
               | But with the same (and even much bigger) success you can
               | declare "I'm switching to self-contained HTML! No more
               | external resources!" instead of "I'm switching to PDF,
               | saying farewell to interactivity and mobile devices".
               | 
               | It's just the declaration of ONE person, switching ONE
               | site.
        
               | apotheon wrote:
               | > why not have the option for something static
               | 
               | You have the same option with either HTML or PDF:
               | 
               | - PDF files can be dynamic or static, depending on how
               | you write them.
               | 
               | - HTML files can be dynamic or static, depending on how
               | you write them.
        
         | tablespoon wrote:
         | >> * PDFs are self-contained and offlineable
         | 
         | > HTML can easily be offline-able. Base64 your images or use
         | SVG, put your CSS in the HTML page, remove all 2-way data
         | interaction, basically reduce HTML to the same performance as
         | PDF and allow it to be downloaded.
         | 
         | You're missing the point. Even a relatively computer-illiterate
         | person can easily save a PDF to their hard drive, and it's
         | _significantly_ more difficult with HTML. At a minimum you're
         | probably going to get an HTML file with a sidecar directory (or,
         | I believe, sometimes a browser-specific archive; it's been a
         | long time since I tried since it works so poorly), and even
         | that may not have the content you want due to dynamic sites.
        
           | apotheon wrote:
           | You can write HTML pages to be self-contained and offline-
           | friendly.
           | 
           | You can write PDFs to include resources that are not part of
           | a single, self-contained file, and to be quite unfriendly
           | with offline use.
        
           | enumjorge wrote:
           | I guess I don't really understand the point being made. Does
           | it matter that much that saving a page creates a single file
           | on your hard drive? If you really want a static rendering of
           | a site, why not just print it to a PDF? Why does that have to
           | dictate the file format you use for distribution? With PDFs
           | you don't have to worry about conversion but they are also
           | comparatively larger over the wire.
           | 
           | > even that may not have the content you want due to
           | dynamic sites
           | 
           | But PDFs also don't give you dynamic content. Nothing is
           | stopping people from using HTML to serve static, JS-less
           | content. In fact that's what it was originally designed to
           | do. All this web app stuff was bolted on afterwards, and it's
           | optional.
           | 
           | What do we accomplish by having some people switch over to
           | PDFs? The people who don't care about bloat will continue to
           | not care about it. It's not like thin content will become
           | more discoverable or more common. It doesn't really change
           | incentives. The author says using PDFs makes it so you're not
           | tempted to add cruft to your sites but that's not really a
           | compelling argument.
           | 
           | Getting content creators to produce content without bloat is
           | not really a technical problem. It's a cultural and economic
           | one. I don't see how a file format addresses that.
        
             | spion wrote:
             | The file format restricts the possibilities. You know what
             | to expect when you see a PDF - static, JS-less content.
             | With HTML on the other hand, it depends on what the author
             | decided.
        
               | JadeNB wrote:
               | > You know what to expect when you see a PDF - static,
               | JS-less content.
               | 
               | You know to _expect_ that, but there's no guarantee
               | that's what you _get_. PDF supports JavaScript too.
        
             | fjtktkgnfnr wrote:
             | > _Does it matter that much that the artifact of saving a
             | page be a single file in your hard drive?_
             | 
             | Yes, it matters a lot. Word/Excel files are actually a zip
             | archive containing many files and sub-directories. Can you
             | imagine people working with exploded Word files, sending
             | over mail and WhatsApp complete directory trees?
        
           | monkeynotes wrote:
           | As I explained, if the author wants to make HTML easily
           | offlineable then inline CSS and Base64 images. Or, you know,
           | make your website printable. If authors actually thought
           | about the print to PDF "problem" it could be solved with
           | traditional CSS and HTML. As someone else said, we used to do
           | this. It used to be part of my every day web design job to
           | make sure the page printed nicely.
           | 
           | The idea that the whole web is going to pander to edge case
           | archivers is asinine. This whole conversation is about
           | supporting the needs of the very, very few and romanticizing
           | about the time when only interesting people used the
           | internet. It's kind of elitist and self serving.
        
           | naravara wrote:
           | > You're missing the point. Even a relatively computer-
           | illiterate person can easily save a PDF to their hard drive,
           | and it's significantly more difficult with HTML. At a minimum
           | you're probably going to get an HTML file with a sidecar
           | directory (or, I believe, sometimes a browser-specific
           | archive; it's been a long time since I tried since it works
           | so poorly), and even that may not have the content you want
           | due to dynamic sites.
           | 
           | Ctrl+P -> Save as PDF
           | 
           | You don't need the page to be a PDF to save it as a PDF.
        
           | stzups wrote:
           | >> it's significantly more difficult with HTML
           | 
           | Right Click > Save as
           | 
           | Try it with this page!
        
             | romwell wrote:
             | Yeah, no. Try it with _any other page_, and see why nobody
             | would be inclined to even _try_ "Save As.." a web page
             | anymore.
        
               | biztos wrote:
               | I actually did this pretty recently, in an attempt to get
               | some magazine articles onto my Kobo e-book reader since
               | Pocket couldn't fetch the paywalled ones (I do pay).
               | 
               | I figured I could just save the page, automate a few
               | edits to get around dynamic stuff, and then use it as,
               | you know, an HTML _document._
               | 
               | Even with a nice friendly mostly-text literary magazine,
               | after about five hours I gave up and just copy-pasted the
               | rendered text.
        
             | tablespoon wrote:
             | > Right Click > Save as
             | 
             | > Try it with this page!
             | 
             | Say hello to your new sidecar directory (or broken
             | CSS/images/God knows what else)!
             | 
             | I tried to save an NY Times article, and it 1) needed JS to
             | display anything, 2) even with the sidecar stuff was
             | broken, 3) it was so plastered with ads and other junk I
             | thought it was incomplete (it wasn't, I just had to scroll
             | waaay down past something that looked like a footer and
             | some voids after that).
             | 
             | If you save a PDF, you get that exact PDF on your hard
             | drive, and when you open it (even in 10 years) it will look
             | exactly the same as it did on the site.
             | 
             | With PDF WYSIWYS: What you see is what you _save_.
        
               | trey-jones wrote:
               | This is of course the point of the article - that the web
               | is a giant steaming pile of shit for the most part,
               | plagued by JS and external resource requirements, all of
               | which contribute to massive total page size.
               | 
               | I'll preface by saying I have some expertise in HTML, but
               | none in PDF (the format).
               | 
               | Most commenters who suggest that HTML is still a better
               | alternative than PDF (I agree) are assuming that, if this
               | is an important issue to you, you would craft your page in
               | a simpler style compared to most of what we see on the
               | web, making Print to PDF or Save As... more viable.
               | 
               | > PDFs and a PDF tool ecosystem  exist today. No need for
               | another ghost town   GitHub   repo   with   a   promising
               | README   and   v0.1   in progress.
               | 
               | This is news to me. I'm not sure that I buy it. PDFs have
               | always been a pain in the ass to work with in my opinion.
               | Maybe there are tools, but in my experience they aren't
               | very good.
               | 
               | In general, we know that HTML is going to be much more
               | compact (and compressible!) than PDF and that's the
               | biggest advantage I see on a web where bandwidth still
               | matters. Another downside shows itself when trying to copy
               | and paste the above quote: PDF formatting seems to be
               | weird.
        
               | tablespoon wrote:
               | > This is news to me. I'm not sure that I buy it. PDFs
               | have always been a pain in the ass to work with in my
               | opinion. Maybe there are tools, but in my experience they
               | aren't very good.
               | 
               | > In general, we know that HTML is going to be much more
               | compact (and compressible!) than PDF and that's the
               | biggest advantage I see on a web where bandwidth still
               | matters. Another downside shows itself when trying to copy
               | and paste the above quote: PDF formatting seems to be
               | weird.
               | 
               | PDF is a display format. I once worked on a project
               | parallel to a guy who was parsing PDF to extract text
               | content. IIRC, text in PDFs is stored in a way that works
               | fine for printing/rendering but not so well for
               | manipulation (e.g. it's a bunch of commands to render
               | line Z at position X,Y with font W). Those commands don't
               | have to be in reading order, nor do they have the
               | semantic meaning you can get from markup like HTML (e.g.
               | superscript can just be nothing more than a different
               | line rendered with a smaller font).
               | 
               | IMHO, PDF is actually less optimal than HTML for what
               | this guy is advocating, except that it's precisely those
               | limitations that have prevented PDF from becoming the mess
               | that Web HTML has. Though, that's probably in
               | large part because the bloaters have been too distracted
               | by the easier-target that is HTML to bother.
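
The "render line Z at position X,Y" model can be illustrated with a toy example. The sketch below assumes text has already been extracted as positioned fragments (the `Fragment` type and the sample coordinates are hypothetical) and rebuilds a reading order by sorting on position. Real extractors need far more than this: a superscript drawn at a slightly different height, as mentioned above, would already land on its own "line".

```python
from typing import NamedTuple


class Fragment(NamedTuple):
    """One piece of drawn text; hypothetical stand-in for extractor output."""

    x: float
    y: float  # PDF default user space: y grows upward from the page bottom
    text: str


def naive_reading_order(fragments: list[Fragment]) -> str:
    """Group fragments by exact y and order them top-to-bottom, left-to-right."""
    ordered = sorted(fragments, key=lambda f: (-f.y, f.x))
    lines: dict[float, list[str]] = {}
    for frag in ordered:
        # Fragments sharing the same y are treated as one line of text.
        lines.setdefault(frag.y, []).append(frag.text)
    return "\n".join(" ".join(parts) for parts in lines.values())


if __name__ == "__main__":
    # Drawing order in the file need not match reading order.
    drawn = [
        Fragment(120, 700, "world"),
        Fragment(72, 680, "second line"),
        Fragment(72, 700, "hello"),
    ]
    print(naive_reading_order(drawn))
```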
        
               | chalst wrote:
               | > In general, we know that HTML is going to be much more compact
               | (and compressible!) than PDF and that's the biggest
               | advantage I see on a web where bandwidth still matters.
               | 
               | PDFs can be tiny if they do not embed fonts. Serving
               | fonts is very much a complex technology in HTML world.
               | 
               | Browsing the web is a pain in the ass if you don't use a
               | browser compliant with up-to-date standards, but the
               | whole "HTML can be lightweight" argument pretty much
               | depends on avoiding much of today's standardisation. As
               | an objection to the original argument, it is not
               | comparing like with like.
        
             | JadeNB wrote:
             | > >> it's significantly more difficult with HTML
             | 
             | > Right Click > Save as
             | 
             | > Try it with this page!
             | 
             | HN is not a good site to illustrate the unpleasantnesses of
             | navigating the modern web. As you'd hope for a _hacker_
             | news site, it is very friendly to this sort of thing. Most
             | sites aren't.
        
           | justusthane wrote:
           | But if you want a page in PDF, you can print it to PDF. Sure,
           | non-computer-savvy users might not know how to do it off-the-
           | bat, but browsers make it pretty easy.
        
             | tablespoon wrote:
             | > But if you want a page in PDF, you can print it to PDF...
             | 
             | Printing a page to PDF usually _sucks_: see
             | https://news.ycombinator.com/item?id=27883028
        
           | MisterBastahrd wrote:
           | Or I could just make sure that my page prints reasonably well
           | (we used to do this) and use the print-to-pdf functionality
           | available in modern browsers.
        
         | Koshkin wrote:
         | All true. Incidentally, I do not see pagination as necessary or
         | in most cases even desirable; rather, I see it as a vestige of
         | the printing technology, while the need for printing has shrunk
         | dramatically over the past 20 years.
        
       | eaton wrote:
       | The whole post boils down to: "HTML is bad because it has scope
       | creep and people use it for bad things, but PDF is good because I
       | made this particular document in a way I like for a use case I
       | prefer."
       | 
       | You do you, man! Some people run Archie servers, some people
       | create a directory full of PDFs.
        
       | ergot_vacation wrote:
       | The sad thing is, this is what the web was SUPPOSED to be, more
       | or less: a series of static documents, text and images. The only
       | interactivity (setting aside the occasional CGI forms) was that
       | you could click certain images or text and go to other static
       | documents. Documents linked to documents.
       | 
       | Then everyone lost their minds and decided webpages needed to be
       | PROGRAMS and we've been paying the price ever since.
        
       | saint-loup wrote:
       | This experiment is interesting, but not so bold or novel when you
       | consider the culture around making zines (small, DIY, often
       | quirky magazines). The creativity there is amazing and medium-
       | wise it's often "hybrid" (print-oriented but shared online).
       | 
       | For instance there's this tool to help creating zines.
       | https://alienmelon.itch.io/electric-zine-maker
        
       | BaldricksGhost wrote:
       | How about plain old HTML? Might not be as pretty but it sure
       | beats a bloated PDF.
        
         | npteljes wrote:
         | It also wouldn't be upvoted on HN. I agree that a static page
         | generator would have been a much more fitting technology (for
         | example). But sometimes you gotta sacrifice that for
         | visibility.
        
       | SynapsePixels wrote:
       | * PDFs used to be inaccessible, but now you can tag them.
       | 
       | This PDF is not accessibility compliant.
        
       | gtirloni wrote:
       | I can't read this on my phone. There's no automatic layout and
       | the fonts are too small. Zooming in works but it's a nightmare to
       | navigate.
       | 
       | This is an accessibility disaster.
        
       | halayli wrote:
       | PDF is a major attack surface too.
        
       | Ostrogodsky wrote:
       | "And for that reason I am creating a 1 MB behemoth that you need
       | to download to read 3000 words or so."
        
       | croes wrote:
       | Now fight
       | 
       | https://www.nngroup.com/articles/pdf-unfit-for-human-consump...
        
       | qwerty456127 wrote:
       | PDF is very far from an ideal format for today's world of
       | different-sized screens. It is a horrible experience on mobile
       | and even worse on eInk pocket books. I would rather advocate
       | making everything available in ePub. Or even better - FB2, it is
       | an easy to grok/implement (designed with manual authoring, simple
       | scripted processing and low-end devices in mind) single-xml
       | structure decoupling the content from the view even more. I often
       | convert ePubs to FB2 (with Pandoc and Calibre) to make PocketBook
       | render them in its native fonts (which always are better) rather
       | than in the font specified in the ePub.
       | 
       | I would also mention that the text within PDFs often is not
       | machine-readable (you copy-paste it and get text without spaces,
       | with additional spaces or complete garbage) but I believe this is
       | easily avoidable if you bake PDFs the proper way.
       | 
       | I could also suggest publishing everything in Markdown (with
       | images embedded in a Base64 section in the bottom) but this
       | doesn't seem practical because browsers, book-reading apps and
       | eInk devices don't support nice rendering of them directly.
       | 
       | > "But how can I implement shiny whizz-bang features that will
       | engage readers and drive conversions?!" You can't. PDF is boring
       | 
       | It's not. It supports JavaScript, embedded video and other kinds
       | of active content. Sadly.
        
       | SethMurphy wrote:
       | Naming or framing things in a difficult or obtuse way can be a
       | good way to limit your audience. However, if it works others will
       | follow and it will no longer be effective.
       | 
       | I had a similar experience with a Meetup I once hosted which I
       | specifically put in a location that was difficult (but admittedly
       | becoming trendy). It worked for a bit but eventually attracted
       | the crowd I was trying to alienate.
        
       | wccrawford wrote:
       | I find it quite amusing that the author is railing against HTML
       | at least in part because it's practically impossible to build a
       | new web browser at this point, and then moves to PDF instead.
       | 
       | In my time working with PDFs, I've found that generating them in
       | ways that can be read with the most popular PDF readers is
       | cryptic and difficult, and even parsing the ones made from the
       | most popular creators is hard.
       | 
       | I would definitely not pick PDF over HTML in regards to how easy
       | it is to implement a good reader or writer.
       | 
       | And there's plenty of authoring tools for HTML already, so the
       | "ecosystem already exists for PDF" doesn't track either.
       | 
       | Even the complaint about churn makes no sense to me, because
       | there's no need to upgrade your tools constantly. If you're using
       | something that produces good HTML today, it'll produce good HTML
       | in a decade, too.
       | 
       | OTOH, if you have a problem that could be automated, you're a lot
       | more likely to be able to create that tool for HTML than PDF, and
       | it's quite likely that someone else already has for HTML, but not
       | PDF.
        
         | TheFreim wrote:
         | > In my time working with PDFs, I've found that generating them
         | in ways that can be read with the most popular PDF readers is
         | cryptic and difficult, and even parsing the ones made from the
         | most popular creators is hard.
         | 
         | Both pdf readers on my phone can't read the pdf, so this is
         | definitely an issue.
        
       | trhoad wrote:
       | I just ran your PDF through an accessibility checker and it
       | failed magnificently. For this reason alone, suggesting people
       | make more use of PDFs instead of well-formatted HTML is a total
       | non-starter for me (and should be for everyone).
        
         | wy35 wrote:
         | My thoughts exactly, I feel like it would be easier to write
         | accessible webpages (given the wealth of accessibility tools).
        
         | john-doe wrote:
         | Even Word documents are more accessible than PDFs.
        
           | zinekeller wrote:
           | Heck, even PDFs produced by Word (or comparable FOSS editors)
           | are so much better (except if you've done it incorrectly by
           | "printing" it) than this particular one.
        
         | Finnucane wrote:
         | Making properly accessible PDFs is possible, but it is a pain
         | in the ass. Certainly more difficult than with plain HTML.
        
         | robin_reala wrote:
         | It's entirely possible to write accessible PDFs. It's just that
         | no-one does.
        
           | trhoad wrote:
           | It is indeed! And you're right, nobody does, including this
           | example.
        
             | jfk13 wrote:
             | And even if they did, many of the readers/viewers people
             | use wouldn't fully support it.
             | 
             | While it's possible to royally mess up accessibility in
             | HTML, too, the chances of getting something usable are at
             | least somewhat better.
        
       | cerved wrote:
       | this is a joke right?
        
         | jacamera wrote:
         | Yes. Though I think the real question is whether or not it was
         | intentional.
        
       | Symbiote wrote:
       | > PDFs used to be unreadable on small screens, but now you can
       | reflowthem.
       | 
       | (Pasted verbatim, retaining the missing space.)
       | 
       | I don't see this feature in Firefox's viewer, or the default
       | Android one. Can anyone recommend a FOSS PDF viewer that has it?
       | (It must be FOSS, otherwise the point about using PDF to avoid
       | tracking is lost.)
        
         | nulbyte wrote:
         | Book Reader can reflow PDFs. It is very simple, which I like.
         | But it adds any PDF you open to the library when you open the
         | app, which I find only slightly annoying for non-books.
        
       | divbzero wrote:
       | I like the spirit of this but would prefer text or static HTML
       | over PDF as choice of file format.
        
       | [deleted]
        
       | agomez314 wrote:
       | The author has a point in that many people want an online presence
       | but the way they imagine it is more akin to a pamphlet or poster
       | than a hyperlinked website.
       | 
       | If that is the case, then pdf or a resizable image makes sense.
        
       | richardwhiuk wrote:
       | You can server side render PDFs and make them dynamic if you
       | wanted to.
        
       | GrumpyNl wrote:
       | Instead of pdf, why not the most basic HTML?
        
       | yesenadam wrote:
       | This was a great read. I'm sympathetic! I've had a website
       | (Wordpress) for almost 10 years, but have stopped adding stuff to
       | it lately, because I'm sick of the formatting changing on pages!
       | I look again at a page that used to look great, now the vertical
       | spacing is wrong, or tables have gone out of shape, or the font
       | has changed to something awful. Maybe it's wordpress, maybe it's
       | my bad css/html skills, maybe something else, not sure. I picked
       | up LaTeX skills about 5 years ago and have just been making
       | lovely PDF books of everything I'm into. And they stay just the
       | way I made them. Kind of a shame though, no-one else gets to see
       | them. Yet.
        
       | justanotherguy0 wrote:
       | Not optimized for mobile so I didn't read much and bounced.
        
         | PretzelPirate wrote:
         | I read it on my phone. I then clicked an external link at the
         | end and then hit my browser back button. I had to wait for the
         | PDF to re-load and was unhappy when I found myself back at the
         | top of the document.
         | 
         | I would get a much better experience with html.
        
       | DocTomoe wrote:
       | This sounds like the Creative Director I worked with, ca. 1998,
       | who bemoaned that he couldn't have pixel-perfect layouts over a
       | wide variety of devices/browsers/operating systems.
        
       | apotheon wrote:
       | Why does it seem like almost everyone doesn't realize that PDFs
       | can easily be made to support all the horrors we see in HTML? No,
       | it's fucking well not impossible -- or even notably difficult --
       | to jam some malicious dynamic code into a PDF. The only reason a
       | period of widespread fear about PDF viruses hasn't developed as
       | it has for websites spreading malicious code is the fact that
       | websites got much more widely adopted. PDFs have been used as
       | malicious code vectors before, and replacing HTML with PDFs would
       | only result in PDFs being the new common vector for the same
       | problem, with at least the same scale and intensity.
       | 
       | This only seems like a solution if you don't know what PDFs can
       | do -- and, by the way, sometimes pagination is bad, especially
       | static (non-reflow) pagination.
       | 
       | EDIT:
       | 
       | Let's make this clearer.
       | 
       | You can actually embed an entire JavaScript application in a PDF.
       | Tell me again how PDFs somehow prevent the problem of dynamic
       | pages on the web. All using PDFs instead of HTML pages would do
       | is wrap the horrors of the web in forms that are generally more
       | hostile to various viewing contexts for the less harmful use
       | cases (e.g. static pages suddenly being harder to read in some
       | contexts with PDFs than with HTML pages).
        
       | millerm wrote:
       | Yeah, 10 second load time, tiny text on a mobile device. No
       | thanks. Sucks that people went for over-styling every site making
       | everything painful to publish. I'd be happy with 90's static
       | HTML, and a few images when needed. I seek information, not "an
       | experience".
        
         | Robotbeat wrote:
         | On the contrary, I much prefer a small text on a mobile device
         | to the reflowed text on a mobile device that we're always
         | forced to use. The PDF is also the same view as on a desktop,
         | so if I look at it on another device, my spatial memory of
         | where stuff is remains intact.
        
           | millerm wrote:
           | Might as well just generate a PNG. The text is too small for
           | me on a mobile device. PDF's main goal was print. The fonts
           | are awful for the screen and there is no ability to reflow the text.
           | 
           | I can deal with things moving around, I don't need spatial
           | memory for that. Just give good titles, headers, and indexes.
           | Again, we can do this with simple HTML, embed images and
           | styles. It's all there.
           | 
           | Unfortunately, as I mentioned, people don't really publish
           | information anymore. It's mainly for "experience" and for
           | "looks". Marketing, and advertising, now drive the
           | information era. The "Information Super Highway" is now just
           | a crumbling road plastered with billboards. Most content is
           | useless, and is there for clicks. Heck, I'd rather someone
           | post their site in digests in e-book formats than PDF.
        
         | fbrchps wrote:
         | Exactly my reaction to opening the site.
         | 
         | I had no idea what the content of the site was (besides the
         | title from HN) and around the 50% download point, I had already
         | lost interest. I'm clearly not the only one who loses interest
         | this quick [0][1][2].
         | 
         | Also, as others have mentioned in root level comments, the
         | design & layout of the content within is also severely lacking,
         | which makes waiting for the load to occur even less worth it.
         | 
         | ---
         | 
         | [0]: https://www.pingdom.com/blog/page-load-time-really-affect-
         | bo... (2018)
         | 
         | [1]: https://blog.mozilla.org/metrics/2010/03/31/firefox-page-
         | loa... (2010)
         | 
         | [2]: https://www.thinkwithgoogle.com/marketing-strategies/app-
         | and... (I know it's Google, but to be fair they have more data
         | on this than most other companies, despite their obvious desire
         | to sell more of their product/services related to it.)
        
         | AlexAffe wrote:
         | Exactly this. It is by the way one of the main reasons I
         | initially stuck with HN. The lean UI, text based simplicity,
         | efficiently conveying information had me instantly. I would
         | sacrifice styling for speed anytime, everywhere.
        
       | uncomputation wrote:
       | I cannot tell if this is satirical or not. Assuming it is not,
       | every single "pro" of PDFs is just plain incorrect except for the
       | one about being "self-contained" to which I point to
       | https://gwern.net as a good example of self-contained HTML. Gwern
       | archives all the pages he references so that they are always
       | available.
       | 
       | In the case this is satire, I applaud it because I did get a few
       | chuckles.
        
         | bittercynic wrote:
         | In the words of the great Ivan Stang: "I'm joking AND I'm
         | serious!"
         | 
         | *I'm not the author, just thought the sentiment from that quote
         | applied here.
        
       | the_other wrote:
       | If you don't want churn, don't churn.
       | 
       | PDF is not a web format and you're wasting effort trying to
       | shoehorn print content and a print format for display on the web.
       | Just use HTML and don't update it, it's probably easier.
        
         | nonameiguess wrote:
         | It's not a browser format (though browsers can render it), but
         | that isn't the same as not being a web format. The web is just the
         | ability to retrieve files from other people's servers, that may
         | themselves reference other files on yet other people's servers.
         | As long as a file format supports hyperlinks, then it's
         | suitable for the web. If you don't care about being able to
         | actually click the hyperlink to activate your desktop system's
         | uri schema handler, then even plain text works fine.
        
         | silon42 wrote:
         | EPUB?
        
           | jacobmischka wrote:
           | Which is just basic HTML and CSS itself.
        
             | guywhocodes wrote:
             | Yeah but it's a decent subset. Most of the complaints of
             | the author should be significantly better
        
               | jacobmischka wrote:
               | It would be better if they just used that subset and just
               | published it directly instead of needlessly repackaging
               | it, but if that's what was meant then sure. Maybe we need
               | a better name for simple, semantic HTML and basic CSS.
        
               | Finnucane wrote:
               | The point of it is to be a self-contained package. You
               | still need hardware to read it, but not a server. In
               | theory at least, once you have it, it's yours. (of course
               | the commercial ebook vendors are trying to spoil that.)
        
               | goodpoint wrote:
               | No, it still supports plenty of trackers/spyware and so
               | on.
        
           | mojuba wrote:
           | EPUB is an under-appreciated format that I think can serve as
           | a short to mid-term storage for human knowledge. Can
           | reasonably re-flow itself when necessary, no language run-
           | time required, just a full Unicode support at least at the
           | level of the time the file was published.
           | 
           | That's the Internet of knowledge I'd love to see: things
           | organized in EPUB's, searchable and downloadable.
        
         | massysett wrote:
         | It's pretty amazing that the basic HTML that I learned 20 years
         | ago still works - it even displays fine on devices like tablets
         | and phones that did not even exist 20 years ago. I understand
         | the author's sentiment but PDF is an overreaction. Just write
         | static boring HTML.
        
           | cxr wrote:
           | Indeed, there's a lot of irony packed into the first page:
           | 
           | Featured is a quote from LWN indicting the "software
           | industry" and its "brittle dependencies". What's ironic about
           | this? It's squarely about the parts of the software industry
           | that deal in things that are _not_ meant to be painted in the
           | browser.
           | 
           | If you want a solution to the (perceived) churn, it's funnily
           | enough right in the quote from Mark Pilgrim: "I've migrated
           | to HTML 4". HTML is almost certainly not going to end up
           | drifting in such a way that DJB's qhasm bibliography page[1]
           | is ever going to break. HTML and the Web standards in general
           | are, with extremely rare exceptions, _cumulative_. It's
           | pretty frightening how many technical people don't understand
           | this; the Web is intentionally engineered to serve as "the
           | infrastructure for handling humanity's publishing needs
           | indefinitely"[2]. More frightening is that the biggest threat
           | to this are people like the author here who treat the Web as
           | if it's like any other thing that the computing industry puts
           | out--i.e., already perennially broken. This is dangerous
           | because it anachronistically cedes power to folks who'd try
           | to argue at some point in the future that the things about
           | the Web that they'd like to break (and might be in a position
           | to break e.g. due to browser monopoly) are justified and no
           | big deal, really.
           | 
           | The author goes on to call out the Web ("of rubbish") as
           | "user-hostile". Shortly afterward, he or she writes that "PDF
           | makes a stand against the churn". More accurately, PDF makes
           | a stand against the user, by prioritizing authors' creative
           | whims over the reader's needs. This happens again later in
           | their remarks about PDFs being page-oriented: "you are
           | fundamentally not in control of the reading experience." The
           | "you" here is not you, the actual reader. The control they
           | refer to is, once again, the author's.
           | 
           | You get other poor arguments--that PDFs are "offlineable"
           | "files" that can be distributed "decentralized", none of
           | which are accurate criticisms against what HTML lacks--unless
           | those Java documentation zipballs that seemingly every
           | university student enrolled in a CS program in the early
           | 2000s was made to download are a collective hallucination.
           | 
           | And it gets worse from there. Cute stunt to grab attention
           | and all, but the arguments are fundamentally bankrupt.
           | 
           | 1. http://cr.yp.to/qhasm/literature.html
           | 
           | 2. https://news.ycombinator.com/item?id=27368632
        
             | the_other wrote:
             | Thank you for this detailed response!!
        
           | account42 wrote:
           | > it even displays fine on devices like tablets and phones
           | that did not even exist 20 years ago
           | 
           | It would display perfectly if mobile browsers didn't have
           | broken defaults (to work around broken websites) that you
           | need to disable using <meta name="viewport"
           | content="width=device-width, initial-scale=1">.
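           | 
           | A minimal sketch of a static page that carries that tag,
           | written in Python purely for illustration. The function name
           | minimal_page is made up for this example and does not come
           | from any tool mentioned in the thread.

```python
from html import escape

# The viewport tag quoted in the comment above.
VIEWPORT_META = '<meta name="viewport" content="width=device-width, initial-scale=1">'


def minimal_page(title, paragraphs):
    """Return a plain static HTML document (no CSS, no JS) as a string."""
    # Escape user text so characters like < and & cannot break the markup.
    body = "\n".join(f"<p>{escape(p)}</p>" for p in paragraphs)
    return (
        "<!DOCTYPE html>\n"
        '<html lang="en">\n'
        "<head>\n"
        '<meta charset="utf-8">\n'
        f"{VIEWPORT_META}\n"
        f"<title>{escape(title)}</title>\n"
        "</head>\n"
        "<body>\n"
        f"<h1>{escape(title)}</h1>\n"
        f"{body}\n"
        "</body>\n"
        "</html>\n"
    )
```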
        
         | austincheney wrote:
         | That's a hard sell. The churn exists because people want it,
         | not end users, but people who are paid to produce websites.
         | 
         | Most churn comes in two flavors:
         | 
         | * analytics and spyware
         | 
         | * convenience code for insecure developers
        
       | duxup wrote:
       | My kids' school used to send links to google docs for their
       | announcements, I hated it. I pretty much hate any system like
       | that, it's purely extra steps on the web.
       | 
       | In both email, and the browser I'm already in a program that
       | displays text and images and cool stuff. So then I'm just sent a
       | link to someplace else that does the same thing?
       | 
       | So then what? Is it all just "pdf can do that too", but with
       | extra steps...? I can print to PDF in most browsers if I want,
       | but in this case it isn't a choice.
       | 
       | The idea that I might save and store the school emails or that
       | website and somehow manage those files seems kinda self important
       | in a way ... I don't mean that as a personal attack, just that
       | this idea that they imagine me taking the time to do that with
       | their content? When otherwise it could have just been an
       | accessible web page? How many people care to do that?
       | 
       | If I'm visiting a website I'm almost certainly not interested in
       | saving your content / managing it... almost never.
       | 
       | I'm a little lost on the whole 'page-oriented' idea too. That's
       | just a limitation of paper, and it's a pain / disruptive more
       | often than not. Even the 'page oriented' section is broken up by
       | the page and some extra text at the bottom of the page that is
       | irrelevant to the paragraph...
       | 
       | If folks want it, a 'save to pdf' option might be nice to add, or the
       | user can just print to pdf...
        
       | cunthorpe wrote:
       | Please somebody bake an icon into the browser that turns green
       | when websites are lightweight and content-only and make it affect
       | Google rankings.
       | 
       | We don't need PDF sites, we need incentives for publishing
       | acceptable websites.
       | 
       | Side note: I'd honestly love for the government to step in and
       | outright outlaw some obvious and intentional dark patterns
       | (example: California unsubscribe law)
        
         | titzer wrote:
         | > make it affect Google rankings.
         | 
         | Google is never going to make a change to its rankings that
         | interferes with its real goal of 23% YoY revenue growth.
        
           | blacktriangle wrote:
           | Is that actually an internal Google goal? If so, dear god, no
           | wonder they are so willing to sacrifice the long term health
           | of the internet in return for short term hypergrowth. No
           | company Google's size can grow that fast without some serious
           | dark patterns and user abuse.
        
             | titzer wrote:
             | You don't end up with that level of growth year over year
             | for 20 years straight _by accident_. It is an unwritten
             | assumption that missing 20% growth is a fail. I worked at
             | Google almost 10 years and watched the dog and pony show
             | (aka TGIF) from the inside. The real story is on the
             | quarterly financial reports.
        
       | janandonly wrote:
       | PDF-fing everything on your website is one way to go about it...
       | 
       | I personally use the service at printfriendly [1] and Arc90's
       | Readability [2] to make un-crufted and readable PDF files of web
       | content that is worth saving for the coming decades. Added bonus:
       | by saving these very small files on my system, I can press
       | Command + Spacebar and easily search through my multiple decades
       | of interesting files...
       | 
       | [1] https://www.printfriendly.com
       | 
       | [2] https://ejucovy.github.io/readability/
        
       | afavour wrote:
       | Didn't expect I'd see a top post on HN _defending_ the page-
       | centric nature of PDFs. A paged format is awful for anything
       | other than printing out pages.
       | 
       | But hey, it's a big wide web, you do you. But I won't be reading.
        
       | 0xcoffee wrote:
       | Excellent! Excited to see the next PDF generator framework.
        
       | msoad wrote:
       | Companies' S-1 documents are shared on Hacker News. The SEC publishes
       | them in both PDF and HTML. Guess which one works better?
       | 
       | It's not the fault of HTML standard if people are using React
       | plus 20 different libraries for a simple static content
        
       | 8note wrote:
       | "PDFs are self contained, and can't be broken by an API going
       | down"
       | 
       | Is directly broken by "PDFs are part of the web, and part of the
       | content can be by reference to a webpage"
       | 
       | If that webpage goes down, that link is broken.
       | 
       | That decentralized bit still needs to conform to broken copyright
       | laws too.
       | 
       | You can't just download a pdf then rehost it on your own without
       | a license to do so
       | 
       | .... There's also a big difference between a city and the modern
       | web. We own the infrastructure in a city, vs rich people own it
       | on the web.
       | 
       | Rather than a city, the web is more like a company town. I don't
       | think that's any different for pdfs either. The distribution is
       | still coming from a web server owned by a company -- the real
       | response is self hosting of your stuff, and self hosting by your
       | friends for their stuff. The file format doesn't make it self
       | hosted
        
       | [deleted]
        
       | greatgib wrote:
       | What is the summary?
       | 
       | Same as someone else, to read on mobile I have to download and
       | open a pdf so I just cancelled the download and ignored the link
        
         | prox wrote:
         | What is the bump you experience that you don't want to download
         | and open a pdf? Here it opens in my browser directly (Safari)
        
           | lucideer wrote:
           | macOS/iOS have this built in but not all OSes come with a pdf
           | viewer
        
           | TheFreim wrote:
           | For me, none of my readers (I have multiple on my phone) can
           | open the file for some reason.
        
           | kzrdude wrote:
           | It ends up in the downloads folder and needs cleanup later.
        
             | codetrotter wrote:
             | In all browsers that I use that is only true if the server
             | sends a Content-Disposition header with its value set to
             | "attachment" (optionally with a file name), or maybe also
             | in the case where the server specifies incorrect or
             | unspecific Content-Type (such as simply "application/octet-
             | stream" instead of "application/pdf").
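               | 
               | A rough sketch of that rule in Python, only to make the
               | two conditions concrete. It models what is described
               | above and is not real browser logic; the function name
               | likely_saved_to_downloads is invented for this example.

```python
def likely_saved_to_downloads(headers):
    """Rough model of the rule described above, not real browser logic."""
    # Header names are case-insensitive, so normalise them first.
    normalised = {k.lower(): v.strip().lower() for k, v in headers.items()}

    # "attachment" may be followed by parameters such as filename="x.pdf".
    disposition = normalised.get("content-disposition", "")
    if disposition.split(";")[0].strip() == "attachment":
        return True

    # A generic or missing type gives the browser no reason to open a viewer.
    content_type = normalised.get("content-type", "").split(";")[0].strip()
    return content_type != "application/pdf"
```

               | Actual behaviour also depends on the browser and on user
               | settings, as the replies about Firefox on Android show.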
        
               | kzrdude wrote:
               | What I said happens for Firefox on android,
               | unfortunately. It's a great browser, of course.
        
               | alisonkisk wrote:
               | Even on mobile?
        
               | codetrotter wrote:
               | Yes. I use Safari on iOS.
        
               | sharken wrote:
               | On the Brave browser for Android it also downloads the
               | PDF file and stores it locally. Websites should use HTML
               | and not PDF in my opinion.
               | 
               | On top of that the end result is not very readable on
               | mobile, the font is too small.
        
               | codetrotter wrote:
               | > [...] Websites should use HTML and not PDF in my
               | opinion.
               | 
               | > On top of that the end result is not very readable on
               | mobile, the font is too small.
               | 
               | Agreed on both counts. Was only commenting about browsers
               | saving PDFs.
               | 
               | PDF is not a comfortable format for reading on a screen.
               | Nor a comfortable format to extract text or data from.
        
       | admax88q wrote:
       | Well this sucked to read on mobile. I'll stick to HTML.
        
       | kissgyorgy wrote:
       | It's a terrible "implementation", but interesting observations we
       | should consider.
        
       | prox wrote:
       | I love the basic idea here. Needs polishing if you want to blow
       | this up to the masses.
       | 
       | It's like my Pi who just does one thing really well, and allows
       | me to tinker on every level if I so choose.
        
         | prox wrote:
         | I like to add that I think a well designed PDF is just so much
         | better looking than any html based page (and has a lot more
         | freedom)
        
           | pasc1878 wrote:
             | Definitely less freedom. With HTML, I, the reader, can change the
             | size of text or even the font and the text will reflow so you
             | don't need to scroll horizontally to read each line. How do you
           | do that to a pdf?
        
             | prox wrote:
             | That's not what I mean (your point has merit)
             | 
             | If I ask a designer to design a website, he has to send it
             | off for implementation, or is confined by html
             | breakpoint/accessibility options.
             | 
             | PDF can go straight from designer to document and do
             | everything in a program like designer, indesign and so on.
             | 
             | It's a designer first paradigm.
        
       | atemerev wrote:
       | "But stable standards are incredibly important.They allow
       | software, at least in theory, to be finished. Why is it
       | importantthat software be finished? Because it gives us hope that
       | we might end thechurn and fix all the bugs! I want to use
       | software whose version number is7 1.0. I want to use software
       | whose every line of code has been studied,analysed, optimised and
       | punishingly tested. I want every component andsubcomponent and
       | every interaction and every configuration to beexquisitely
       | documented, and taught in courses, and painstakinglydeconstructed
       | and proven sound"
       | 
       | Sorry, not possible. Never, ever. Software does not work like
       | that. Bugs will never be fixed (if they could, the software in
       | question would have become obsolete long ago). By the way, this
       | is what you get when you try to copypaste text from this
       | "website".
        
       | fsiefken wrote:
       | Very good, I go for project gemini
       | https://gemini.circumlunar.space/docs/faq.gmi
        
       | MichalSternik wrote:
       | Well, what's wrong with static site (generators)?
       | 
       | I certainly get the argument, but using something like hugo or
       | gatsby or jekyll when you want to avoid the "churn" also seems
       | like a perfectly valid solution.
        
         | nonameiguess wrote:
         | The author addresses this pretty well. Because you can embed
         | whatever you want, static site generators aren't really static.
         | In particular, Jekyll blogs and what not still pretty commonly
         | include comment sections.
         | 
         | Of course, pdfs aren't necessarily static, either, but that is
         | why Lab6 is choosing to use pdf/a, an actually static format
         | intended specifically for long-term archiving of immutable
         | files. This way you can sign the file and guarantee it stays
         | the same forever and everyone's copy is identical.
         | 
         | I'm kind of surprised at the response to this. The author seems
         | well aware of how terrible pdf is as a format and this isn't
         | some treatise of why we should want to use it. It's an
         | unfortunate compromise that, given the requirements they're
         | aiming to meet, of generating a file that supports rich
         | formatting and hyperlink embedding, but which can guarantee
         | immutability and long-term archiving directly in the spec,
         | pdf/a is all there is, so in spite of being a terrible format
         | with a lot of shortcomings, it's what they're using.
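         | 
         | On the "everyone's copy is identical" point, a minimal sketch
         | in Python of comparing two copies by SHA-256 digest. A digest
         | is not a signature: it shows two files match, not who made
         | them. The helper names are invented for this example.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def same_copy(a: bytes, b: bytes) -> bool:
    """True if both byte strings hash to the same digest."""
    return sha256_hex(a) == sha256_hex(b)
```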
        
           | account42 wrote:
           | > The author addresses this pretty well. Because you can
           | embed whatever you want, static site generators aren't really
           | static. In particular, Jekyll blogs and what not still pretty
           | commonly include comment sections.
           | 
           | But just like you can choose to use PDF/A, you can also
           | choose to have a completely static and self-contained (e.g.
           | using data URLs for images) HTML page.
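           | 
           | A minimal sketch of the data URL idea in Python, assuming
           | you already have the image bytes in memory. The function
           | names to_data_url and inline_img_tag are made up for this
           | example.

```python
import base64
from html import escape


def to_data_url(data: bytes, mime_type: str = "image/png") -> str:
    """Encode raw bytes as a base64 data URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def inline_img_tag(data: bytes, alt: str, mime_type: str = "image/png") -> str:
    """Build an <img> tag whose image travels inside the HTML file itself."""
    # Escape the alt text, including quotes, so it stays inside the attribute.
    return f'<img alt="{escape(alt, quote=True)}" src="{to_data_url(data, mime_type)}">'
```

           | The trade-off is size: base64 adds roughly a third to the
           | image bytes.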
        
           | IshKebab wrote:
           | Why don't they just use a static subset of HTML? You don't
           | _have_ to include comments sections, just like you don't
           | _have_ to include 3D CAD models and videos in your PDFs (yes
           | you can do both of those, in theory anyway).
        
           | danShumway wrote:
           | > pdf/a is all there is
           | 
           | Nobody is requiring you to use PDF/A. No mainline browser
           | (that I'm aware of) requires it.
           | 
           | So what is being solved? When I click on a PDF on the web, I
           | don't know if it's using PDF/A, I don't know if it's
           | embedding or linking its fonts. So it's the same situation,
           | nothing has changed.
           | 
           | Telling people to use PDF/A when most clients do not enforce
           | it and when there's no indication to users before they click
           | on a link whether or not the link is following the spec -- it
           | is exactly the same as telling them to use a subset of HTML;
           | the author is doing the same thing they complain about.
           | 
           | You can't just say that PDF/A exists. That's not enough, how
           | will you get people to restrict themselves to that format
           | when 99% of their users will never notice the difference and
           | no client is enforcing it?
        
         | float4 wrote:
         | The only thing I like about PDF compared to HTML is that with
         | PDF, I know for a fact that no web requests are made in the
         | background. That means no fingerprinting, no analytics etc.
         | 
         | With HTML, I have to trust that some random entity does what
         | they state in their privacy policy, and they regularly don't.
         | Sure, I can disable JS, but then 95% of the web doesn't work
         | anymore.
         | 
         | Other than that PDF is quite clearly a less accessible format.
        
           | account42 wrote:
           | > With HTML, I have to trust that some random entity does
           | what they state in their privacy policy, and they regularly
           | don't. Sure, I can disable JS, but then 95% of the web
           | doesn't work anymore.
           | 
           | If you only allow PDF, then 99.9999% of the web doesn't work
           | anymore.
           | 
           | I'm all for getting sites to be static, but PDF doesn't fix
           | that because the problem has never been the technology used
           | to build the site.
        
           | grncdr wrote:
           | Are you sure? I was under the impression that PDFs can
           | reference web resources, and this is why there are more
           | stringent standards for archiving (PDF/A and friends)
        
           | andrepd wrote:
           | You can choose not to use JS on your website.
        
           | jefftk wrote:
           | How sure are you that there are no network requests
           | happening? I tried to look this up and wasn't able to find
           | any clear answer.
           | 
           | (It looks like at least some PDF readers have provided
           | support for automatically displaying external images, for
           | example)
        
             | foobar33333 wrote:
             | The full PDF spec is insane and allows for web requests and
             | JavaScript. Most readers do not implement the anti-features,
             | but Adobe's tools will.
        
           | robin_reala wrote:
           | How do you know for a fact? PDF has JS in the spec, and it
           | supports SOAP and Web Services. Have a look at
           | https://www.adobe.com/go/acrobatsdk_jsdevguide
        
             | float4 wrote:
             | That's not the PDF spec is it? That is a spec for Adobe
             | Acrobat, which is not allowed to make any web requests
             | thanks to my application firewall (Little Snitch).
             | 
             | Pretty sure a PDF opened in the browser can't run any JS,
             | but not completely sure. So you're right: I don't really
             | know it for a fact. Poor choice of words.
        
               | robin_reala wrote:
               | The spec is ISO 32000, and it's expensive and closed, so
               | difficult to reference. But according to Wikipedia at
               | least, JavaScript is normative in it. No idea if SOAP /
               | Web Services is part of it though.
        
               | jl6 wrote:
               | The spec for PDF 1.7 is here:
               | https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PD...
               | 
               | JavaScript is allowed, but not in PDF/A, which is what I
               | use.
               | 
               | The PDF 2.0 spec is damnably not public.
        
               | the8472 wrote:
               | But you can't easily tell PDF/A and regular PDF apart, so
               | we're back to the same situation as HTML vs. HTML with
               | javascript turned off.
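               | 
               | A rough sketch of how a client could at least look for
               | hints, assuming the file is available as raw bytes. PDF/A
               | files declare themselves through a pdfaid:part entry in
               | their XMP metadata, but a declaration is only a claim, and
               | a byte scan cannot see inside compressed object streams.
               | The function name pdf_hints and the sample bytes are made
               | up for illustration; this is not a validator.

```python
import re


def pdf_hints(data: bytes) -> dict[str, bool]:
    """Heuristic hints only; not a substitute for a real PDF/A validator.

    Scans the raw bytes, so anything stored inside compressed object
    streams is invisible to it.
    """
    return {
        # PDF/A files declare themselves in XMP metadata via pdfaid:part.
        "claims_pdfa": re.search(rb"pdfaid:part", data) is not None,
        # /JavaScript and /JS name tokens appear in script actions.
        "mentions_js": re.search(rb"/(JavaScript|JS)\b", data) is not None,
    }


# Tiny hand-made byte strings standing in for real files (hypothetical).
plain = b"%PDF-1.7\n1 0 obj << /S /JavaScript /JS (app.alert(1)) >> endobj"
archival = b"%PDF-1.4\n<pdfaid:part>1</pdfaid:part>"
print(pdf_hints(plain))
print(pdf_hints(archival))
```

               | Even with a check like this, nothing tells the user which
               | kind of PDF is behind a link before they click it.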
        
           | deregulateMed wrote:
           | You are fingerprinted when you find the web link.
        
             | float4 wrote:
             | When I click a link you mean? Definitely true, but that way
             | they only have access to my IP and user agent, which is
             | still better than all the WebGL, Font library, display
             | calibration settings, mouse movement etc. that they use
             | otherwise.
             | 
             | I often use Tor, although I'm pretty sure that even then, a
             | good analytics lib can see it's me based on scroll
             | behaviour, mouse movement, time of day, and of course what
             | I browse.
             | 
             | But yeah, you make a good point.
        
               | deregulateMed wrote:
               | Where do you get the link?
        
               | float4 wrote:
               | DDG mostly, and they don't track users.
        
               | deregulateMed wrote:
               | Your device, your device version, screen size, browser,
               | browser version, IP address, etc. are all tracked
               | regardless.
               | 
               | You might not be a unique fingerprint, but at best you
               | are part of a group of somewhere between 3 and 1000
               | similar users.
               | 
               | Not to be a downer, but when I webscraped I learned that
               | big corporations can spend money to fingerprint you.
        
             | andrepd wrote:
             | Why?
        
         | throw0101a wrote:
         | > _I certainly get the argument, but using something like hugo
         | or gatsby or jekyll_ [...]
         | 
         | Or a plug-in to Wordpress so you can keep the GUI/dynamic for
         | the less technical employees:
         | 
         | * https://wordpress.org/plugins/simply-static/
        
       | 101008 wrote:
       | I've been doing something similar for 4 years now. I converted my
       | niche website into a monthly magazine, that is released as a PDF
       | (and also uploaded to Issuu).
       | 
       | It has its good sides and bad sides. People will download the PDF
       | every month when there is a new issue, but you don't know if they
       | read it, how much time they spend on it, etc. You won't appear on
       | Google Results as you would do if you posted the articles as
       | HTML, etc.
       | 
       | Based on my experience, I just keep doing it as an experiment and
       | because I enjoy saying I run a digital magazine, but the truth is
       | that there are no real advantages to it.
        
         | jl6 wrote:
         | > you don't know if they read it, how much time they spend on
         | it, etc.
         | 
         | This is an excellent feature, for the user.
        
           | 101008 wrote:
           | Yes, for the user it has some advantages:
           | 
           | . Download it and keep it forever.
           | . Read offline.
           | . Be able to share it through email, etc.
           | . Print it and read it in a nice place! (I encourage this)
           | 
           | Of course, it has some downsides:
           | 
           | . Not responsive, so people who download it from a phone may
           | hate it.
           | . No accessibility.
        
       | croes wrote:
       | Useless rant. His choice won't change the rest of the internet
       | and for his site he could easily write lean html without all the
       | stuff he complains about.
        
       | LightG wrote:
       | Appreciated the sentiment of it.
       | 
       | It's not ideal, but in a non-ideal world where the big boys have
       | ruined the web, I tip my hat to this effort with a large dose of
       | empathy.
       | 
       | Cheers,
        
       | eloisius wrote:
       | I'm not old enough to remember Gopher being "the internet" but I
       | have browsed a few retro sites that still run it. I wouldn't mind
       | seeing some slightly upgraded gopher-like protocol that allowed
       | for embedding images and maybe form submissions (without any
       | scripting). Most of what I want to do online is read, and I'd be
       | more than happy for everything to come with a standardized look
       | and feel rather than whatever scroll jacking weirdo design every
       | website feels like having.
        
       | JorgeGT wrote:
       | While this may be extreme, I do notice that it is becoming harder
       | and harder to print webpages to PDF/paper. Is there a good
       | approach for this besides the standard print dialog?
        
         | kuu wrote:
         | Maybe use the read mode of Firefox and then print it?
        
         | bigyikes wrote:
         | For sites without print-specific media queries (so basically
         | all websites) I use dev tools to delete all the DOM nodes I
         | don't want to appear in print.
        
       | pharke wrote:
       | Isn't this what IPFS is for?
        
         | knownjorbist wrote:
         | I'm surprised that IPFS and others aren't mentioned more here.
         | The solution is staring us in the face, it's related to
         | cryptography.
        
       | throwawayswede wrote:
       | While I appreciate the sentiment, I don't think PDF is the way,
       | at least in the way you're currently doing it. PDF may be
       | supported by browsers, but they're not designed for it; it's a
       | secondary feature. Same for search engines. Same for mobile.
       | 
       | Most browsers have Print to PDF. If you want people to be able to
       | download an immutable version of your content, then just have a
       | simple static version of your page with a valid print css, better
       | yet, leave everything default.
       | 
       | If you want to fight churn with PDF, just have a simple HTML
       | website with a link to download a versioned PDF of your issue.
       | Your website can be as simple as
       | https://motherfuckingwebsite.com/ or
       | https://bettermotherfuckingwebsite.com.
        
         | grumblenum wrote:
         | There are also other lightweight alternatives. The Gopher
         | protocol has a small, but disturbed following :
         | http://gopher.muffinlabs.com/gopher.floodgap.com (you can
         | actually use netcat as your gopher client). Gemini is a more
         | modern gopher-inspired protocol
         | https://gemini.circumlunar.space/. Personally, I'd be pleased
         | to see a text-first approach gain adoption. I don't think
         | anyone looks at the thick-client model browsers have evolved
         | into and sees an optimal solution.
         | 
         | I think evangelistic energy should probably be directed at
         | complaining to organizations that share content through JS-
         | framework monstrosities. Getting rank-and-file web-devs excited
         | about lean websites doesn't hurt, but clients and CTOs have
         | real decision making power.
        
       | csomar wrote:
       | It's ironic that the author is pitching PDF, and yet he is
       | using a plethora of hyperlinks.
       | 
       | The big "invention" of the Web was linking pages together. That's
       | what made it great. That's what created "Google" in the first
       | place. Links in a PDF are supposed to take you to a browser or
       | open a different PDF file?
       | 
       | PDF is a step back. If you are angry about the overblown size of
       | JavaScript and resource consumption, use a simple static
       | website. It doesn't get easier than that.
        
         | alisonkisk wrote:
         | You're conflating browsers with markup language. Clicking an
         | HTML link opens a different HTML file.
        
       | PaulHoule wrote:
       | PDF has quite the attack surface. It supports Javascript, 3D
       | models, JBIG2 compression that turns 8's into 6's and all sorts
       | of strange things.
        
       | api wrote:
       | The point about the size of the W3C spec is hilarious, but I
       | wonder how much of that hundred million plus words is actually
       | necessary to implement the parts of the spec that people use?
       | 
       | Surely it would be possible to create a spec that captured the
       | most useful subset of HTML and CSS functionality.
       | 
       | In any case if the spec really is that huge the W3C should be
       | written off. Any organization that produces a spec like that is
       | worthless.
        
       | dalbasal wrote:
       | I found _"PDFs are files"_ kind of compelling. Perhaps this was
       | a flaw of the original www concept. Web pages were always
       | technically files & documents, but this was always abstracted
       | away from userland. "Save webpage" was never a core feature. This
       | did disempower users.
       | 
       | PDFs are downloaded, saved, emailed around. They can also be
       | linked to. Userland maintains a closer relationship with what's
       | going on. A typical user knows that you can have a copy of a file,
       | which may or may not be identical to the online one. WWW, from
       | its initial version, was mysterious. The transition from the
       | model of requesting files from a server by clicking a link to a
       | programmatically generated stream of code executed in your
       | browser happened below the typical user's perspective.
       | 
       | The web has obviously gained a lot, but has also lost something.
        
         | BenjiWiebe wrote:
         | I've definitely used saved webpages a lot. When we had dialup
         | email only, my dad would drive to the library with a flash
         | drive and download Web pages to bring home and read. It was
         | great. Of course, it's even greater now that I can load it
         | fresh even faster.
        
       | Aeolun wrote:
       | All of the stuff he says PDF is, is the same for HTML.
        
         | zeusk wrote:
         | Well, sort of. Can't HTML contain script tags with external
         | references (xmlHttpRequest or any async fetch) that a simple
         | crawler/browser may not save to disk?
        
           | wffurr wrote:
           | It can, but you don't have to. It's absolutely possible to
           | write self contained html files.
        
           | wccrawford wrote:
           | They could, but if he's the one creating the file, he can
           | choose. And if he's just hosting the file, I'm sure there are
           | tools that will inline all the external resources.
        
       | vimy wrote:
       | Reading PDFs on a phone isn't an enjoyable experience.
        
         | deregulateMed wrote:
         | For books, I prefer it to Libby and Google Books.
         | 
         | There are tons of pdf viewers to choose from, so if you don't
         | like an App, there are more available.
         | 
         | I like that mine remembers the last opened doc and page. I can
         | copy text from pdf too.
         | 
         | Although this isn't a comparison of ebook to pdf, it's html to
         | pdf.
        
       | ColinWright wrote:
       | For reference, the original title was:
       | 
       |     We are drowning in churn and noise. I am fighting by
       |     switching this site to PDF
       | 
       | I find the "actual" title unhelpful, unenlightening,
       | uninformative, and uninviting, which is why I originally chose
       | text taken directly from the page, so people would know what it
       | was about before taking the time to click and read.
       | 
       | I know _why_ the HN mods have changed it to  "Deurbanising the
       | Web", but I wish they'd keep more informative titles, especially
       | when taken from the article in question.
        
         | failwhaleshark wrote:
         | I didn't understand what it meant. I thought it was a euphemism
         | for gentrification. None of their knee-jerk, dilettante, low-
         | effort rant clearly identifies what they're really mad about,
         | or if they're just mad to be mad.
         | 
         | It feels like a waste of everyone's time.
        
       | dahfizz wrote:
       | If this catches on, there will be "JS in PDF" in no time.
        
         | MawKKe wrote:
         | as in "it exists already"?
        
         | [deleted]
        
       | pseingatl wrote:
       | Most people think that PDFs have to be letter or A4 size, but
       | you can make them at A7 or A8 for a phone screen, or for that
       | matter, any size you want.
       | 
       | PDF is size-agnostic. There's nothing to stop you from creating
       | documents the size of a phone screen. So you could put the phone
       | screen-sized PDF at m.mysite.com and the complaint about
       | illegibility on small screens is solved.
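       | 
       | To put numbers on that, here is a small sketch that derives the
       | ISO 216 A-series sizes and converts them to PDF points (1/72
       | inch, the default PDF user-space unit). The function names are
       | made up for illustration, and the printed MediaBox is just the
       | page rectangle a generator would write for that size.

```python
def a_series_mm(n: int) -> tuple[int, int]:
    """Return (width, height) in mm for ISO 216 size A<n>, portrait."""
    width, height = 841, 1189  # A0
    for _ in range(n):
        # Each step halves the long side (rounded down); the old short
        # side becomes the new long side.
        width, height = height // 2, width
    return width, height


def mm_to_points(mm: float) -> float:
    """Convert millimetres to PDF points (72 per inch, 25.4 mm per inch)."""
    return mm * 72 / 25.4


for n in (4, 7, 8):
    w, h = a_series_mm(n)
    box = f"[0 0 {mm_to_points(w):.0f} {mm_to_points(h):.0f}]"
    print(f"A{n}: {w} x {h} mm -> MediaBox {box}")
```

       | An A7 page is 74 x 105 mm, which is in the same range as a phone
       | screen held upright.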
        
         | dredmorbius wrote:
         | The site would be inspired to automatically detect device sizes
         | (JS or CSS media queries) and offer an appropriately-scaled PDF
         | download option.
         | 
         | Unfortunately it didn't opt for that.
        
       | SMAAART wrote:
       | Well, that's innovative.
       | 
       | but, why not HTML 2?
        
       | emptyfile wrote:
       | Instead of writing text let me make some more noise by shoving
       | PDFs for no reason.
        
       | schipplock wrote:
       | The text is too small to read on my phone. I can zoom in, but
       | then I have to scroll horizontally. I'm afraid this website isn't
       | targeting me.
        
       | rerx wrote:
       | When I click on the submitted link with Chrome on Android, it
       | asks me if I want to redownload "0.pdf". Such a confusing
       | question. If I pick the wrong answer, I end up with some
       | restaurant menu I must have looked at months ago, not what the
       | global poster intended.
       | 
       | So for non-confusing real-world UX I'd recommend extra care with
       | file names if you want to go PDF only.
        
       | pornel wrote:
       | Maybe the author doesn't realize how difficult PDF is to work
       | with. In PDF it's ambiguous whether any two spans of text belong
       | together in the same sentence or paragraph. It can even be
       | unclear where the _spaces_ between words are. PDF also allows
       | "optimizing" font usage that makes text unreadable without
       | OCR-ing the custom font. The messy hacks go on and on:
       | 
       | https://filingdb.com/b/pdf-text-extraction
       | 
       | OTOH it's totally possible to make a self-contained HTML page
       | without using a JS framework of the day. It's going to be way
       | easier to consume than a PDF.
        
         | Recursing wrote:
         | From the PDF
         | 
         | > "But it's just as easy to write self-contained HTML pages!"
         | 
         | > Sure, but if you're going to hide CTF forensics challenges in
         | your publication, a coverdisk allows you to do it in style!
         | 
         | I think it's not meant to be taken extremely seriously
        
         | jl6 wrote:
         | Hello. Original author here.
         | 
         | I do realize how ugly PDFs are to work with (I wrote my own
         | PDF/A generator for issue 2[2]). This is a Tagged PDF though,
         | so you can extract text using standard tools.
         | 
         | To understand the mindset, have a read of the Gemini FAQ[0],
         | specifically the answer to why not use a subset of HTML - and
         | then read Issue 2[2] which is a hybrid Gemini+PDF polyglot, for
         | people who don't like reading PDFs, which is apparently
         | everyone on this thread :)
         | 
         | Issue 1[1] also moves beyond PDF, to try addressing some of the
         | accessibility shortcomings by (a) prepending the content as
         | plain text, and (b) _recording myself reading the whole thing
         | out_ and arranging the file as a polyglot MP3 and PDF file that
         | can be played in an audio player as well as viewed in a PDF
         | reader as well as a text editor.
         | 
         | A mini-FAQ to address some points elsewhere in the thread:
         | 
         | * No, it's not going to replace your blog or the web in
         | general.
         | 
         | * Yes, it's an experimental art project / longitudinal CTF
         | forensics tournament / weirdo personal blog.
         | 
         | * Yes, I'm serious anyway.
         | 
         | [0] https://gemini.circumlunar.space/docs/faq.gmi
         | 
         | [1] https://lab6.com/1
         | 
         | [2] https://lab6.com/2
        
           | ReactiveJelly wrote:
           | > The problem is that deciding upon a strictly limited subset
           | of HTTP and HTML, slapping a label on it and calling it a day
           | would do almost nothing to create a clearly demarcated space
           | where people can go to consume _only_ that kind of content in
           | _only_ that kind of way. It's impossible to know in advance
           | whether what's on the other side of a https:// URL will be
           | within the subset or outside it. It's very tedious to verify
           | that a website claiming to use only the subset actually does,
           | as many of the features we want to avoid are invisible (but
           | not harmless!) to the user
           | 
           | But I don't really know that your PDF website doesn't use
           | some evil invisible PDF feature.
           | 
           | And I have to use a special Gemini browser to access Gemini
           | pages. (Since an HTTPS bridge misses the point)
           | 
           | So why not use Dillo as my "Sane subset of HTML"? It is not
           | hard to hand-write HTML that looks great in Lynx, Dillo, and
           | Firefox.
        
           | prox wrote:
           | I think the idea of PDFs opens up many new possibilities, and
           | your work is quite an eye opener. Design is largely missing
           | from websites - it's the same design over and over when it
           | comes to optimizing for clicks.
           | 
           | Designers would thrive in a PDF environment instead of
           | handing their designs over to implementation as it is now.
           | 
           | Maybe PDF is just the beginning and maybe a similar format
           | can be thought up that addresses some of the concerns
           | expressed here, and move over in time.
        
           | sosuke wrote:
           | I've spent entirely too much time "printing" sites and
           | articles to PDF to save them to read or reference later. Your
           | PDF style was perfect! No need to fuss with anything just
           | save it!
        
             | slashdot2008 wrote:
             | This thread might be helpful to you
             | https://news.ycombinator.com/item?id=27817659
        
           | jrochkind1 wrote:
           | I don't like reading PDFs and probably wouldn't read much of
           | your website like that... but I appreciate the intervention
           | drawing our attention to the advantages of PDFs in the
           | disadvantaged present environment, which I think are real and
           | worth thinking about. It seems almost like an artistic
           | project. I'm not mad at you, and am not sure what makes some
           | people seem to be so mad here (probably means you were
           | successful at something)... but I'm still not gonna read it,
           | PDFs are a mess to read!
        
         | morsch wrote:
         | Case in point: copy-pasting a paragraph from his PDF-website
         | adds line breaks everywhere. It also loses formatting
         | (bold/italics) and the footnote superscript doesn't translate.
         | 
         |     PDF is an open standard, which is freely available2, and
         |     stable. It has a
         |     version number  and many interoperable implementations including
         |     free and open source readers and editors.
         | 
         | I think ease of copy-pasting is one of the coolest things about
         | the document-centric roots of the web (along with the back
         | button and hyperlinks; in other words, hypertext rules),
         | although the modern web does break it (along with the back
         | button and hyperlinks) in many places, so I can see where he is
         | coming from. PDFs aren't the answer, though.
        
         | signal11 wrote:
         | > it's totally possible to make a self-contained HTML page
         | without using a JS framework of the day. It's going to be way
         | easier to consume than a PDF.
         | 
         | Completely agree. For instance, NASA's APOD site[1] is a good
         | example of something that'd be nontrivial using both an offline
         | PDF and modern lightweight alternatives like Gemini, but works
         | really well even without fancy modern design. Under 300kB
         | including the image (HTML's under 6 kB) _before_ gzipping.
         | 
         | [1] https://apod.nasa.gov/apod/astropix.html
        
         | lifthrasiir wrote:
         | > OTOH it's totally possible to make a self-contained HTML page
         | without using a JS framework of the day.
         | 
         | I'm basically in agreement, but the author has a good point
         | that PDF is obviously self-contained and self-contained HTML
         | pages are not necessarily distinguishable from those that
         | aren't. Perhaps we might have to revisit MHTML or embrace Web
         | bundles as an alternative to PDF.
        
           | IshKebab wrote:
           | Or... AMP? But no, Google made that so it must be a bad idea.
        
           | dtech wrote:
           | It's not even JS. I'd argue a HTML + inline JS page is a lot
           | more self-contained than one with external images, videos and
           | fonts.
           | 
           | Note that PDFs can contain JS too.
        
             | Digit-Al wrote:
             | > Note that PDFs can contain JS too.
             | 
             | That's why he says to use PDF/A, which can't contain JS.
        
             | imglorp wrote:
             | > Note that PDFs can contain JS too.
             | 
             | Wait, why?!? When does it render? Who's supposed to have a
             | js engine to do that? What version? How does it load
             | dependencies? Is HTML and DOM carried along with it? So
             | many questions.
        
               | maskros wrote:
               | Why? To validate form fields.
               | 
               | Who? The PDF viewer.
               | 
               | When? Since about 2000 in PDF format version 1.3.
               | 
               | Dependencies? Hah, no such luck. You're stuck with ES5
               | and Adobe's crufty JS library. There is no HTML and DOM,
               | there are however some pretty thorough PDF document
               | bindings.
        
               | native_samples wrote:
               | Why - because scripting is useful. A big use of PDFs is
               | translating paper forms into digital forms without
               | needing to make a web app out of them. JS is used for
               | client side validation, same reason it was put into
               | browsers. Acrobat can handle this along with _many_ other
               | features that most PDF readers can't handle properly.
               | 
               | Basically in the PDF world, Acrobat Reader is Chrome and
               | everything else is, like, Konqueror or something. Don't
               | be fooled into thinking PDF is a small spec. It's not.
        
           | cxr wrote:
           | You want PWP
           | <https://blog.jonudell.net/2016/10/15/from-pdf-to-pwp-a-visio...>
           | (Later aborted, and the group's work was rolled into EPUB3. As
           | you note, there remains a genuine need for it.)
           | 
           | On the other hand, there's nothing stopping you from using a
           | double-barrelled file extension for denoting this sort of
           | thing, e.g. "memex-opus.pub.html"; so long as it ends with
           | something recognizable, double-clicking should still open it
           | in the browser across all the usual platforms, AFAIK.
           | 
           | (I'm fond of using "xyzzy.app.htm" myself to take advantage
           | of this trick for distributing simple, self-contained
           | programs that are designed to run in the browser.)
        
             | 7sidedmarble wrote:
             | This is what PWAs are kind of for.
        
         | dalbasal wrote:
         | The author addresses this: " _We choose to switch to PDF in
         | this decade, not because it easy, but because it is hard" -
         | John F. Warnock, September 12th 1962_ "
         | 
         | The author is obviously making a statement, exploring ideas...
         | not searching for an actual solution to his use case.
        
           | jl6 wrote:
           | Yeah, it's kinda embarrassing that the one quote that gets
           | pulled out in the HN commentary is the one that contains a
           | typo. It's OK: Issue 1[0] contains a patch to fix the issue.
           | 
           | [0] https://www.lab6.com/1
        
           | rkachowski wrote:
           | Is this a comical misquote or is the PDF format actually 60
           | years old?
        
             | wccrawford wrote:
             | It's a comical, deliberate misquote.
        
             | SethMurphy wrote:
             | Comical misquote, "Switch to PDF" replaced "Go to the
             | Moon".
        
             | dalbasal wrote:
             | It's comical, but links to the founder of Adobe. IDK what
             | the date alludes to.
        
               | fmajid wrote:
               | JFK announcing the US would put a man on the Moon before
               | the decade was over.
        
               | dalbasal wrote:
               | oh... yeah
        
             | coldtea wrote:
             | It's about 30 years old; its creator, however, is said
             | person.
             | 
             | The actual quote was from JFK iirc regarding the Apollo
             | missions...
        
       | Ajedi32 wrote:
       | This is an awful idea and I love it.
       | 
       | As others have pointed out it's strictly worse than a static HTML
       | site in many, many ways. At the same time though, it's a
       | brilliant criticism of many of the worst aspects of the modern
       | web.
       | 
       | This is art.
        
       | ussrlongbow wrote:
       | Very surprised to see just a few comments mentioning EPUB, which
       | IMO is much more suitable for a document-centric approach. It's
       | an open standard with a freely available[1] specification, and
       | I've never had any problems with EPUBs on PC, tablets and phones.
       | 
       | [1] - https://www.w3.org/publishing/epub32/epub-spec.html#sec-intr...
        
         | kccqzy wrote:
         | But can you open an epub in a browser? That's the main point
         | here.
        
           | spinax wrote:
           | Not only simple browser plugins per the other reply (and a
           | plethora of non-crashing mobile apps, whereas mobile PDF
           | reading apps crash on me all the time) - the ePub format is
           | just a zip file in disguise with plain text (HTML) inside and
           | maybe some images/etc.
           | 
           | In a manner of speaking, ePub as a design has an inherent
           | built-in fallback mechanism to manually obtain the internal
           | content in case of failure - including ability to try and
           | repair a broken zip format (zip -F/-FF) and grep it in place
           | (zipgrep).
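
Editor's sketch of the "zip file in disguise" point above: a minimal Python take on the zipgrep idea, using only the standard library. The archive built here is a hypothetical, EPUB-like sample (it has no META-INF/container.xml, so it is not a valid EPUB), and the function names are assumptions made for illustration.

```python
import io
import zipfile


def build_sample_epub() -> bytes:
    """Build a tiny EPUB-like archive in memory (hypothetical content)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # EPUB containers put an uncompressed 'mimetype' entry first.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("OEBPS/chapter1.xhtml",
                    "<html><body><p>Files are a basic human freedom.</p></body></html>")
        zf.writestr("OEBPS/chapter2.xhtml",
                    "<html><body><p>Nothing to see here.</p></body></html>")
    return buf.getvalue()


def grep_epub(data: bytes, needle: str) -> list[str]:
    """Return the names of archive members whose text contains `needle`."""
    hits = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for name in zf.namelist():
            text = zf.read(name).decode("utf-8", errors="replace")
            if needle in text:
                hits.append(name)
    return hits


if __name__ == "__main__":
    print(grep_epub(build_sample_epub(), "freedom"))
```

The same `grep_epub` call should work on the bytes of a real .epub file, since the content documents inside are ordinary (X)HTML text.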
        
           | ussrlongbow wrote:
           | Yes, but with the plugin
        
         | shuntress wrote:
         | Also worth pointing out, EPUBs are (or, at least, can be. I'm
         | not sure how much flexibility is in the specifications)
         | basically just bundled HTML.
        
         | DerDangDerDang wrote:
         | There's a fixed layout version of the ePub standard too,
         | allowing PDF quality if that's what you're after.
        
       | bambax wrote:
       | He should offer PDF _in addition to_ basic HTML, not as a
       | replacement.
        
       | keiferski wrote:
       | I think if one designed a "crisis-proof" version of the web, it
       | might end up being a network of PDFs. My reasoning being:
       | 
       | - PDFs are universally understood by most people and can be read
       | on phones, desktops, laptops, and eBook readers.
       | 
       | - Once you've downloaded a local PDF version of the site, there
       | is no risk that it can be changed or removed by the host.
       | 
       | - File size is predictable ahead of time, which is useful if your
       | connection is limited or slow.
       | 
       | - PDFs are designed for printing (more so than most sites) which
       | may be useful in situations where electricity is in low supply.
        
         | lmm wrote:
         | I set up my blog so that the page source would consist of the
         | original markdown and as little markup as possible to make that
         | render. You can read it with telnet and the experience isn't so
         | much worse than using a browser.
         | 
         | (The actual part that makes this work is a pile of opaque
         | javascript doing all sorts of nasty things at runtime, but such
         | is the way of web pages in today's browsers, I don't worry too
         | much about it).
        
         | Tajnymag wrote:
         | Except the printing part, all can be said about a standalone
         | html file.
         | 
         | External content, like images, can be inlined, thus you would
         | only have to distribute one single .html file.
         | 
         | I'm not sure how file-to-file linking would work in the realm of
         | pdf files. With html files, it's easy even without any web
         | server.
         | 
         | Plus, html can be even digested through a terminal interface.
         | That cannot be said about the binary nature of pdf documents.
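
Editor's sketch of the "images can be inlined" point above: one way to fold local images into a single standalone .html file is to rewrite each `src` as a base64 `data:` URI. The function name `inline_images` and the regular expression are assumptions for illustration; the pattern only handles double-quoted `src` attributes on `<img>` tags and is not a general HTML parser.

```python
import base64
import mimetypes
import re
from pathlib import Path


def inline_images(html: str, base_dir: Path) -> str:
    """Replace local <img src="..."> references with base64 data: URIs."""

    def to_data_uri(match: re.Match) -> str:
        src = match.group(2)
        path = base_dir / src
        # Leave remote, already-inlined, or missing images untouched.
        if "://" in src or src.startswith("data:") or not path.is_file():
            return match.group(0)
        mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        payload = base64.b64encode(path.read_bytes()).decode("ascii")
        return f"{match.group(1)}data:{mime};base64,{payload}{match.group(3)}"

    # Deliberately simple: double-quoted src attributes on <img> tags only.
    return re.sub(r'(<img\b[^>]*?\bsrc=")([^"]+)(")', to_data_uri, html)
```

Base64 encoding grows each image by roughly a third, so this trades file size for having exactly one file to distribute.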
        
           | nulbyte wrote:
           | I use a terminal pager with PDFs quite frequently. It works
           | surprisingly well. Even something you wouldn't expect, like a
           | pay stub, renders fine in the terminal.
        
             | bashinator wrote:
             | What do you use to display the pdf? Pandoc?
        
           | keiferski wrote:
           | This is true, but I do think a PDF is just conceptually
           | simpler and requires less technical knowledge. Especially in
           | a situation where technical users are scarce.
           | 
           | IMO most people have a mental model of a PDF as being a
           | digital document, whereas a HTML file is somewhat more
           | amorphous.
        
         | npteljes wrote:
         | Not sure about your points. Contrast it with a static HTML+CSS
         | website:
         | 
         | - PDFs require a reader, HTML a browser. I wouldn't argue that
         | there are more PDF readers installed than browsers.
         | 
         | - Downloaded static HTML works the same
         | 
         | - File size can be included in the HTTP response: in the
         | Content-Length header
         | 
         | - Printing is nice, but reflowable text is even nicer, since we
         | target a multitude of rendering targets.
        
         | ChrisMarshallNY wrote:
         | PDFs are...not so easy to generate dynamically.
         | 
         | I have done it with a couple of PHP libraries (fpdf and mpdf),
         | but they are primitive, compared to desktop PDF generators. I
         | know that you can use Java (never done that), or
         | even...ugh...XSL (also never done that).
        
           | dredmorbius wrote:
           | Most desktop operating systems offer a print-to-PDF
           | functionality. It's long been an add-on for Microsoft, but
           | that's really a historical accident / deliberate choice of
           | that platform.
           | 
           | PDFs can be trivially created from Markdown or using LaTeX
           | templates if you're looking for a programmatic solution.
           | Pandoc and XeLaTeX are helpful, the poppler libraries as
           | well. Again, these are generally and widely available at no
           | charge.
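
Editor's sketch of the programmatic route mentioned above: a small Python wrapper around the Pandoc command line, assuming Pandoc 2.x or later and a XeLaTeX install are available. The function names and the file names in the usage note are hypothetical; `-o` and `--pdf-engine` are Pandoc's own options.

```python
import shutil
import subprocess
from pathlib import Path


def pandoc_pdf_command(source: Path, target: Path, engine: str = "xelatex") -> list[str]:
    """Build the argument list for converting a Markdown file to PDF with Pandoc."""
    return ["pandoc", str(source), "-o", str(target), f"--pdf-engine={engine}"]


def markdown_to_pdf(source: Path, target: Path) -> bool:
    """Run the conversion if pandoc is installed; return True on success."""
    if shutil.which("pandoc") is None:
        return False  # Pandoc is not on PATH, so nothing is run.
    result = subprocess.run(pandoc_pdf_command(source, target), check=False)
    return result.returncode == 0 and target.is_file()
```

For example, `markdown_to_pdf(Path("post.md"), Path("post.pdf"))` would attempt the conversion and report whether a PDF was produced.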
        
         | lucideer wrote:
         | > _PDFs are universally understood by most people and can be
         | read on phones, desktops, laptops, and eBook readers._
         | 
         | PDFs need a proprietary app to use, most of which are loaded
         | with spyware & trackers. I may be mistaken in this but
         | MacOS/iOS are the only OSes I know of that read them natively?
         | There's absolutely nothing universal about the format.
         | 
         | HTML is truly universal: not only does every OS come with a
         | built in HTML viewer, but it's a plain text file. You can read
         | the source using anything.
         | 
         | > _Once you've downloaded a local PDF version of the site,
         | there is no risk that it can be changed or removed by the
         | host._
         | 
         | Once you've downloaded a local HTML version of the page there's
         | no risk that it can be changed or removed by the host. Yes,
         | there are caveats to both: people can create PDFs with remote
         | embeds or HTML sites with ajax content but both of these are
         | the fault/responsibility of the individual author. It's as easy
         | to make good downloadable HTML as downloadable PDF.
         | 
         | The so called "churn" is the responsibility of the individual
         | HTML author. If you're making bad HTML, the fix is to start
         | making good HTML. Not to switch to a closed inaccessible
         | format.
        
           | npteljes wrote:
           | PDF is an open format, with multiple FOSS reader
           | implementations. You could argue that a subset of niche
           | features can only be used in Acrobat Reader, but AR is far
           | from the only PDF reader out there.
           | 
           | And the churn is part of the zeitgeist, not really a
           | responsibility of anyone in particular. Individuals are
           | suckered into it, companies are supplying it, and governments
           | are allowing it. We're all part of it. Not new either: I've
           | been hearing since the 90s how modern life is rushed, and
           | that's just my limited experience.
        
             | lucideer wrote:
             | I said it wasn't universal, which is somewhat different to
             | the vague idea of being "open", and yes, PDF is technically
             | an "open format" depending on how you define "open". The
             | ISO 32000 spec. costs in the region of ~200 USD/EUR.
             | 
             | What that "openness" translates into in the real world is
             | that there are zero non-Adobe _viewers_ that support all of
             | PDF's features, and even fewer PDF editors. The standard
             | PDF editor costs ~200 USD/EUR (annual subscription).
             | 
             | This is before we even get into the nightmarish world of
             | PDF _parsing_. Or PDF accessibility.
             | 
             | PDF is a great format if you're sending a document to
             | someone for them to print immediately. It has no other
             | valid uses imo.
        
               | npteljes wrote:
               | I see your point, thanks for the elaboration. I'm not a
               | fan of the format either.
        
         | foxes wrote:
         | How is a complicated binary better than a literal text file?
         | 
         | Truly absurd, this whole thread is churn.
        
       | mark_l_watson wrote:
       | I also enjoyed the sentiment of the article. I used to blog a lot
       | but in the last decade I have preferred more long form writing.
       | Now I use the leanpub.com [1] service so when I write, I get
       | generated PDF/ePub/Kindle formats, and material is readable
       | online as HTML/CSS. For me leanpub is a way to make content free
       | and accessible, but people can pay if they want. The relatively
       | few people who pay for my material have a large effect on what I
       | decide to write about in the future or which writing projects to
       | drop.
       | 
       | I consume the web mostly by following a few very interesting
       | people on social media and following their links. As an author,
       | my goal is to keep producing interesting enough material to be
       | worth people's time reading.
       | 
       | [1] https://leanpub.com/u/markwatson
        
       | leephillips wrote:
       | We already have a wildly popular website where all the main
       | content is in the form of PDFs. It's https://arxiv.org/. PDF is
       | what you use when your document needs to have a predictable
       | layout. This is especially important if it contains math, complex
       | tables, or any elements where meaning is carried by positioning
       | on the page. This can include aesthetic meaning, as in some forms
       | of poetry that need to be laid out in a particular way.
        
         | dredmorbius wrote:
         | There are several which at least strongly resemble that remark.
         | 
         | Project Gutenberg and the Internet Archive's text archives
         | (along with numerous other document-oriented sites, several of
         | the samizdat variety) offer content in PDF and other document-
         | oriented offline downloadable forats.
         | 
         | Wikipedia has a "save to PDF" link on each article (that seems
         | to work through the browser's capabilities, if any, not all
         | browsers support this). The sister Mediawiki site Wikisource
         | offers ePub downloads.
         | 
         | For longer-form content, PDF, DJVU, and a handful of other
         | formats (arguably ePub) are at least reasonably popular.
        
       | specialist wrote:
       | OC page three:
       | 
       | > _PDFs used to be unreadable on small screens, but now you can
       | reflow them._
       | 
       | Off hand, which PDF viewers do reflow?
        
         | dredmorbius wrote:
         | Foxit, PocketBook Reader, FBReader. Presumably Adobe Acrobat
         | though I've not touched that in a decade or more.
         | 
         | There are also utilities such as the poppler library's pdftotext
         | which will dump ASCII / bare text from at least some PDFs.
        
       | snksnk wrote:
       | Why not use TeX/LaTeX instead and also include a link to the
       | code?
        
         | jimhefferon wrote:
         | The LaTeX below will leave a push-pin symbol in the text, and
         | clicking on it shows the code.
         |     \documentclass{article}
         |     \usepackage{attachfile}
         |     \usepackage{lipsum}
         |     \begin{document}
         |     \expandafter\attachfile\expandafter{\jobname.tex}
         |     \lipsum[1-150]
         |     \end{document}
        
           | leephillips wrote:
           | Using xelatex, I got only the text, no pushbutton. Using
           | pdflatex, I got a pushbutton, but it was not a hyperlink,
           | just an image. What engine do you use to get this to work?
        
             | jimhefferon wrote:
             | I ran pdflatex from a 2017 TeX Live install under Ubuntu,
             | and viewed in Acrobat Reader.
        
               | leephillips wrote:
               | Ahh, I think it is a viewer problem. Sadly, most viewers
               | cannot handle PDF attachments properly or at all.
        
       | clearing wrote:
       | I honestly can't believe all the praise for HTML and web on HN in
       | the face of this awesome critique. I hugely appreciate the love
       | for actual files.
       | 
       | >* PDFs are decentralised. You may have obtained this PDF from a
       | website, or maybe not! Self-contained static files are
       | liberating! They stand alone and are not dependent on being
       | hosted by any particular web server or under any particular
       | domain. You can publish to one host or to a thousand hosts or to
       | none, and it maintains its identity through content-addressing,
       | not through the blessing of a distributor.
       | 
       | This seems to have gotten lost in the offense everyone has taken
       | over the choice to not use 'simple HTML', despite the document's
       | clear reasoning that to do even that would embed the content deep
       | in the 'urban web'. All of these simple-complex propositions
       | about making some subset language or automating document flows
       | are missing the point entirely.
        
         | danShumway wrote:
         | > You can publish to one host or to a thousand hosts or to
         | none, and it maintains its identity through content-addressing,
         | not through the blessing of a distributor.
         | 
         | It kind of seems like you're describing IPFS, except with worse
         | content addressing guarantees. The vast majority of your users
         | will never check to see if a PDF's content actually matches its
         | content address.
         | 
         | > All of these simple-complex propositions about making some
         | subset language or automating document flows are missing the
         | point entirely.
         | 
         | Are they? It's really not that hard to build a self-contained
         | HTML file, and to re-emphasize, signed PDFs and signed HTML
         | files are about the same level of accessibility to most users.
         | Web browsers don't really handle either, if you want those
         | guarantees you need to use a protocol/technology with better
         | support right from the start.
         | 
         | Also to be clear, despite the author's argument that PDFs _can_
         | be self-contained, no browser guarantees that, and there's no
         | way for me to tell if the PDF is self contained when I click on
         | it in Firefox unless I download it and check it myself offline
         | or in a viewer that guarantees it won't make network requests.
         | 
         | Nothing online that I'm aware of forces authors to use PDF/A,
         | so when I download a PDF, I _don't_ know what I'm getting.
         | It's not actually the magical, re-hostable world that the
         | author claims.
         | 
         | I'm not sure that people are missing the author's point so much
         | as they're saying the author is making claims about the
         | portability of PDFs that aren't necessarily accurate. Yes, it
         | would be good to have better self-contained guarantees about
         | some web-content, but I'm not sure PDFs actually supply any of
         | those guarantees.
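
Editor's sketch of the content-addressing check discussed above: the step most readers would skip is hashing the downloaded bytes and comparing the result with the address the file was published under. This uses a bare hex SHA-256 digest as the "address", which is a simplification (IPFS, for instance, wraps its hashes in a different identifier format); the function names are assumptions for illustration.

```python
import hashlib


def content_address(data: bytes) -> str:
    """Return a hex SHA-256 digest to use as a simple content address."""
    return hashlib.sha256(data).hexdigest()


def matches_address(data: bytes, address: str) -> bool:
    """Check that downloaded bytes hash to the address they were published under."""
    return content_address(data) == address.lower()
```

A file fetched from any mirror can then be accepted or rejected on its bytes alone, whether it is a PDF or an HTML page, which is the point both sides of the thread agree on.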
        
       | chrismorgan wrote:
       | One previous discussion in comments:
       | https://news.ycombinator.com/item?id=24257982
       | 
       | For my part, I expressed bafflement because the end result seems
       | worse than the starting point in almost every way, _including_
       | those that the author was complaining about the web for.
       | 
       | (There are a couple of others to be found in
       | https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...,
       | but not so substantial.)
        
       | [deleted]
        
       | danShumway wrote:
       | I guess by modern standards this load time is acceptable, but
       | when you argue that PDFs are a way to move forward, you're
       | competing with HTML 4/5. And by that standard:
       | 
       | - Crud this website is so _slow_. Unacceptably slow. If your
       | technology stack is spending 10 seconds just to fetch and render
       | 13 pages of large-screen text, then either you're doing
       | something wrong or it's a bad technology stack. That load time
       | alone should kill this idea.
       | 
       | - There's no way for me to turn off images. This is the opposite
       | of a client-respecting webpage, the only way you could make it
       | worse is by rendering to Canvas or shipping me a PNG. My mobile
       | browser doesn't fetch fonts by default. You're overriding my
       | choice to do that.
       | 
       | - Mobile? Reflow? Responsive design? Adjustable font sizes? The
       | author kind of offhandedly says that PDFs can do reflow right
       | now, but how many clients actually support that? Does the PDF
       | format handle this by default?
       | 
       | - Saying "you can technically make PDF accessible" is exactly the
       | same as saying "you can technically use just a subset of HTML."
       | It's the same argument. Nobody does it, PDFs are generally
       | hostile to accessibility, and there's no way to signal that a PDF
       | is accessible or enforce it as a community standard.
       | 
       | So, the much bigger question: what's wrong with Gemini[0]? I've
       | been critical of Gemini in the past on multiple fronts, but if
       | you are in this space where you want to burn everything down and
       | make your blog static, Gemini really does seem to solve every
       | problem that the author has, except better. It's also trivial to
       | proxy Gemini documents or statically re-render them to HTML,
       | which makes them accessible to people outside the community. And
       | by default, they're both pretty accessible to screen readers, and
       | much more efficient than what the author is proposing.
       | 
       | The author argues that using static HTML wouldn't be good enough
       | because there's no standard that forces you to exclude
       | Javascript. Then they point to PDF/A, which is not a standard
       | that is enforced by most browser PDF viewers. To me, this
       | argument isn't any different from telling website authors to
       | choose not to use Javascript, what is going to force anyone to
       | use PDF/A? Every web browser PDF reader supports Javascript.
       | NoScript support in Firefox is better than the
       | controls/extensions for disabling PDF scripting.
       | 
       | And Gemini is _right there_: for the most part it's actually
       | working today. So I just don't get it. Why pick a technology
       | that's tangibly worse than the web on (and I mean this quite
       | literally) almost every single axis and every single metric, when
       | you could instead switch to a markup language that actually does
       | have use-cases, that does simplify deployment and blogging in
       | some instances, that does have a real community, that does have
       | some real advantages over HTML, that does have some real momentum
       | behind it, and that doesn't disrespect my choices about what
       | fonts/images I want to download?
       | 
       | [0]: https://gemini.circumlunar.space/
        
       | lucian1900 wrote:
       | Sounds to me like ePub would fit better. It's designed for reflow
       | and it's built out of a subset of HTML. Worst case the contents
       | of the file can be expanded.
        
       | leephillips wrote:
       | There are good points here, but I think the author slightly
       | undermines his message because the layout and typography of this
       | particular PDF are so poor. Probably because it "was written in
       | the world's greatest web authoring tool: LibreOffice Writer".
       | 
       | In other words, one advantage of PDF is that free authoring tools
       | such as the TeX family can create typographically beautiful
       | results that are nearly impossible to achieve with HTML, but he
       | leaves that on the table.
        
       | petercooper wrote:
       | In a sea of cynicism, I gotta say.. bravo. This genuinely put a
       | smile on my face. It has a lot of problems, sure, but it's a
       | creative use of the Web and would surely work for _some_ use
       | cases. It's certainly no worse than using Flash ever was.
       | 
       | It reminds me a bit of a "newsletter" I'm subscribed to called,
       | ironically, "Not a Newsletter" (http://notanewsletter.com/). You
       | get an email from the author each month and it just points to a
       | Google Doc where he puts the actual content. Why's this good? The
       | content can't set off any spam filters, he can edit the issue
       | after it's "sent" if there are mistakes or broken links..
        
         | sneak wrote:
         | The content can be censored arbitrarily by google, and when you
         | click on mobile web with the docs app installed, it logs your
         | logged in google account identity (maybe for work?) with the
         | view when it switches to the app.
         | 
         | Files have none of these problems.
        
           | indigochill wrote:
           | If the author was concerned about getting censored by Google
           | or feeding their data empire, they could set up a self-hosted
           | Google Docs-like, like NextCloud.
           | 
           | The readers would still need to trust the author's not doing
           | anything nefarious with their IP addresses, but I guess
           | there's a degree of implicit trust when subscribing to a
           | newsletter.
        
             | noduerme wrote:
             | I would just put it on my own server. Are people really
             | worried about clicking a private link and having their IP
             | address logged? Just opening an email with a tracking pixel
             | triggers that already, and you have to assume clicking a
             | link will log your IP whether with Google or Constant
             | Contact or any other mass email provider.
        
           | nonameiguess wrote:
           | Google Docs are still files. It's just up to the author (or
           | even the readers) to keep copies outside of Google's servers.
           | Unless Lab6 owns their own servers, whoever is hosting these
           | pdfs can delete them as well. At least, in both cases, static
           | files are much easier to backup and copy than entire three-
           | tier dynamic applications. And readers can keep their own
           | copies separate from the original, which isn't possible with
           | an application at all.
        
             | lmm wrote:
             | > Google Docs are still files. It's just up to the author
             | (or even the readers) to keep copies outside of Google's
             | servers.
             | 
             | No they're not? You literally can't have a google doc as a
             | file in a first-class way - you can export it to a file,
             | but that's a lossy process.
        
               | noduerme wrote:
               | Yup. Another way to say it is Google will release a file
               | format the day offline computing drops dead. It should
               | probably amount to an antitrust case or at least a major
               | class action claim at this point. That said, even _with_
               | PDF specs it's freakin impossible to read/write that
               | format in an intelligible way, if the person creating the
               | document used even the barest amount of block alignment.
               | Adobe started with an innovative notion about layout, but
               | ended up making content extremely hard to parse, and
               | actually tried to open source the engine. Google started
               | with an idea of trapping everyone's data in a format
               | they'd never make fully available, and then charging for
               | the privilege of storing it.
        
           | petercooper wrote:
           | You're not wrong! It's always a trade-off of one set of
           | problems for another with these sorts of things, I guess.
        
       | midrus wrote:
       | LOL
        
       | ok123456 wrote:
       | Jekyll plugin that produces a pdf version of each page?
        
       | mattnewton wrote:
       | I can't tell if this is satire or not, because reading it on my
       | phone hurt my eyes after the first couple pages.
       | 
       | Please use EPub if you are after an open format or freeze web
       | pages into an offline-able format and don't use PDF.
        
       | ccorcos wrote:
       | "Files are a basic human freedom" - that definitely resonates
       | with me.
       | 
       | There's an assortment of trade-offs though. In particular,
       | linking between files breaks if you ever want to move or rename a
       | file. Also, by self-encapsulating every file, you end up using
       | space less efficiently.
        
       ___________________________________________________________________
       (page generated 2021-07-19 23:01 UTC)