[HN Gopher] Converting my PhD thesis into HTML (2021)
       ___________________________________________________________________
        
       Converting my PhD thesis into HTML (2021)
        
       Author : distcs
       Score  : 81 points
       Date   : 2022-12-19 11:15 UTC (11 hours ago)
        
 (HTM) web link (desfontain.es)
 (TXT) w3m dump (desfontain.es)
        
       | DominikPeters wrote:
       | I would recommend using the lwarp package for turning large latex
       | documents into HTML. Pretty much all other converters attempt to
       | parse the tex files, which is an almost hopeless task. Lwarp has
       | a different strategy: it redefines all macros to produce HTML
       | (e.g. \textbf{example} writes "<strong>example</strong>" into the
       | output pdf) within latex, thereby producing a PDF containing HTML
       | code. It then uses pdftotext to extract the finished HTML
       | file. Thus, it uses latex to parse the latex.
       | 
       | Lwarp worked for me to produce an HTML version of the TikZ
       | documentation (https://tikz.dev), and that's probably one of the
       | more complicated tex documents that exists. (Though granted, this
       | was still a major effort.)
        
         | gdprrrr wrote:
         | Yeah, it's well known that only LaTeX can parse LaTeX because
         | you can redefine all syntax (catcodes) in the middle of the
         | document.
        
       | the-printer wrote:
       | If he expects people (me) to read or read about his online PhD
       | thesis then I think he should've chosen a font with a larger
       | x-height. Reading the type feels like peeking through a dense
       | bush during a hail storm.
        
         | bspammer wrote:
         | The fact that it's published as HTML means that you can choose
         | your own font if you so desire. A PDF wouldn't let you do that.
        
       | vouaobrasil wrote:
       | I disagree with the author that PDFs are a terrible format. They
       | guarantee layout, which is very important for complex scientific
       | presentations. Even slight differences in layout can make a
       | complex set of equations difficult to parse. LaTeX also has a
       | much superior word-break/hyphenation algorithm to the HTML engines
       | of browsers.
       | 
       | I find PDF math papers easy to browse, unlike the author. They're
       | much easier and more organized than a website, can be easily
       | searched and have a *proper table of contents* compared to
       | websites. As for poorly browsable on a phone -- well I think that
       | is irrelevant because nobody is going to read a complex technical
       | paper in practice on a phone. They do look decent on tablets, and
       | as for screen readers...well that's a valid point but screen
       | readers don't work well for material with lots of equations
       | anyway.
       | 
       | I applaud the author for the effort but looking at the result, I
       | would not want to read math that way.
        
         | jmhammond wrote:
         | > and as for screen readers...well that's a valid point but
         | screen readers don't work well for material with lots of
         | equations anyway.
         | 
         | This is something that we'd like to change. There are many
         | visually impaired students who need to learn mathematics the
         | same as you and I.
         | 
         | My "eyes were opened" when I was working with a blind student
          | in my class. The textbook I'd written in PreTeXt (transpiled to
         | pdf and HTML) could be read on his BrailleNote but some of the
         | equations were wonky, so I rewrote them to work for everyone.
         | 
         | It would be better if we developed tools to make them work for
         | everyone straight away, instead of relying on authors. That's
         | one of my career goals.
        
           | felixfbecker wrote:
           | I applaud you for this.
           | 
            | I think MathML (which has gotten much better in browsers,
            | thanks to Igalia[1]) is a much better bet for making this
            | possible than LaTeX compiling to PDF.
           | 
           | [1] https://mathml.igalia.com/
        
         | CJefferson wrote:
         | Screen readers work perfectly fine with mathml. At worst one
         | can just get the screen reader to read the latex for maths and
         | browse the rest in nice HTML.
         | 
         | On the other hand, PDFs generated from Latex are completely
         | useless for screen readers.
        
         | TacticalCoder wrote:
          | > LaTeX also has a much superior word-break/hyphenation algorithm
         | to the HTML engines of browsers.
         | 
          | And because the PDF has a fixed layout it's also much easier to
          | prevent "rivers" in paragraphs, which makes it a no-brainer to
          | use justification. To me, many print publications using
          | justified text (including LaTeX documents) are a thing of
          | beauty, and I do hate how "left align" breaks the flow of
          | reading. I'll take slightly different spacing between words
          | due to justification any day over horizontal lines of
          | different length, which I find fugly _and_ confusing beyond
          | repair.
         | 
          | More hyphenation controls are coming to CSS and, one can dream,
          | it may be possible one day to programmatically detect rivers?
          | 
          | Meanwhile rivers be damned, I override anyway many sites and
          | add "text-align: justify". The nice thing is: because
          | "text-align: left" is the default, many sites and minifiers do
          | not bother with text-align at all, so adding _"text-align:
          | justify"_ works for many, many, many sites.
         | 
         | And I only half-buy anyway the justifications (ah!) for left
         | alignment on the Web.
         | 
          | It's basically saying: _"We know better than people who've
          | been working in print for decades (or more), left align is
         | easier to read"_. I don't buy it. Left align breaks _my_
         | reading flow. And I cannot be the only one.
         | 
         | To me left align is trading potentially ugly looking paragraphs
         | (due to rivers) for certainly ugly looking paragraphs (due to
         | left justification: just look at the right of each paragraph...
         | Such lack of clarity, such chaos cannot be unseen. It's pure
         | fail).
         | 
         | P.S: I've actually typeset books both in LaTeX and QuarkXPress
          | and they were justified, not left-aligned.
        
           | extra88 wrote:
           | > I override anyway many sites and add "text-align: justify".
           | 
           | I think you're an outlier in your strong preference for
           | justified text but this serves as an example in favor of
           | using HTML to present content. Well made web content is much
           | more malleable by users to make it meet their needs and
           | preferences.
        
         | gnull wrote:
         | If your equations are in MathML, the browsers should be able to
         | screen read them at some point.
         | 
         | > Even slight differences in layout can make a complex set of
         | equations difficult to parse.
         | 
          | Such a set of equations should normally be represented by a
          | single block; I can't imagine a reason why layout should change
          | inside that block.
         | 
         | The layout of pdf is unnecessarily rigid. When I'm reading it
         | on my screen, there's no reason the text should be split into
         | A4 pages with very specific margin values. Latex also often
         | moves your figures a few pages ahead because they didn't fit on
         | the specific page. There's absolutely no reason for that when
         | you have access to the big continuous canvas of an html page.
         | This works for equations too; if you have a long equation block
         | that happens to be right between two pages, you either have to
         | let one page have a gap, or reorder/rewrite your paragraphs to
         | make the equations fit. None of this has a good excuse when
         | it's read on a screen.
         | 
         | I don't think we need a website, but a js-free webpage with
         | hyperlinks would be a lot better than pdf. Pdfs I find
         | imperfect but ok.
        
           | periheli0n wrote:
           | > I don't think we need a website, but a js-free webpage with
           | hyperlinks
           | 
           | Wasn't this precisely the use case for HTML and the WWW as
           | originally conceived by Berners-Lee and his fellow internet
           | pioneers?
        
         | dan-robertson wrote:
         | I think you give latex more credit than it deserves. It gives
         | little straightforward control over layout and the only reason
         | documents are manageable is that pages are fixed size and
         | layout changes are mostly local.
         | 
          | Its paragraph breaking was state of the art when it was new
         | but other systems break paragraphs now and potentially better.
         | I also think ragged margins aren't really a problem.
         | 
         | I think if layout mattered as much as you imply, scientists
         | would have to use a tool that offers more control like
         | indesign.
         | 
         | None of this is to say that getting good layout in HTML is
         | easy, of course.
        
           | periheli0n wrote:
           | > I think if layout mattered as much as you imply, scientists
            | would have to use a tool that offers more control like
           | indesign.
           | 
           | Yes, precisely that. As a scientist I don't even want to have
           | to deal with layout. That's what publishers are paid
           | extremely well for. When I self-publish content I want the
           | process to be as simple as possible. If this means ragged
           | margins, browser-default styles for headings etc., default
           | colors and fonts -- so be it.
           | 
           | (but to be fair, optimising the layout is an excellent way to
           | procrastinate on doing hard research)
        
         | ta123456789 wrote:
         | PDF papers are also much easier to save/archive and use
         | offline. And great for printing
        
         | jech wrote:
         | > I find PDF math papers easy to browse
         | 
         | So do I. Still, I wish LaTeX produced easily reflowable PDFs,
         | especially when a document is formatted in two columns.
        
           | enriquto wrote:
           | But it does, doesn't it? You add the "twocolumn" option and
            | recompile. Unless your LaTeX is too fancy this will typically
            | give a very good result (at worst, some figures with
            | hardcoded sizing will be awkwardly placed).
        
             | jech wrote:
             | I cannot do that when I'm reading a paper written by
             | somebody else, and I only have the produced PDF.
        
               | abdullahkhalids wrote:
                | That's why arxiv is a godsend, because the source is
                | available there, if the author has uploaded it.
                | 
                | Science needs a culture of open sharing, the same way
                | physics and math have it.
        
           | mistrial9 wrote:
            | What you are asking for is called a "round-trip" by some
            | printers. This was requested the week after PDF was
            | invented! It does work, unless it does not. The company that
            | invented this technology is apparently infested by MBAs and
            | charismatic nobodies, since they announced they are exiting
            | the type "business"? Our house of cards is showing.
        
           | baby wrote:
            | Check Zotero. It has that feature.
        
         | periheli0n wrote:
          | > nobody is going to read a complex technical paper in practice
         | on a phone
         | 
         | I do, in fact. Or rather, I often would like to but with PDF?
          | No chance. IEEE Xplore online reading sometimes works, but it
         | would work better if they cleaned up their UI to be compatible
         | with phones.
         | 
         | I have read thousands of pages of fiction on a phone and quite
         | enjoyed it. Phones are great for reading if the content reflows
         | properly.
         | 
         | Now publishers and content creators would need to embrace non-
         | paginated, reflowing output. This would not only facilitate
         | reading on phones, but also on tablets and laptop screens.
         | 
         | O'Reilly's online platform does a good job with their app.
         | 
         | There is zero reason why paginated output should be the default
         | in 2022.
        
           | auggierose wrote:
           | O'Reilly doesn't publish math books. All math books in
           | epub/mobi format look like garbage. There isn't a single
           | exception. If you know of one, please tell me. It seems
           | currently too hard to get layout, resolution and inline
           | formulas right in a portable format.
        
             | periheli0n wrote:
              | O'Reilly's online offering has not only O'Reilly books, but
             | ones from other publishers as well. Some of them have
             | equations. However, they are often rendered as images.
             | 
              | IEEE Xplore does a good job rendering equations on phone
             | screens. Therefore, it is possible.
             | 
             | There is no technical reason why equations couldn't be
             | rendered on a screen just as well as on a PDF. Sure, canvas
             | size constraints might interfere, but this problem exists
             | in principle also on paginated output. Plus, horizontal
             | scrolling is a thing.
             | 
             | I'm not saying a phone is the ideal platform to read a
             | paper containing free energy-like math, but it can go a
             | long way. Much longer than with the artificial restriction
             | to paginated output like PDF.
        
               | auggierose wrote:
               | Of course it is technically possible, but I haven't seen
               | it done properly. I have never seen a book with math
               | rendered as images that was of satisfactory quality or
                | even close to what PDF can offer. I doubt IEEE Xplore is
               | an exception, but I don't have an account, so cannot
               | check.
               | 
               | I would like to be able to read a book also on a phone,
               | but I am not going to compromise on quality for that,
               | given that I can just read it on a large tablet in PDF
               | format.
        
               | periheli0n wrote:
               | It is possible to find Open Access articles with math on
               | ieeexplore with little effort. Have a look here:
               | https://ieeexplore.ieee.org/document/6767058
               | 
               | Does this live up to your maths standards?
        
             | goosedragons wrote:
             | With MathML epubs can look decent. For example take a look
             | at the sample MathML epub "A First Course In Linear
             | Algebra" [0] (in a reader that supports MathML of course).
             | It looks pretty good. The problem is Amazon STILL doesn't
             | support MathML, so publishers just churn out a gross
             | version where all the equations are images and so then it
             | doesn't scale properly with the text and the book becomes
             | 300+ MB because of it. And they can't be bothered to make
             | two versions for readers like Kobo that do support MathML.
             | 
              | [0]: https://github.com/IDPF/epub3-samples/releases/download/2017...
        
               | abdullahkhalids wrote:
               | I tried the book. There are several places where long
               | equations are cut off. Other minor spacing issues here
               | and there.
        
           | oplaadpunt wrote:
           | Yes, fiction works because the layout is simple, consisting
           | of text, and maybe images?
           | 
           | Research papers are far more complex, and have established
           | standards that aid quick reading and parsing. I absolutely
           | don't want to deal with reflowing equations, reflowing
           | figures, or whatever when publishing papers. Precise margins
           | and column widths.
        
             | periheli0n wrote:
             | Yet, by far the vast majority of content produced today,
             | technical or prose, is read on screens.
             | 
              | Responsive web design has been around for quite a while. I
             | don't see a reason, other than lack of effort/investment,
             | why we shouldn't be able to read technical papers on
             | variable-width screens, in a non-paginated form.
             | 
             | Dealing with the technical challenges should not be the
             | task of the author, but the publisher. And indeed, most
             | publishers are on it.
             | 
             | What's missing is a standardised format that can be
             | downloaded, annotated, re-shared like a PDF.
        
               | bccdee wrote:
               | I wish there were a convention for sharing whole
               | websites. Even a zip file containing an index.html plus
               | images, css, other pages, etc. would be fine if browsers
               | just supported it.
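
A minimal sketch of the packaging half of the idea above, assuming a static site directory with an index.html at its top level. The function name bundle_site and the layout convention are hypothetical; as the comment notes, browsers do not open such an archive directly, so this only shows how the bundle could be produced.

```python
import zipfile
from pathlib import Path


def bundle_site(site_dir, archive_path):
    """Zip a static site so that index.html sits at the archive root."""
    site_dir = Path(site_dir)
    if not (site_dir / "index.html").is_file():
        raise FileNotFoundError("expected an index.html entry point")
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(site_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to the site root so that relative
                # links between pages, images and CSS keep working.
                archive.write(path, path.relative_to(site_dir).as_posix())
    return archive_path
```

The archive should be written outside site_dir, otherwise the walk would try to add the half-written zip to itself.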
        
         | godelski wrote:
         | You can't have animations with PDFs. Anyone using beamer is
         | familiar with this frustration. But animations are incredibly
         | helpful in explaining many works. 3Blue1Brown became so popular
         | in major part due to his use of (fantastic) animations that
         | more easily explain the material than any static image could.
        
         | mavhc wrote:
         | Get rid of the 2 column thing and most people would be happy.
         | 
         | What guarantees of layout do you require?
         | 
         | In related news, MathML is back in Chrome v109
        
           | michaelt wrote:
           | _> What guarantees of layout do you require?_
           | 
           | Some people write documents that can only be clearly
           | presented on a 15" or larger display. Maybe a comparison
           | table with a bunch of columns, maybe a detailed chart, maybe
           | a PCB schematic, whatever.
           | 
           | These people, being considerate of their readers, want to
           | ensure if someone with a 13" screen comes along, they'll get
           | scrollbars or small text, rather than a badly reflowed table
           | where the word 'Yes' gets split over 3 lines.
           | 
           | Other people want to read those documents on 5" phone
           | displays.
        
       | bravura wrote:
       | I think the simplest solution is uploading your thesis to
       | arxiv.org, then using arxiv-vanity (based upon LaTeXML) to render
       | your arxiv link as a responsive web page.
        
       | periheli0n wrote:
       | The real shocker is that it's 2022 and LaTeX is still the best
       | writing environment for a PhD thesis. It has so many downsides:
       | the markup syntax is ugly, it really works best only with
       | paginated output such as PDF, there is a zoo of partly
       | incompatible packages, it needs compilation, its figure
       | placement algorithms are obscure and difficult to control, and
       | so on.
       | 
       | It still beats the competition because of rock-solid referencing,
       | both to in-text elements like equations, chapters, etc as well as
       | citing literature with bibtex.
       | 
       | Plus, it's extremely stable, so someone who learnt LaTeX 20 years
       | ago, like yours truly, can download the newest TeX distribution
       | and feel at home immediately.
       | 
       | Nevertheless, I would prefer a Markdown-based system that can use
       | CSS and MathML, and has a 100% bibtex clone for references.
       | 
       | Yes, pandoc goes quite a long way along this route, but setting
       | up such a pipeline is still too complicated for many.
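
A rough sketch of the kind of pandoc pipeline mentioned above, assuming pandoc 2.11 or later is on the PATH (that is where the --citeproc option appeared). The file names thesis.md, refs.bib and style.css and both function names are hypothetical; the point is only that Markdown, CSS, MathML and a BibTeX database can be combined in a single invocation.

```python
import subprocess


def pandoc_command(source, bibliography, css, output):
    """Build a pandoc invocation: Markdown in, standalone HTML out."""
    return [
        "pandoc", source,
        "--standalone",                    # emit a complete HTML document
        "--mathml",                        # render TeX math as MathML
        "--citeproc",                      # resolve [@key] citations
        f"--bibliography={bibliography}",  # BibTeX file to cite from
        f"--css={css}",                    # stylesheet linked from the page
        "-o", output,
    ]


def build_html(source, bibliography, css, output):
    """Run pandoc; raises CalledProcessError if the conversion fails."""
    subprocess.run(pandoc_command(source, bibliography, css, output), check=True)
```

Calling build_html("thesis.md", "refs.bib", "style.css", "thesis.html") would then write the HTML file. Cross-references to equations and figures are not covered by these options and would need a filter or an extension on top.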
        
         | nicodjimenez wrote:
          | Mathpix Markdown is an attempt at bringing together the best
          | of both worlds (Markdown and LaTeX) while providing excellent
         | interoperability with LaTeX, meaning you can easily export your
         | Mathpix Markdown documents to LaTeX, including equation
         | references, tabular environments, images, etc:
         | 
         | https://github.com/Mathpix/mathpix-markdown-it
         | 
         | Disclaimer: I'm the founder of Mathpix.
        
         | analog31 wrote:
         | It must depend on the field. A close relative of mine is a PhD
         | advisor in a science field. He's hands-off about it, but is
         | also aware of what his students are doing. If asked, he
         | recommends MS Word, which is also what he uses for his
         | manuscripts.
         | 
         | My own experience was as a physics student, 30 years ago.
         | Students paid a heavy price for being able to print and submit
         | the entire thesis with no manual intervention. The students who
         | chose LaTeX took the longest at it. I didn't have access to a
         | Unix terminal anyway, and banged out my thesis on an MS-DOS
         | machine. Whatever my word processor couldn't support, I added
         | by hand. The readers were OK with this.
         | 
         | My solution to all typographic problems was "take care of it
         | after defense." I spent a few days after my defense getting my
         | copy to be ready for duplication, including sticking all of the
         | page numbers on with glue because I couldn't make inline
         | figures work.
        
           | nextos wrote:
           | LaTeX has, like Org Mode, this mythical aura of being super
           | hard. However, replicating the functionality of Word is
           | trivial and takes an hour or two for a savvy computer user to
           | grasp.
           | 
           | There's always Overleaf, Pandoc or LyX to make things even
           | simpler. LyX in particular deserves to be better known.
           | 
           | Complex things, like TikZ, are of course difficult and time
           | consuming. But those are impossible using Word.
           | 
           | IMHO, the biggest advantages of LaTeX are reproducibility and
           | reference management. Big Word documents are quite fragile.
           | And reference management is a mess.
        
           | periheli0n wrote:
           | Sure, one can write a thesis in MS Word. It has come a long
           | way with support for large documents. But I still find its
           | referencing clumsy, opaque and unstable.
           | 
           | For example, automatic updates of figure numbers in captions
            | and references: countless times it failed on me and I had to
            | manually recreate the fields, bookmarks, cross-references,
            | and whatever else is needed.
           | 
           | Bibliographies are hardly doable without an external tool
           | that comes with its own headaches.
           | 
           | Typography in MS word is quite decent these days, though.
           | Anyway, the content of a PhD thesis shouldn't be judged by
            | its typography (as long as it maintains a readable standard).
        
         | runningmike wrote:
         | I would strongly recommend MyST. MyST extends Markdown for
         | technical and scientific communication. See
         | https://www.myst.tools/
        
           | abdullahkhalids wrote:
           | I tried MyST recently. All I see is a markup language that
            | slowly becomes more and more complex over time to support more
            | and more features that LaTeX already supports, while at the
            | same time acquiring the same syntax complexity as LaTeX.
           | 
           | What people don't acknowledge is that there is a base level
           | of syntax complexity needed to produce fully general
           | documents. If you do, the natural conclusion is that to fix
           | latex, you need a full rewrite of latex with minor changes to
           | fix all the inconsistencies that have crept into it.
        
           | V1ndaar wrote:
           | Going by the documentation it does it by... drumroll...
           | converting to LaTeX!
           | 
           | (edit: generating PDFs that is)
        
             | periheli0n wrote:
             | To be fair, there is no better free tool than LaTeX to
             | typeset PDFs. But it fails at non-paginated, free-flowing
             | content.
        
         | godelski wrote:
         | Honestly, it isn't the writing part that annoys me the most. It
         | is tikz and the fact that I can't make animations in beamer.
         | Just resolving these issues would go a long way for me. Tikz
         | could be fixed simply if there was a GUI that could allow for
         | sliders or moving specific objects. Or at least a better way to
         | make a good grid (tip: draw a grid on your canvas, draw
         | whatever you want, remove grid). Things are so difficult to
         | properly line up, even if we have mathematical representations.
         | It shouldn't be that hard...
        
           | abdullahkhalids wrote:
           | I recently discovered this python interface for tikz
           | https://github.com/allefeld/pytikz
           | 
           | While it does not directly address the issues you point at,
           | it does alleviate some issues.
           | 
           | * The syntax is somewhat easier to parse.
           | 
           | * It is a lot easier to write functions to redraw the same
           | components over and over again.
           | 
            | * Doing math calculations to systematically place objects in
           | relation to each other is a lot easier because python's
           | arithmetic syntax is a lot more intuitive than TeX's.
           | 
           | Of course, this does mean that you have to fire up python to
           | draw figures.
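
To illustrate the last two bullet points, here is a small sketch in plain Python that writes TikZ source as text. It deliberately does not use the pytikz API (which is not shown in the thread); the helper names node and ring are made up for this example. It only demonstrates the general benefit: a reusable function per component, and ordinary arithmetic for placement.

```python
import math


def node(name, x, y, label):
    """Return one TikZ node command placed at (x, y)."""
    return rf"\node[draw, circle] ({name}) at ({x:.2f},{y:.2f}) {{{label}}};"


def ring(n, radius):
    """Place n nodes evenly on a circle and connect each to its neighbour."""
    lines = [r"\begin{tikzpicture}"]
    for i in range(n):
        angle = 2 * math.pi * i / n
        lines.append(node(f"v{i}", radius * math.cos(angle),
                          radius * math.sin(angle), str(i)))
    for i in range(n):
        # The modulo closes the ring by joining the last node to the first.
        lines.append(rf"\draw (v{i}) -- (v{(i + 1) % n});")
    lines.append(r"\end{tikzpicture}")
    return "\n".join(lines)


print(ring(6, 2.0))
```

The printed text can be pasted into a LaTeX document that loads the tikz package. Changing the node count or the radius is a one-argument change, which is the kind of edit that is tedious in hand-written TikZ.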
        
         | thangalin wrote:
         | > Nevertheless, I would prefer a Markdown-based system
         | 
         | My free, cross-platform desktop Markdown editor, KeenWrite[1],
         | integrates with the ConTeXt typesetting software[2]. I'm
         | working on a branch to make integration containerized[3]
         | because its installation is painful. KeenWrite limits math to
         | plain TeX[4] so that the output can be rendered using any TeX-
         | based typesetter (ConTeXt, LaTeX, MathJax, ekhTEX, etc.).
         | 
         | Here's a sample document typeset using ConTeXt (skip to page 40
         | for the math):
         | 
         | https://pdfhost.io/v/4FeAGGasj_SepiSolar_Highlevel_Software_...
         | 
         | That document theme is called Solare[8].
         | 
         | > that can use CSS and MathML
         | 
         | Adding CSS mixes presentation logic with content, which is
         | something KeenWrite strives to avoid. Instead, KeenWrite
         | implements Pandoc's annotation syntax to keep presentation
         | logic out of the content. I've written about this extensively
         | in my Typesetting Markdown series[5].
         | 
         | You can produce some pretty amazing documents just with
         | annotations, such as the following that I wrote in Markdown and
         | typeset using ConTeXt:
         | 
         | https://impacts.to/downloads/lowres/impacts.pdf
         | 
         | > has a 100% bibtex clone for references.
         | 
         | Markdown fails at references. At some point, I'd like to
          | implement cross-references in KeenWrite. Except there are at
         | least six competing standards for the syntax, which I've also
         | remarked upon[6], making the choice of syntax difficult[7].
         | 
         | > setting up such a pipeline is still too complicated for many
         | 
         | FWIW, my Typesetting Markdown series, which explains how to set
         | up a typesetting pipeline using Pandoc, is one of the reasons I
         | developed KeenWrite: to replace that entire pipeline (R,
         | Markdown, externalized variable interpolation, math, and
         | typesetting) with a single tool.
         | 
         | [1]: https://github.com/DaveJarvis/keenwrite
         | 
         | [2]: https://wiki.contextgarden.net/Installation
         | 
          | [3]: https://github.com/DaveJarvis/keenwrite/blob/1_typeset_using...
         | 
          | [4]: https://github.com/DaveJarvis/keenwrite/blob/main/docs/scree...
         | 
          | [5]: https://dave.autonoma.ca/blog/2020/04/28/typesetting-markdow...
         | 
          | [6]: https://talk.commonmark.org/t/cross-references-and-citations...
         | 
         | [7]: https://xkcd.com/927/
         | 
          | [8]: https://github.com/DaveJarvis/keenwrite-themes/tree/main/sol...
        
         | chaoxu wrote:
          | Have you tried Quarto? It should tick every box on your list
         | (except MathML, but hey that might work too since Quarto is
         | built on pandoc)
        
           | countrymile wrote:
            | +1 for Quarto. I wrote my thesis in R Markdown, which flipped
            | easily between LaTeX and HTML output, with a BibTeX
            | referencing system. It also allowed you to inline LaTeX for
            | more complex outputs. And inlining calculated tables and
            | charts meant I could keep my writing and code together.
            | Quarto is the successor.
        
           | periheli0n wrote:
           | Thanks for the pointer, that looks interesting. Especially
           | because it is open source!
           | 
           | I see it supports Jupyter notebook. Math support in those
           | isn't too bad at all, so it might just work for many cases.
        
       | pclmulqdq wrote:
       | I originally wanted to blog using LaTeX, and convert that to
       | HTML. I ran into all of these options, and also started writing
       | my own LaTeX->HTML flow, but it got too complicated to be a hobby
       | project. It turns out that there are parts of LaTeX that are
       | really hard to generically convert to web constructs.
       | 
       | I settled for markdown with KaTeX for math, although I would like
       | to return to the LaTeX->HTML project at some point soon.
        
       | V1ndaar wrote:
       | Currently finishing up my own PhD thesis. My approach to the same
       | problem is quite different. I write my thesis in Org mode.
       | Exporting to HTML is pretty painless. Been doing the same for
       | years for my notes. PDF export via LaTeX & HTML export. LaTeX and
       | PDFs fail pretty hard when including source code (some literate
       | programming in Org). That was my initial motivation behind also
       | producing HTML.
       | 
       | The final thesis that I will hand in is of course a regular PDF
       | (well, a print based on that). But the HTML version can contain
       | lots more stuff that doesn't fit (and belong) into the actual
       | paper thesis, e.g. code snippets to generate plots etc. (optional
       | export of Org subsections). By publishing the git repository of
       | the thesis, linking all code and data + a bit of work -> fully
       | reproducible thesis.
        
         | hoosieree wrote:
         | Heh, I also write papers in org and am currently writing my
         | dissertation in org.
         | 
         | Source code is always a pain to export for PDF, especially when
         | switching from 1 to 2 column layout depending on the
         | publication.
         | 
         | My blog is written in org too, but I post-process to make it
         | fit in with the rest of my static site. At some point maybe
         | I'll get enough free time to swap out my makefile setup for
         | org-publish, but if it ain't broke...
         | 
         | To anyone who'll listen I advocate for org-mode as a better
         | alternative to Jupyter notebooks, Markdown, and LaTeX. It's in
         | some ways the antithesis to "do one thing well". If you try to
         | do N things well while adhering to the unix philosophy you end
         | up learning N different tools. But org-mode is one tool that
         | does N things well, _and_ some of the things you learn doing
         | thing N transfer to thing N+1, so you get economies of scale.
        
           | taink wrote:
           | How do you plot graphs with org? I've been trying to use it
           | for that purpose but I can't wrap my head around how to do it
           | without some tikz incantation I don't really understand. I've
           | seen gnuplot mentioned here and there but the setup seems
           | pretty involved.
           | 
           | I'm looking for a way to plot simple numeric data signals in
           | time series, which are pretty trivial in jupyter notebooks.
        
             | V1ndaar wrote:
             | Well, personally as I write almost all my code in Nim and
             | am the developer of ggplotnim [0], I simply write a source
             | code snippet with some short Nim code, generate a plot and
             | dump the filename into the Org file.
             | 
             | If I had more time and wanted something more convenient and
             | magical, I would probably write an elisp function that takes
             | X Y (Z) columns and generates a plot from those using a
             | simple Nim program in the back that receives the data,
             | generates the plot and returns it somehow. Haven't given
             | this much thought though.
             | 
             | [0]: https://github.com/Vindaar/ggplotnim
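
The workflow described above (take X and Y columns, generate a plot file, drop the filename into the Org file) can be sketched roughly as follows. This is only an illustration in Python, not the Nim/ggplotnim or elisp setup mentioned in the comment; the function name `plot_series_to_svg` and all of its parameters are hypothetical, and it writes a bare SVG polyline using nothing but the standard library.

```python
from pathlib import Path


def plot_series_to_svg(xs, ys, path="series.svg", width=400, height=200, pad=10):
    """Write a bare-bones SVG line plot of ys against xs; return an Org file link.

    Hypothetical helper for illustration only: no axes, ticks, or labels.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long columns with at least two points")

    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Fall back to a span of 1 so constant columns do not divide by zero.
    x_span = (x_max - x_min) or 1.0
    y_span = (y_max - y_min) or 1.0

    def to_pixels(x, y):
        px = pad + (x - x_min) / x_span * (width - 2 * pad)
        # SVG's y axis points down, so flip it to get a conventional plot.
        py = height - pad - (y - y_min) / y_span * (height - 2 * pad)
        return f"{px:.1f},{py:.1f}"

    points = " ".join(to_pixels(x, y) for x, y in zip(xs, ys))
    svg = (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">\n'
        f'  <polyline fill="none" stroke="black" points="{points}"/>\n'
        "</svg>\n"
    )
    Path(path).write_text(svg, encoding="utf-8")
    # Org renders [[file:...]] links to image files inline on export.
    return f"[[file:{path}]]"


if __name__ == "__main__":
    # A short time series: sample index against signal value.
    print(plot_series_to_svg([0, 1, 2, 3, 4], [0.0, 0.8, 0.3, 0.9, 0.1]))
```

The returned string is an ordinary Org file link, so it can be pasted into the document by hand or emitted as the result of a source block. A real setup would more likely hand the plotting to an existing library.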
        
         | janeway wrote:
         | IMO, the PDF version should contain exactly the same
         | information as any other format. Otherwise, which version is
         | the thesis, and does it contain everything? Anything else is
         | just for your own interest.
         | 
         | But your approach sounds great. Good luck!
        
       | 131hn wrote:
       | The French version of the thesis's tl;dr is written in verse
       | (alexandrines).
       | 
       | Because, why not
        
         | 082349872349872 wrote:
         | excellent !
        
       | breck wrote:
       | If I was writing a PhD thesis today, I'd use Scroll:
       | https://scroll.pub/
        
         | amelius wrote:
         | The FAQ does not explain what scroll is or does near the top of
         | the document. It seems to produce HTML only.
        
           | breck wrote:
           | Better printable PDF support is coming.
        
         | CJefferson wrote:
         | Please tell us when you are self-promoting your own stuff. I
         | would currently recommend against any PhD student using it for
         | their PhD -- it can't even generate a PDF with required
         | formatting, which is required by most Universities.
        
       | dwheeler wrote:
       | The problem here is specific to LaTeX. I wrote my PhD
       | dissertation using OpenOffice.org (now use LibreOffice), and
       | generating HTML was easy (I posted the HTML).
       | 
       | But the author is right, LaTeX is widely used, translating it to
       | HTML is hard, and there are no incentives to make or improve
       | tools. Even if you don't want HTML, it'd be good for the LaTeX
       | tools to automatically generate reflowable PDF for accessibility.
       | There should be a process for funding infrastructure to
       | accelerate science, and this would be a good example.
       | 
       | There's an interesting trick you could try. PDF supports
       | embedding other contents. LibreOffice, for example, can slip its
       | original edited file into a generated PDF, producing a perfectly
       | editable PDF. Maybe a variant of this idea could be used, e.g.,
       | store the LaTeX source or HTML in the PDF, so people can "get the
       | PDF" yet still have options. But that's just a side idea, the
       | real issue is funding infrastructure for science.
        
       | aitchnyu wrote:
       | In The Art of Unix Programming (2003) the authors assert simple
       | text formats can be grepped, awked and are easy to compose in a
       | text editor. Hand typing XML is cruel. But now text editors have
       | perfect syntax highlighting and squiggles to pinpoint errors,
       | autocomplete, automatic formatting and toolchains to eliminate
       | errors and extract info.
       | 
       | Isn't HTML or a high-level abstraction (like Spectacle) the best
       | tool for the job today?
       | 
       | https://formidable.com/open-source/spectacle/docs/#one-html-...
        
         | pjmlp wrote:
         | Hence why one should use stuff like https://www.oxygenxml.com/
         | 
         | Similar products have been in business since the early 2000s.
        
       | BrandoElFollito wrote:
       | It was very cool for OP to do a writeup of the effort it took to
       | convert a thesis.
       | 
       | I wanted to do the same with mine but I lost the sources (it was
       | 20+ years ago and did not survive some upgrade/technology
       | change). I was mostly concerned with the .eps files that are
       | hardly portable to .png or similar.
       | 
       | This made me think a bit about preservation of ~recent data
       | (1990-2010). A lot falls in the category of "not natively on the
       | web yet" and "stored on stuff that does not work anymore".
        
       | davidpolberger wrote:
       | Back in 2010, I used plasTeX (http://plastex.github.io/plastex/)
       | to convert my thesis to HTML
       | (http://www.polberger.se/components/). plasTeX is "a Python
       | package to convert LaTeX markup to DOM." If memory serves,
       | plasTeX worked rather well, and still seems to be maintained
       | today.
        
       | jltsiren wrote:
       | PDF documents have two main benefits. The entire document is a
       | single file, and we know that old documents work.
       | 
       | I regularly read papers from 10 or 20 years ago and sometimes
       | even ones from 30 years ago. The old PDF documents work without
       | any major issues. I have much less confidence that future
       | browsers will continue displaying old HTML documents laid out
       | using then-obsolete techniques in a sane way on future hardware.
        
         | felixfbecker wrote:
         | What HTML document from 10 or 20 years ago does not still
         | render in a modern browser? Modern browsers are _extremely_
         | good at maintaining backwards compatibility at all costs (aka
         | "don't break the web"), to the degree that many on HN often
         | argue it hinders evolving web technologies.
        
           | jltsiren wrote:
           | The ones that relied on Flash, for example. And the layouts
           | designed for 800x600 displays may not work particularly well
           | on modern computers.
        
       | nicodjimenez wrote:
       | Mathpix (https://mathpix.com) provides a drag and drop tool that
       | converts PDF -> Markdown -> (HTML, LaTeX, PDF, DOCX). It's very
       | handy and a lot of researchers and publishers use this tool, as
       | well as people in the accessibility space (we make math content
       | accessible for visually impaired students).
       | 
       | Disclaimer: I'm the founder of Mathpix.
        
       ___________________________________________________________________
       (page generated 2022-12-19 23:01 UTC)