[HN Gopher] Converting my PhD thesis into HTML (2021)
___________________________________________________________________
Converting my PhD thesis into HTML (2021)
Author : distcs
Score : 81 points
Date : 2022-12-19 11:15 UTC (11 hours ago)
(HTM) web link (desfontain.es)
(TXT) w3m dump (desfontain.es)
| DominikPeters wrote:
| I would recommend using the lwarp package for turning large latex
| documents into HTML. Pretty much all other converters attempt to
| parse the tex files, which is an almost hopeless task. Lwarp has
| a different strategy: it redefines all macros to produce HTML
| (e.g. \textbf{example} writes "<strong>example</strong>" into the
| output pdf) within latex, thereby producing a PDF containing HTML
| code. It then uses a pdf2txt extractor to get the finished HTML
| file. Thus, it uses latex to parse the latex.
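|
| A toy Python sketch of that idea, not lwarp's actual code: the
| expander stays the same and only the macro table decides what
| each macro emits, here HTML. The names MACROS_HTML and expand
| are hypothetical, and the "parser" only understands \name{arg}.

```python
import re

# Hypothetical macro table: each macro maps its argument to HTML.
# lwarp does the analogous redefinition inside LaTeX itself.
MACROS_HTML = {
    "textbf": lambda arg: f"<strong>{arg}</strong>",
    "emph": lambda arg: f"<em>{arg}</em>",
}

# Matches an innermost \name{arg}: the argument has no braces or backslashes.
_INNERMOST = re.compile(r"\\([A-Za-z]+)\{([^{}\\]*)\}")


def expand(source: str, macros: dict) -> str:
    """Repeatedly expand innermost macros until the text stops changing."""
    def replace(match: re.Match) -> str:
        name, arg = match.group(1), match.group(2)
        if name not in macros:
            raise KeyError(f"undefined macro: \\{name}")
        return macros[name](arg)

    previous = None
    while previous != source:
        previous = source
        source = _INNERMOST.sub(replace, source)
    return source
```

| Swapping in a different table (say, one that emits plain text)
| changes the output format without touching the expander, which
| is the appeal of letting one engine do all the parsing.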
|
| Lwarp worked for me to produce an HTML version of the TikZ
| documentation (https://tikz.dev), and that's probably one of the
| more complicated tex documents that exists. (Though granted, this
| was still a major effort.)
| gdprrrr wrote:
| Yeah, it's well known that only LaTeX can parse LaTeX because
| you can redefine all syntax (catcodes) in the middle of the
| document.
| the-printer wrote:
| If he expects people (me) to read or read about his online PhD
| thesis then I think he should've chosen a font with a larger
| x-height. Reading the type feels like peeking through a dense
| bush during a hail storm.
| bspammer wrote:
| The fact that it's published as HTML means that you can choose
| your own font if you so desire. A PDF wouldn't let you do that.
| vouaobrasil wrote:
| I disagree with the author that PDFs are a terrible format. They
| guarantee layout, which is very important for complex scientific
| presentations. Even slight differences in layout can make a
| complex set of equations difficult to parse. LaTeX also has a
| much superior word-break/hyphening algorithm to the HTML engines
| of browsers.
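|
| A rough Python sketch of the difference, under simplifying
| assumptions: a greedy first-fit breaker next to a "total fit"
| breaker that picks all breaks at once to minimise squared
| leftover space. TeX's real algorithm (Knuth-Plass) also models
| stretchable spaces, penalties and hyphenation; the function
| names here are made up for illustration.

```python
def greedy_breaks(words, width):
    """First-fit: put each word on the current line if it still fits."""
    lines, current = [], []
    for word in words:
        candidate = current + [word]
        if current and len(" ".join(candidate)) > width:
            lines.append(current)
            current = [word]
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


def total_fit_breaks(words, width):
    """Choose all breaks at once to minimise the summed squared slack.

    The last line costs nothing. Assumes every word fits within `width`.
    """
    n = len(words)
    best = [float("inf")] * (n + 1)  # best[i]: minimal cost of setting words[i:]
    nxt = [n] * (n + 1)              # nxt[i]: where the line starting at i ends
    best[n] = 0
    for i in range(n - 1, -1, -1):
        length = -1  # cancels the space counted before the first word
        for j in range(i + 1, n + 1):
            length += len(words[j - 1]) + 1
            if length > width:
                break
            cost = 0 if j == n else (width - length) ** 2
            if cost + best[j] < best[i]:
                best[i], nxt[i] = cost + best[j], j
    lines, i = [], 0
    while i < n:
        lines.append(words[i:nxt[i]])
        i = nxt[i]
    return lines


def raggedness(lines, width):
    """Sum of squared leftover space on every line except the last."""
    return sum((width - len(" ".join(line))) ** 2 for line in lines[:-1])
```

| For the words "aaa bb cc ddddd" at width 6, the greedy breaker
| leaves "cc" alone on a line, while the total-fit breaker moves
| "bb" down to sit next to it and ends up less ragged overall.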
|
| I find PDF math papers easy to browse, unlike the author. They're
| much easier and more organized than a website, can be easily
| searched and have a *proper table of contents* compared to
| websites. As for poorly browsable on a phone -- well I think that
| is irrelevant because nobody is going to read a complex technical
| paper in practise on a phone. They do look decent in tablets, and
| as for screen readers...well that's a valid point but screen
| readers don't work well for material with lots of equations
| anyway.
|
| I applaud the author for the effort but looking at the result, I
| would not want to read math that way.
| jmhammond wrote:
| > and as for screen readers...well that's a valid point but
| screen readers don't work well for material with lots of
| equations anyway.
|
| This is something that we'd like to change. There are many
| visually impaired students who need to learn mathematics the
| same as you and I.
|
| My "eyes were opened" when I was working with a blind student
| in my class. The textbook I'd written in pretext (transpiled to
| pdf and HTML) could be read on his BrailleNote but some of the
| equations were wonky, so I rewrote them to work for everyone.
|
| It would be better if we developed tools to make them work for
| everyone straight away, instead of relying on authors. That's
| one of my career goals.
| felixfbecker wrote:
| I applaud you for this.
|
| I think MathML (which has gotten much better in browsers,
| thanks to Igalia[1]) is a much better bet for making this
| possible than LaTeX compiling to PDF.
|
| [1] https://mathml.igalia.com/
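|
| As a small illustration of why MathML helps here, a Python
| sketch (standard library only) that builds the markup for a
| simple fraction. The helper name `fraction` is hypothetical;
| the point is that numerator and denominator are explicit
| child elements rather than glyphs positioned on a page.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def fraction(numerator: str, denominator: str) -> str:
    """Return MathML markup for numerator/denominator as a string."""
    math = ET.Element("math", {"xmlns": MATHML_NS})
    frac = ET.SubElement(math, "mfrac")
    # <mi> marks an identifier; the first child of <mfrac> is the numerator.
    ET.SubElement(frac, "mi").text = numerator
    ET.SubElement(frac, "mi").text = denominator
    return ET.tostring(math, encoding="unicode")
```

| Calling fraction("a", "b") yields a <math> element wrapping
| <mfrac><mi>a</mi><mi>b</mi></mfrac>, which can be pasted into
| an HTML page as-is.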
| CJefferson wrote:
| Screen readers work perfectly fine with mathml. At worst one
| can just get the screen reader to read the latex for maths and
| browse the rest in nice HTML.
|
| On the other hand, PDFs generated from Latex are completely
| useless for screen readers.
| TacticalCoder wrote:
| > LaTeX also has a much superior word-break/hyphening algorithm
| to the HTML engines of browsers.
|
| And because the PDF has a fixed layout it's also much easier to
| prevent "rivers" in paragraphs. Which hence makes it a
| no-brainer to use justification. To me many print publications
| using justified text (including LaTeX documents) are a thing of
| beauty and I do hate how "left align" breaks the flow of
| reading. I'm taking slightly different spacing between words
| due to justification every day over horizontal lines of
| different length, which I find fugly _and_ confusing beyond
| repair.
|
| More hyphenation controls are coming to CSS and, one can dream,
| it may be possible one day to programmatically detect rivers?
|
| Meanwhile rivers be damned, I override many sites anyway and
| add "text-align: justify". The nice thing is: because
| "text-align: left" is the default, many sites and minifiers do
| not bother with text-align at all, so adding _"text-align:
| justify"_ works for many, many, many sites.
|
| And I only half-buy anyway the justifications (ah!) for left
| alignment on the Web.
|
| It's basically saying: _"We know better than people who've
| been working in print for decades (or more), left align is
| easier to read"_. I don't buy it. Left align breaks _my_
| reading flow. And I cannot be the only one.
|
| To me left align is trading potentially ugly looking paragraphs
| (due to rivers) for certainly ugly looking paragraphs (due to
| left justification: just look at the right of each paragraph...
| Such lack of clarity, such chaos cannot be unseen. It's pure
| fail).
|
| P.S: I've actually typeset books both in LaTeX and QuarkXPress
| and they were justified, not left-aligned.
| extra88 wrote:
| > I override anyway many sites and add "text-align: justify".
|
| I think you're an outlier in your strong preference for
| justified text but this serves as an example in favor of
| using HTML to present content. Well made web content is much
| more malleable by users to make it meet their needs and
| preferences.
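|
| To make that malleability concrete, a minimal Python sketch
| that applies a reader's own style rule to a saved HTML page.
| In practice this is usually done with a browser extension or
| user stylesheet; the function and constant names below are
| hypothetical.

```python
import re

# The reader's preference; any CSS rule could go here.
USER_CSS = "p { text-align: justify; hyphens: auto; }"


def inject_user_css(html: str, css: str = USER_CSS) -> str:
    """Insert a <style> block just before </head>, or prepend it if absent."""
    style = f"<style>{css}</style>"
    head_end = re.search(r"</head\s*>", html, flags=re.IGNORECASE)
    if head_end is None:
        return style + html
    return html[:head_end.start()] + style + html[head_end.start():]
```

| Because the rule is added last, it wins over earlier rules of
| equal specificity, which is why this works on sites that never
| set text-align themselves.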
| gnull wrote:
| If your equations are in MathML, the browsers should be able to
| screen read them at some point.
|
| > Even slight differences in layout can make a complex set of
| equations difficult to parse.
|
| Such a set of equations should normally be represented by a
| single block, I can't imagine a reason why layout should change
| inside that block.
|
| The layout of pdf is unnecessarily rigid. When I'm reading it
| on my screen, there's no reason the text should be split into
| A4 pages with very specific margin values. Latex also often
| moves your figures a few pages ahead because they didn't fit on
| the specific page. There's absolutely no reason for that when
| you have access to the big continuous canvas of an html page.
| This works for equations too; if you have a long equation block
| that happens to be right between two pages, you either have to
| let one page have a gap, or reorder/rewrite your paragraphs to
| make the equations fit. None of this has a good excuse when
| it's read on a screen.
|
| I don't think we need a website, but a js-free webpage with
| hyperlinks would be a lot better than pdf. Pdfs I find
| imperfect but ok.
| periheli0n wrote:
| > I don't think we need a website, but a js-free webpage with
| hyperlinks
|
| Wasn't this precisely the use case for HTML and the WWW as
| originally conceived by Berners-Lee and his fellow internet
| pioneers?
| dan-robertson wrote:
| I think you give latex more credit than it deserves. It gives
| little straightforward control over layout and the only reason
| documents are manageable is that pages are fixed size and
| layout changes are mostly local.
|
| Its paragraph breaking was state of the art when it was new
| but other systems break paragraphs now and potentially better.
| I also think ragged margins aren't really a problem.
|
| I think if layout mattered as much as you imply, scientists
| would have to use a tool that offers more control like
| indesign.
|
| None of this is to say that getting good layout in HTML is
| easy, of course.
| periheli0n wrote:
| > I think if layout mattered as much as you imply, scientists
| would have to use a tool that offers more control like
| indesign.
|
| Yes, precisely that. As a scientist I don't even want to have
| to deal with layout. That's what publishers are paid
| extremely well for. When I self-publish content I want the
| process to be as simple as possible. If this means ragged
| margins, browser-default styles for headings etc., default
| colors and fonts -- so be it.
|
| (but to be fair, optimising the layout is an excellent way to
| procrastinate on doing hard research)
| ta123456789 wrote:
| PDF papers are also much easier to save/archive and use
| offline. And great for printing
| jech wrote:
| > I find PDF math papers easy to browse
|
| So do I. Still, I wish LaTeX produced easily reflowable PDFs,
| especially when a document is formatted in two columns.
| enriquto wrote:
| But it does, doesn't it? You add the "twocolumn" option and
| recompile. Unless your LaTeX is too fancy this will typically
| give a very good result (at worst, some figures with
| hardcoded sizing will be awkwardly placed).
| jech wrote:
| I cannot do that when I'm reading a paper written by
| somebody else, and I only have the produced PDF.
| abdullahkhalids wrote:
| That's why arxiv is a god send, because the source is
| available there, if the author has uploaded it there.
|
| Science needs a culture of open sharing, the same way
| physics and math have it.
| mistrial9 wrote:
| what you are asking for is called a "round-trip" by some
| printers.. This was requested the week after PDF was
| invented! It does work, unless it does not.. the company that
| invented this technology is apparently infested by MBAs and
| charismatic nobodies, since they announced they are exiting
| the type "business" ? Our house of cards is showing.
| baby wrote:
| Check zotero. It has that feature
| periheli0n wrote:
| > nobody is going to read a complex technical paper in practise
| on a phone
|
| I do, in fact. Or rather, I often would like to but with PDF?
| No chance. IEEE Xplore online reading sometimes works, but it
| would work better if they cleaned up their UI to be compatible
| with phones.
|
| I have read thousands of pages of fiction on a phone and quite
| enjoyed it. Phones are great for reading if the content reflows
| properly.
|
| Now publishers and content creators would need to embrace non-
| paginated, reflowing output. This would not only facilitate
| reading on phones, but also on tablets and laptop screens.
|
| O'Reilly's online platform does a good job with their app.
|
| There is zero reason why paginated output should be the default
| in 2022.
| auggierose wrote:
| O'Reilly doesn't publish math books. All math books in
| epub/mobi format look like garbage. There isn't a single
| exception. If you know of one, please tell me. It seems
| currently too hard to get layout, resolution and inline
| formulas right in a portable format.
| periheli0n wrote:
| O'Reilly's online offer has not only O'Reilly books, but
| ones from other publishers as well. Some of them have
| equations. However, they are often rendered as images.
|
| IEEE Xplore does a good job rendering equations on phone
| screens. Therefore, it is possible.
|
| There is no technical reason why equations couldn't be
| rendered on a screen just as well as on a PDF. Sure, canvas
| size constraints might interfere, but this problem exists
| in principle also on paginated output. Plus, horizontal
| scrolling is a thing.
|
| I'm not saying a phone is the ideal platform to read a
| paper containing free energy-like math, but it can go a
| long way. Much longer than with the artificial restriction
| to paginated output like PDF.
| auggierose wrote:
| Of course it is technically possible, but I haven't seen
| it done properly. I have never seen a book with math
| rendered as images that was of satisfactory quality or
| even close to what PDF can offer. I doubt IEEE Xplore is
| an exception, but I don't have an account, so cannot
| check.
|
| I would like to be able to read a book also on a phone,
| but I am not going to compromise on quality for that,
| given that I can just read it on a large tablet in PDF
| format.
| periheli0n wrote:
| It is possible to find Open Access articles with math on
| ieeexplore with little effort. Have a look here:
| https://ieeexplore.ieee.org/document/6767058
|
| Does this live up to your maths standards?
| goosedragons wrote:
| With MathML epubs can look decent. For example take a look
| at the sample MathML epub "A First Course In Linear
| Algebra" [0] (in a reader that supports MathML of course).
| It looks pretty good. The problem is Amazon STILL doesn't
| support MathML, so publishers just churn out a gross
| version where all the equations are images and so then it
| doesn't scale properly with the text and the book becomes
| 300+ MB because of it. And they can't be bothered to make
| two versions for readers like Kobo that do support MathML.
|
| [0]: https://github.com/IDPF/epub3-samples/releases/download/2017...
| abdullahkhalids wrote:
| I tried the book. There are several places where long
| equations are cut off. Other minor spacing issues here
| and there.
| oplaadpunt wrote:
| Yes, fiction works because the layout is simple, consisting
| of text, and maybe images?
|
| Research papers are far more complex, and have established
| standards that aid quick reading and parsing. I absolutely
| don't want to deal with reflowing equations, reflowing
| figures, or whatever when publishing papers. Precise margins
| and column widths.
| periheli0n wrote:
| Yet, by far the vast majority of content produced today,
| technical or prose, is read on screens.
|
| Responsive webdesign has been around for quite a while. I
| don't see a reason, other than lack of effort/investment,
| why we shouldn't be able to read technical papers on
| variable-width screens, in a non-paginated form.
|
| Dealing with the technical challenges should not be the
| task of the author, but the publisher. And indeed, most
| publishers are on it.
|
| What's missing is a standardised format that can be
| downloaded, annotated, re-shared like a PDF.
| bccdee wrote:
| I wish there were a convention for sharing whole
| websites. Even a zip file containing an index.html plus
| images, css, other pages, etc. would be fine if browsers
| just supported it.
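|
| The packaging half of that is easy to sketch in Python with the
| standard library; what is missing is browser support for
| opening the result. The function name bundle_site and the
| requirement of a top-level index.html are assumptions for
| illustration, not an existing convention.

```python
import zipfile
from pathlib import Path


def bundle_site(site_dir: str, archive_path: str) -> list[str]:
    """Zip every file under site_dir, keeping paths relative to it."""
    root = Path(site_dir)
    if not (root / "index.html").is_file():
        raise FileNotFoundError("expected an index.html at the top of the site")
    target = Path(archive_path).resolve()
    # Collect files first so the archive never tries to include itself.
    files = sorted(
        p for p in root.rglob("*") if p.is_file() and p.resolve() != target
    )
    names = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in files:
            name = path.relative_to(root).as_posix()
            archive.write(path, arcname=name)
            names.append(name)
    return names
```

| Relative paths inside the archive mean links between pages,
| images and stylesheets keep working once it is unpacked.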
| godelski wrote:
| You can't have animations with PDFs. Anyone using beamer is
| familiar with this frustration. But animations are incredibly
| helpful in explaining many works. 3Blue1Brown became so popular
| in major part due to his use of (fantastic) animations that
| more easily explain the material than any static image could.
| mavhc wrote:
| Get rid of the 2 column thing and most people would be happy.
|
| What guarantees of layout do you require?
|
| In related news, MathML is back in Chrome v109
| michaelt wrote:
| _> What guarantees of layout do you require?_
|
| Some people write documents that can only be clearly
| presented on a 15" or larger display. Maybe a comparison
| table with a bunch of columns, maybe a detailed chart, maybe
| a PCB schematic, whatever.
|
| These people, being considerate of their readers, want to
| ensure if someone with a 13" screen comes along, they'll get
| scrollbars or small text, rather than a badly reflowed table
| where the word 'Yes' gets split over 3 lines.
|
| Other people want to read those documents on 5" phone
| displays.
| bravura wrote:
| I think the simplest solution is uploading your thesis to
| arxiv.org, then using arxiv-vanity (based upon LaTeXML) to render
| your arxiv link as a responsive web page.
| periheli0n wrote:
| The real shocker is that it's 2022 and LaTeX is still the best
| writing environment for a PhD thesis. It has so many downsides:
| the markup syntax is ugly, it really works best only if one uses
| paginated output such as PDF, a zoo of partly incompatible
| packages, need for compilation, obscure figure placing algorithms
| that are difficult to control, and so on.
|
| It still beats the competition because of rock-solid referencing,
| both to in-text elements like equations, chapters, etc as well as
| citing literature with bibtex.
|
| Plus, it's extremely stable, so someone who learnt LaTeX 20 years
| ago, like yours truly, can download the newest TeX distribution
| and feel at home immediately.
|
| Nevertheless, I would prefer a Markdown-based system that can use
| CSS and MathML, and has a 100% bibtex clone for references.
|
| Yes, pandoc goes quite a long way along this route, but setting
| up such a pipeline is still too complicated for many.
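|
| For readers curious what the core of such a pipeline looks
| like, a hedged Python sketch that assembles one pandoc call:
| Markdown in, standalone HTML out, with citations resolved from
| a BibTeX file and math emitted as MathML. The file names are
| placeholders, and --citeproc assumes pandoc 2.11 or newer.

```python
import subprocess


def pandoc_command(markdown_file: str, bib_file: str, html_file: str) -> list[str]:
    """Build the argument list for a Markdown -> HTML conversion."""
    return [
        "pandoc",
        markdown_file,
        "--standalone",                  # emit a complete HTML document
        "--citeproc",                    # resolve [@key] citations
        f"--bibliography={bib_file}",    # BibTeX database to cite from
        "--mathml",                      # render TeX math as MathML
        "--number-sections",
        f"--output={html_file}",
    ]


def build(markdown_file: str, bib_file: str, html_file: str) -> None:
    """Run pandoc; raises CalledProcessError if the conversion fails."""
    subprocess.run(pandoc_command(markdown_file, bib_file, html_file), check=True)
```

| Something like build("thesis.md", "refs.bib", "thesis.html")
| would then be the whole build step, assuming pandoc is on the
| PATH.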
| nicodjimenez wrote:
| Mathpix Markdown is an attempt at bringing together the best
| of both worlds (Markdown and LaTeX) while providing excellent
| interoperability with LaTeX, meaning you can easily export your
| Mathpix Markdown documents to LaTeX, including equation
| references, tabular environments, images, etc:
|
| https://github.com/Mathpix/mathpix-markdown-it
|
| Disclaimer: I'm the founder of Mathpix.
| analog31 wrote:
| It must depend on the field. A close relative of mine is a PhD
| advisor in a science field. He's hands-off about it, but is
| also aware of what his students are doing. If asked, he
| recommends MS Word, which is also what he uses for his
| manuscripts.
|
| My own experience was as a physics student, 30 years ago.
| Students paid a heavy price for being able to print and submit
| the entire thesis with no manual intervention. The students who
| chose LaTeX took the longest at it. I didn't have access to a
| Unix terminal anyway, and banged out my thesis on an MS-DOS
| machine. Whatever my word processor couldn't support, I added
| by hand. The readers were OK with this.
|
| My solution to all typographic problems was "take care of it
| after defense." I spent a few days after my defense getting my
| copy to be ready for duplication, including sticking all of the
| page numbers on with glue because I couldn't make inline
| figures work.
| nextos wrote:
| LaTeX has, like Org Mode, this mythical aura of being super
| hard. However, replicating the functionality of Word is
| trivial and takes an hour or two for a savvy computer user to
| grasp.
|
| There's always Overleaf, Pandoc or LyX to make things even
| simpler. LyX in particular deserves to be better known.
|
| Complex things, like TikZ, are of course difficult and time
| consuming. But those are impossible using Word.
|
| IMHO, the biggest advantages of LaTeX are reproducibility and
| reference management. Big Word documents are quite fragile.
| And reference management is a mess.
| periheli0n wrote:
| Sure, one can write a thesis in MS Word. It has come a long
| way with support for large documents. But I still find its
| referencing clumsy, opaque and unstable.
|
| For example, automatic updates of figure numbers in captions
| and references: Countless times it failed on me and I had to
| manually recreate the fields, bookmarks, cross-references,
| and whatever else is needed.
|
| Bibliographies are hardly doable without an external tool
| that comes with its own headaches.
|
| Typography in MS word is quite decent these days, though.
| Anyway, the content of a PhD thesis shouldn't be judged by
| its typography (as long as it maintains a readable standard).
| runningmike wrote:
| I would strongly recommend MyST. MyST extends Markdown for
| technical and scientific communication. See
| https://www.myst.tools/
| abdullahkhalids wrote:
| I tried MyST recently. All I see is a markup language that
| slowly becomes more and more complex over time to support more
| and more features that LaTeX already supports while at the
| same time acquiring the same syntax complexity of latex.
|
| What people don't acknowledge is that there is a base level
| of syntax complexity needed to produce fully general
| documents. If you do, the natural conclusion is that to fix
| latex, you need a full rewrite of latex with minor changes to
| fix all the inconsistencies that have crept into it.
| V1ndaar wrote:
| Going by the documentation it does it by... drumroll...
| converting to LaTeX!
|
| (edit: generating PDFs that is)
| periheli0n wrote:
| To be fair, there is no better free tool than LaTeX to
| typeset PDFs. But it fails at non-paginated, free-flowing
| content.
| godelski wrote:
| Honestly, it isn't the writing part that annoys me the most. It
| is tikz and the fact that I can't make animations in beamer.
| Just resolving these issues would go a long way for me. Tikz
| could be fixed simply if there was a GUI that could allow for
| sliders or moving specific objects. Or at least a better way to
| make a good grid (tip: draw a grid on your canvas, draw
| whatever you want, remove grid). Things are so difficult to
| properly line up, even if we have mathematical representations.
| It shouldn't be that hard...
| abdullahkhalids wrote:
| I recently discovered this python interface for tikz
| https://github.com/allefeld/pytikz
|
| While it does not directly address the issues you point at,
| it does alleviate some issues.
|
| * The syntax is somewhat easier to parse.
|
| * It is a lot easier to write functions to redraw the same
| components over and over again.
|
| * Doing math calculations to systemically place objects in
| relation to each other is a lot easier because python's
| arithmetic syntax is a lot more intuitive than TeX's.
|
| Of course, this does mean that you have to fire up python to
| draw figures.
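|
| The last two points can be illustrated without any library at
| all. This Python sketch does not use pytikz's API; it simply
| generates TikZ source as text, doing the placement arithmetic
| in Python. The function name tikz_grid and the node naming
| scheme are made up for the example.

```python
def tikz_grid(rows: int, cols: int, spacing: float = 1.5) -> str:
    """Return a tikzpicture with labelled nodes on a regular grid and
    arrows between horizontal neighbours."""
    lines = [r"\begin{tikzpicture}"]
    for r in range(rows):
        for c in range(cols):
            # Coordinates are computed here instead of in TeX arithmetic.
            x, y = c * spacing, -r * spacing
            lines.append(
                rf"  \node[draw] (n{r}-{c}) at ({x:.2f},{y:.2f}) {{${r},{c}$}};"
            )
    for r in range(rows):
        for c in range(cols - 1):
            lines.append(rf"  \draw[->] (n{r}-{c}) -- (n{r}-{c + 1});")
    lines.append(r"\end{tikzpicture}")
    return "\n".join(lines)
```

| The output can be written to a .tex file and pulled into the
| main document with \input, so the figure is regenerated
| whenever the spacing or grid size changes.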
| thangalin wrote:
| > Nevertheless, I would prefer a Markdown-based system
|
| My free, cross-platform desktop Markdown editor, KeenWrite[1],
| integrates with the ConTeXt typesetting software[2]. I'm
| working on a branch to make integration containerized[3]
| because its installation is painful. KeenWrite limits math to
| plain TeX[4] so that the output can be rendered using any TeX-
| based typesetter (ConTeXt, LaTeX, MathJax, ekhTEX, etc.).
|
| Here's a sample document typeset using ConTeXt (skip to page 40
| for the math):
|
| https://pdfhost.io/v/4FeAGGasj_SepiSolar_Highlevel_Software_...
|
| That document theme is called Solare[8].
|
| > that can use CSS and MathML
|
| Adding CSS mixes presentation logic with content, which is
| something KeenWrite strives to avoid. Instead, KeenWrite
| implements Pandoc's annotation syntax to keep presentation
| logic out of the content. I've written about this extensively
| in my Typesetting Markdown series[5].
|
| You can produce some pretty amazing documents just with
| annotations, such as the following that I wrote in Markdown and
| typeset using ConTeXt:
|
| https://impacts.to/downloads/lowres/impacts.pdf
|
| > has a 100% bibtex clone for references.
|
| Markdown fails at references. At some point, I'd like to
| implement cross-references in KeenWrite. Except there's at
| least six competing standards for the syntax, which I've also
| remarked upon[6], making the choice of syntax difficult[7].
|
| > setting up such a pipeline is still too complicated for many
|
| FWIW, my Typesetting Markdown series, which explains how to set
| up a typesetting pipeline using Pandoc, is one of the reasons I
| developed KeenWrite: to replace that entire pipeline (R,
| Markdown, externalized variable interpolation, math, and
| typesetting) with a single tool.
|
| [1]: https://github.com/DaveJarvis/keenwrite
|
| [2]: https://wiki.contextgarden.net/Installation
|
| [3]:
| https://github.com/DaveJarvis/keenwrite/blob/1_typeset_using...
|
| [4]:
| https://github.com/DaveJarvis/keenwrite/blob/main/docs/scree...
|
| [5]: https://dave.autonoma.ca/blog/2020/04/28/typesetting-markdow...
|
| [6]: https://talk.commonmark.org/t/cross-references-and-citations...
|
| [7]: https://xkcd.com/927/
|
| [8]: https://github.com/DaveJarvis/keenwrite-themes/tree/main/sol...
| chaoxu wrote:
| Have you tried Quarto? It should tick everything in your box
| (except MathML, but hey that might work too since Quarto is
| built on pandoc)
| countrymile wrote:
| +1 for quarto, i wrote my thesis in rmarkdown which flipped
| easily between latex and html output, with a bibtex
| referencing system. It also allowed you to inline latex for
| more complex outputs. And inlining calculated tables and
| charts meant i could keep my writing and code together.
| Quarto is the successor.
| periheli0n wrote:
| Thanks for the pointer, that looks interesting. Especially
| because it is open source!
|
| I see it supports Jupyter notebook. Math support in those
| isn't too bad at all, so it might just work for many cases.
| pclmulqdq wrote:
| I originally wanted to blog using LaTeX, and convert that to
| HTML. I ran into all of these options, and also started writing
| my own LaTeX->HTML flow, but it got too complicated to be a hobby
| project. It turns out that there are parts of LaTeX that are
| really hard to generically convert to web constructs.
|
| I settled for markdown with KaTeX for math, although I would like
| to return to the LaTeX->HTML project at some point soon.
| V1ndaar wrote:
| Currently finishing up my own PhD thesis. My approach to the same
| problem is quite different. I write my thesis in Org mode.
| Exporting to HTML is pretty painless. Been doing the same for
| years for my notes. PDF export via LaTeX & HTML export. LaTeX and
| PDFs fail pretty hard when including source code (some literate
| programming in Org). That was my initial motivation behind also
| producing HTML.
|
| The final thesis that I will hand in is of course a regular PDF
| (well, a print based on that). But the HTML version can contain
| lots more stuff that doesn't fit (and belong) into the actual
| paper thesis, e.g. code snippets to generate plots etc. (optional
| export of Org subsections). By publishing the git repository of
| the thesis, linking all code and data + a bit of work -> full
| reproducible thesis.
| hoosieree wrote:
| Heh, I also write papers in org and am currently writing my
| dissertation in org.
|
| Source code is always a pain to export for PDF, especially when
| switching from 1 to 2 column layout depending on the
| publication.
|
| My blog is written in org too, but I post-process to make it
| fit in with the rest of my static site. At some point maybe
| I'll get enough free time to swap out my makefile setup for
| org-publish, but if it ain't broke...
|
| To anyone who'll listen I advocate for org-mode as a better
| alternative to Jupyter notebooks, Markdown, and LaTeX. It's in
| some ways the antithesis to "do one thing well". If you try to
| do N things well while adhering to the unix philosophy you end
| up learning N different tools. But org-mode is one tool that
| does N things well, _and_ some of the things you learn doing
| thing N transfer to thing N+1, so you get economies of scale.
| taink wrote:
| How do you plot graphs with org? I've been trying to use it
| for that purpose but I can't wrap my head around how to do it
| without some tikz incantation I don't really understand. I've
| seen gnuplot mentioned here and there but the setup seems
| pretty involved.
|
| I'm looking for a way to plot simple numeric data signals in
| time series, which are pretty trivial in jupyter notebooks.
| V1ndaar wrote:
| Well, personally as I write almost all my code in Nim and
| am the developer of ggplotnim [0], I simply write a source
| code snippet with some short Nim code, generate a plot and
| dump the filename into the Org file.
|
| If I had more time and wanted something more convenient and
| magical, I would probably write an elisp function that takes
| X Y (Z) columns and generates a plot from those using a
| simple Nim program in the back that receives the data,
| generates the plot and returns it somehow. Haven't given
| this much thought though.
|
| [0]: https://github.com/Vindaar/ggplotnim
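|
| The same "generate a plot, dump the filename into the Org
| file" workflow can be sketched in Python rather than Nim. This
| is a deliberately minimal, standard-library-only example that
| writes an SVG line plot and returns an Org file link; the
| function name plot_series and its defaults are hypothetical,
| and a real setup would more likely call a plotting library.

```python
from pathlib import Path


def plot_series(values, path="signal.svg", width=400, height=150):
    """Write a minimal SVG line plot of `values` and return an Org file link."""
    if len(values) < 2:
        raise ValueError("need at least two samples")
    low, high = min(values), max(values)
    span = (high - low) or 1.0  # avoid division by zero for flat signals
    step = width / (len(values) - 1)
    # SVG's y axis points down, so larger values get smaller y coordinates.
    points = " ".join(
        f"{i * step:.1f},{height - (v - low) / span * height:.1f}"
        for i, v in enumerate(values)
    )
    svg = (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
        f'<polyline fill="none" stroke="black" points="{points}"/></svg>'
    )
    Path(path).write_text(svg, encoding="utf-8")
    return f"[[file:{path}]]"
```

| The returned string is an ordinary Org link, so it can be
| pasted into the document wherever the figure should appear.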
| janeway wrote:
| IMO, the pdf version should contain the exact information that
| any other format might contain. Which version is the thesis and
| is everything contained. Anything else is just for your own
| interest.
|
| But your approach sounds great. Good luck!
| 131hn wrote:
| The French version of the thesis's tl;dr is written in verse
| (alexandrines).
|
| Because, why not
| 082349872349872 wrote:
| excellent !
| breck wrote:
| If I was writing a PhD thesis today, I'd use Scroll:
| https://scroll.pub/
| amelius wrote:
| The FAQ does not explain what scroll is or does near the top of
| the document. It seems to produce HTML only.
| breck wrote:
| Better printable PDF support is coming.
| CJefferson wrote:
| Please tell us when you are self-promoting your own stuff. I
| would currently recommend against any PhD student using it for
| their PhD -- it can't even generate a PDF with required
| formatting, which is required by most Universities.
| dwheeler wrote:
| The problem here is specific to LaTeX. I wrote my PhD
| dissertation using OpenOffice.org (now use LibreOffice), and
| generating HTML was easy (I posted the HTML).
|
| But the author is right, LaTeX is widely used, translating it to
| HTML is hard, and there are no incentives to make or improve
| tools. Even if you don't want HTML, it'd be good for the LaTeX
| tools to automatically generate reflowable PDF for accessibility.
| There should be a process for funding infrastructure to
| accelerate science, and this would be a good example.
|
| There's an interesting trick you could try. PDF supports
| embedding other contents. LibreOffice, for example, can slip its
| original edited file into a generated PDF, producing perfectly
| editable PDF. Maybe a variant of this idea could be used, e.g.,
| store the LaTeX source or HTML in the PDF, so people can "get the
| PDF" yet still have options. But that's just a side idea, the
| real issue is funding infrastructure for science.
| aitchnyu wrote:
| In The Art of Unix Programming (2003) the authors assert simple
| text formats can be grepped, awked and are easy to compose in a
| text editor. Hand typing xml is cruel. But now text editors have
| perfect syntax highlighting and squiggles to pinpoint errors,
| autocomplete, automatic formatting and toolchains to eliminate
| errors and extract info.
|
| Isn't HTML or a high-level abstraction (like Spectacle) the best
| tool for the job today?
|
| https://formidable.com/open-source/spectacle/docs/#one-html-...
| pjmlp wrote:
| Hence why one should use stuff like https://www.oxygenxml.com/
|
| Similar products have been in business since the early 2000's.
| BrandoElFollito wrote:
| It was very cool for OP to do a writeup of the effort it took to
| convert a thesis.
|
| I wanted to do the same with mine but I lost the sources (it was
| 20+ years ago and did not survive some upgrade/technology
| change). I was mostly concerned with the .eps files that are
| hardly portable to .png or similar.
|
| This made me think a bit about preservation of ~recent data
| (1990-2010). A lot falls in the category of "not natively on the
| web yet" and "stored on stuff that does not work anymore".
| davidpolberger wrote:
| Back in 2010, I used plasTeX (http://plastex.github.io/plastex/)
| to convert my thesis to HTML
| (http://www.polberger.se/components/). plasTeX is "a Python
| package to convert LaTeX markup to DOM." If memory serves,
| plasTeX worked rather well, and still seems to be maintained
| today.
| jltsiren wrote:
| PDF documents have two main benefits. The entire document is a
| single file, and we know that old documents work.
|
| I regularly read papers from 10 or 20 years ago and sometimes
| even ones from 30 years ago. The old PDF documents work without
| any major issues. I have much less confidence that future
| browsers will continue displaying old HTML documents laid out
| using then-obsolete techniques in a sane way on future hardware.
| felixfbecker wrote:
| What HTML document from 10 or 20 years ago does not still
| render in a modern browser? Modern browsers are _extremely_
| good at maintaining backwards compatibility at all costs (aka
| "don't break the web"), to the degree many on HN often argue it
| hinders evolving web technologies.
| jltsiren wrote:
| The ones that relied on Flash, for example. And the layouts
| designed for 800x600 displays may not work particularly well
| on modern computers.
| nicodjimenez wrote:
| Mathpix (https://mathpix.com) provides a drag and drop tool that
| converts PDF -> Markdown -> (HTML, LaTeX, PDF, DOCX). It's very
| handy and a lot of researchers and publishers use this tool, as
| well as people in the accessibility space (we make math content
| accessible for visually impaired students).
|
| Disclaimer: I'm the founder of Mathpix.
___________________________________________________________________
(page generated 2022-12-19 23:01 UTC)