[HN Gopher] Implementing a "mini-LaTeX" in ~2000 lines of code
___________________________________________________________________
Implementing a "mini-LaTeX" in ~2000 lines of code
Author : ingve
Score : 46 points
Date : 2022-08-04 18:08 UTC (4 hours ago)
(HTM) web link (nibblestew.blogspot.com)
(TXT) w3m dump (nibblestew.blogspot.com)
| ajross wrote:
| Lots of nitpickery about what exactly it means to be TeX here,
| which IMHO is sort of forest-for-the-trees argumentation.
|
| What's interesting here is that it's a blog post on how to do
| page-level typesetting, which was a subject of great interest and
| serious research in the 1970s but in the modern world has been
| mostly forgotten. And that's somewhat unique. Modern developers
| still occasionally worry about low level details like instruction
| architectures or I/O latency or packet dumps. The lessons of our
| ancestors about algorithm choice are still alive in our culture.
|
| But virtually no one gives any thought to how their text is laid
| out at a level of sophistication higher than the CSS box model.
|
| And that's kinda nice to see.
| chongli wrote:
| This is fast becoming a pet peeve of mine: people conflating
| LaTeX with TeX. TeX is the macro language and the compiler, LaTeX
| is the library.
|
| Replacing TeX is pretty easy since it is a fairly small language.
| Replacing LaTeX, on the other hand, is very difficult because it
| contains a rather huge number of packages that have been
| developed over the decades.
| leephillips wrote:
| Not quite. TeX is the compiler that produces DVI or PDF files
| and the typesetting language. It is also a set of macros that
| together are _plain TeX_. It is also all of this together, a
| typesetting system. The name is somewhat overloaded.
|
| LaTeX is not a library. It's what is called a _format_ in the
| TeX world; essentially another set of macros that create an
| alternative to plain TeX. Yes, there are a huge number of
| packages that are designed to work with LaTeX, but they are not
| _part_ of LaTeX. Most distributions, such as TeXLive, include
| TeX, LaTeX, a large number of these packages, fonts, and much
| else.
| choeger wrote:
| Important distinction: TeX is an interpreter, not a compiler.
| code_biologist wrote:
| Why is that an important distinction? It's not clear to me
| that it is.
| Koshkin wrote:
| Doesn't TeX emit a file like a compiler? You could say that a
| PDF reader, say, is an "interpreter," but TeX itself? I don't
| think so.
| bonzini wrote:
| A compiler generates a file that can be executed to produce
| the same computation performed by the original source.
| Since loops and conditionals(*) aren't apparent in TeX's
| output, but are performed entirely while TeX runs, it is an
| interpreter.
|
| Regarding the output, TeX is an interpreter that produces
| its output as a DVI or PDF file rather than as text on a
| console or showing it in a GUI.
|
| (*) Both those in the source, and those implicit in TeX's
| typesetting algorithm.
| SkeuomorphicBee wrote:
| You mentioned loops in the output as a requirement for a
| compiler, but in this case all compilers that unroll
| loops are not compilers. Early shader compilers come to
| mind.
| ketralnis wrote:
| I don't think that's either important or correct.
|
| The biggest distinction I'd make between an interpreter and
| compiler is whether you need it to be present at execution
| time+, and if you use TeX to output PDF then obviously you
| don't need TeX to be present to view the PDF.
|
| Correctness aside here's what you're replying to:
|
| > TeX is the macro language and the compiler, LaTeX is the
| library.
|
| What exactly here makes any compiler/interpreter distinction
| important? How is the communication improved by this
| "important" "distinction"?
|
| +: Yes yes there are so many nitpicky "corrections" to make
| here that, indeed, any "distinction" can approach
| meaninglessness. If you're about to reply with that, read the
| second bit.
| PaulDavisThe1st wrote:
| Some old sage: when you're relatively new to computers,
| you don't understand the difference between an interpreter
| and a compiler. After a while, you see the difference, and
| it's substantial and important. After a while longer, you
| don't see the difference anymore, though for different
| reasons than at first.
| thrown_22 wrote:
| I once built an interpreter for x86_64 to see what my
| machine code was doing line by line. Then I read the GDB
| manual.
| lapinot wrote:
| To make things a bit more clear (hope that's what you
| meant!):
|
| One concrete angle is that interpreters and compilers are
| usually intertwined: interpreters commonly have an abstract
| machine which is some language a bit simpler to execute
| than the full language, thus they have a compilation step
| at first. And compilers do stuff like constant propagation,
| inlining and other flavours of partial evaluation. Thus
| they contain some sorts of interpreters.
|
| Another more abstract angle is to consider interpreters as
| compilers into a "trivial" language consisting only of
| values and side effects or the flip side: considering
| compilers as interpreters with a non-standard semantics.
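
A minimal sketch of the second angle above, assuming a toy arithmetic language invented purely for illustration (the tuple encoding and the names `interpret` and `compile_expr` are hypothetical, not from TeX or any library). Both functions walk the same tree; the interpreter produces values, while the "compiler" is the same traversal with a non-standard meaning for each node: it produces Python source text instead.

```python
# Toy expression language (hypothetical encoding):
#   ("num", 3) | ("var", "x") | ("add", e1, e2) | ("mul", e1, e2)

def interpret(e, env):
    """Walk the tree and compute a value right away."""
    tag = e[0]
    if tag == "num":
        return e[1]
    if tag == "var":
        return env[e[1]]
    a, b = interpret(e[1], env), interpret(e[2], env)
    return a + b if tag == "add" else a * b

def compile_expr(e):
    """Same traversal, but each node 'evaluates' to Python source text."""
    tag = e[0]
    if tag == "num":
        return repr(e[1])
    if tag == "var":
        return f"env[{e[1]!r}]"
    op = "+" if tag == "add" else "*"
    return f"({compile_expr(e[1])} {op} {compile_expr(e[2])})"

expr = ("add", ("num", 2), ("mul", ("var", "x"), ("num", 3)))

# Interpreter: the tree walker must be present every time we want a result.
direct = interpret(expr, {"x": 4})

# Compiler: produce an artifact once, then run it without the tree walker.
source = compile_expr(expr)              # "(2 + (env['x'] * 3))"
compiled = eval("lambda env: " + source)
later = compiled({"x": 4})
```

Here `direct` and `later` are both 14. The only difference is when the work happens and whether `interpret` still needs to be around, which is the criterion ketralnis uses upthread.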
| PaulHoule wrote:
| nroff is probably a better comparison than LaTeX for this
|
| https://en.wikipedia.org/wiki/Nroff
| AlanYx wrote:
| He's doing something inspired by Knuth-Plass linebreaking, so
| either Heirloom Troff or Neatroff would be the better
| comparison.
| mhd wrote:
| It's interesting to look at the early issues of the official TeX
| publication Tugboat[1]. The first version of TeX, TeX78, written
| in the SAIL language was just out and there was a concerted
| effort to produce a more portable version in Pascal, TeX82 (IIRC
| already known under that name during production).
|
| The first issues of Tugboat then contained several reports about
| early TeX being ported to various architectures. That often meant
| rewriting the whole program, as basically no other system besides
| Knuth's "native" time-sharing OS had SAIL. So there was a port to
| a (Z80-based) Unix, someone rewriting parts in Fortran, etc. If
| they were tied to a specific backend/printer, there wasn't even a
| need to port Metafont.
|
| TeX was relatively small. That often gets lost in contemporary
| huge LaTeX distributions.
|
| [1]: https://www.tug.org/tugboat/
| seanwilson wrote:
| > As you can easily tell, line breaks made at the beginning of
| the chapter affect the potential line breaks you can do later.
| Sometimes it is worth it to make a locally non-optimal choice at
| the beginning to get a better line break possibility much later.
| Evaluating a global metric like this can be potentially slow,
| which is why interactive programs like LibreOffice do not use
| this method.
|
| Is there not a fast "good enough" algorithm for justifying text
| vs a slow optimal one? Is it really infeasible for HTML to have
| justified text that looks something like LaTeX? It's sad that
| justified text still looks bad on the web with no signs of it
| getting better even with all the computing power we have now.
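
The trade-off described in the quoted passage can be sketched as follows. This is a simplified minimum-raggedness model, not the Knuth-Plass algorithm itself: it assumes monospaced text, a single space between words, no hyphenation, no stretchable glue, and a cost equal to the squared trailing space on every line except the last. The function names are illustrative only.

```python
def line_cost(words, i, j, width):
    """Squared trailing space for words[i:j] on one line, or None if too long."""
    length = sum(len(w) for w in words[i:j]) + (j - i - 1)
    if length > width:
        return None
    return (width - length) ** 2

def greedy_breaks(words, width):
    """First-fit: keep adding words to the current line until one does not fit."""
    lines, start = [], 0
    for end in range(1, len(words) + 1):
        if end - start > 1 and line_cost(words, start, end, width) is None:
            lines.append(words[start:end - 1])
            start = end - 1
    lines.append(words[start:])
    return lines

def optimal_breaks(words, width):
    """Dynamic programming over all break points, minimising the total cost."""
    if any(len(w) > width for w in words):
        raise ValueError("every word must fit on a line by itself")
    n = len(words)
    best = [float("inf")] * (n + 1)  # best[j]: minimal cost of setting words[:j]
    prev = [0] * (n + 1)             # prev[j]: start index of the last line
    best[0] = 0
    for j in range(1, n + 1):
        for i in range(j - 1, -1, -1):
            cost = line_cost(words, i, j, width)
            if cost is None:
                break                # a longer line cannot fit either
            if j == n:
                cost = 0             # the last line is not penalised
            if best[i] + cost < best[j]:
                best[j], prev[j] = best[i] + cost, i
    lines, j = [], n
    while j > 0:
        lines.append(words[prev[j]:j])
        j = prev[j]
    return lines[::-1]

def total_cost(lines, width):
    """Sum of squared trailing space over all lines except the last."""
    return sum((width - len(" ".join(line))) ** 2 for line in lines[:-1])

words = "aaa bb cc ddddd".split()
greedy = greedy_breaks(words, 6)    # [['aaa', 'bb'], ['cc'], ['ddddd']]
optimal = optimal_breaks(words, 6)  # [['aaa'], ['bb', 'cc'], ['ddddd']]
```

With a width of 6, the greedy result costs 16 and the optimal one costs 10: breaking early after "aaa" is locally worse but lets "bb cc" fill the second line. The dynamic program is quadratic in the number of words in the worst case, while the greedy pass is a single scan.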
| chucky wrote:
| I once discussed this with a coworker who had worked a lot on
| various things related to typesetting, and he claimed the main
| issue with doing good text justification for the web is the
| lack of good, free hyphenation dictionaries (or rules) for most
| languages. In order to do full paragraph justification in a way
| that makes sense and always looks good you need to be able to
| hyphenate words dynamically (and fairly aggressively).
|
| LaTeX does come with hyphenation libraries for some languages,
| but web browsers would need wider support. There's also the
| question of how to standardize this, because you would have to
| ship hyphenation dictionaries for every language, and you would
| need to standardize this across browsers so they render pages
| the same.
|
| LaTeX does ship with hyphenation libraries for a bunch of
| languages, but if you try to use it for any of the minor
| languages, you'll find the results are so-so and you need to
| manually hyphenate words to get decent results (this was my
| experience a decade ago at least).
|
| All of the above could be solved, but sadly I suspect it
| wouldn't be worth the cost for the players that would have to
| be involved in solving it.
| bombcar wrote:
| The biggest advantage LaTeX has is that the page width stays
| the same between various versions of your document, so you
| can improve it on the fly when you notice issues.
|
| The web tries to support every possible page width so some
| won't look great.
| tannhaeuser wrote:
| According to previous discussions [1], none that'd fit the
| constraints of text rendering in browsers AIUI (e.g. dynamic
| programming approaches considered infeasible on the web,
| hyphenation not really a thing). But who knows, CSS is
| incredibly bloated already ...
|
| [1]: https://news.ycombinator.com/item?id=28537923
| tomsmeding wrote:
| > > [...] Evaluating a global metric like this can be
| potentially slow, which is why interactive programs like
| LibreOffice do not use this method.
|
| I suspect that another reason for a WYSIWYG office suite to not
| use such global optimisation is to avoid unpredictable changes
| to layout in response to seemingly unrelated changes by the
| user. If you get full reflows with figures moving around and
| headings flipping between the previous and the next page while
| you're typing a sentence, users will probably quickly punch a
| hole in their monitor.
|
| In (La)TeX this is less of an issue because of the delayed
| compilation, I guess. Also *TeX users are probably more
| tolerant of that kind of mess.
| drfuchs wrote:
| On my 2015 Mac, using just a single core, TeX composes the
| entire 495-page TeXbook in under 0.2 seconds, or about 0.4
| milliseconds per page. That includes reading the 1.3 MB input
| file, evaluating all macros and such, doing all line-breaking
| and page-breaking, and writing the entire 1.9 MB dvi file.
|
| So, I don't know where the "too slow" meme is coming from. Is
| there a browser that can open and scroll to the end of an HTML
| page that's 500 screen-fulls long in 0.2 seconds?
| marcodiego wrote:
| Posted 3 days ago: https://news.ycombinator.com/item?id=32307185
| groby_b wrote:
| "The output contains some typographical niceties, so we can
| reasonably say that the code does contain an implementation of a
| very limited and basic form of LaTeX" has to be on the more
| optimistic side of overstatements.
|
| It's roughly akin to "cat can actually display content quite
| similar to a <pre> block, so it contains a very limited and basic
| form of a web browser".
|
| It's not entirely wrong. But mostly because it left "wrong"
| behind long ago and is in a different country now.
| _gabe_ wrote:
| I thought one of the reasons Knuth made TeX was because there
| were no good math typesetters available. I have no idea if this
| assumption is correct, but one of the main reasons I've resorted
| to using LaTeX in the past is for the math typesetting. I haven't
| found any good alternatives, and it seems incredibly complex
| compared to basic kerning + line breaks and/or justification for
| regular English-like text layout.
|
| So if this implementation lacks math typesetting, and that was
| one of the motivating factors behind the original TeX
| implementations, then I think this code is missing a pretty
| complex core feature.
| cpach wrote:
| That was indeed one of Knuth's primary goals. But perhaps it
| wasn't one of Pakkanen's goals though.
| _gabe_ wrote:
| From the author's original post, it looks like you're right in
| saying that was not one of his goals.
|
| This article says:
|
| > Thus we can reasonably say that the code does contain an
| implementation of a very limited and basic form of LaTeX.
|
| Which is what I'm primarily debating. I feel like a basic
| LaTeX implementation would at least attempt mathematical
| typesetting, but that could be debated :)
| mhd wrote:
| I think TeX is just used eponymously for typesetters, as
| runoff, scribe, lout etc. aren't as popular anymore, to put it
| politely.
___________________________________________________________________
(page generated 2022-08-04 23:00 UTC)