[HN Gopher] Implementing a "mini-LaTeX" in ~2000 lines of code
       ___________________________________________________________________
        
       Implementing a "mini-LaTeX" in ~2000 lines of code
        
       Author : ingve
       Score  : 46 points
       Date   : 2022-08-04 18:08 UTC (4 hours ago)
        
 (HTM) web link (nibblestew.blogspot.com)
 (TXT) w3m dump (nibblestew.blogspot.com)
        
       | ajross wrote:
       | Lots of nitpickery here about what exactly it means to be TeX,
       | which IMHO misses the forest for the trees.
       | 
       | What's interesting here is that it's a blog post on how to do
       | page-level typesetting, which was a subject of great interest and
       | serious research in the 1970s but in the modern world has been
       | mostly forgotten. And that's somewhat unique. Modern developers
       | still occasionally worry about low level details like instruction
       | architectures or I/O latency or packet dumps. The lessons of our
       | ancestors about algorithm choice are still alive in our culture.
       | 
       | But virtually no one gives any thought to how their text is laid
       | out at a level of sophistication higher than the CSS box model.
       | 
       | And that's kinda nice to see.
        
       | chongli wrote:
       | This is fast becoming a pet peeve of mine: people conflating
       | LaTeX with TeX. TeX is the macro language and the compiler, LaTeX
       | is the library.
       | 
       | Replacing TeX is pretty easy since it is a fairly small language.
       | Replacing LaTeX, on the other hand, is very difficult because it
       | contains a rather huge number of packages that have been
       | developed over the decades.
        
         | leephillips wrote:
         | Not quite. TeX is the compiler that produces DVI or PDF files
         | and the typesetting language. It is also a set of macros that
         | together are _plain TeX_. It is also all of this together, a
         | typesetting system. The name is somewhat overloaded.
         | 
         | LaTeX is not a library. It's what is called a _format_ in the
         | TeX world; essentially another set of macros that create an
         | alternative to plain TeX. Yes, there are a huge number of
         | packages that are designed to work with LaTeX, but they are not
         | _part_ of LaTeX. Most distributions, such as TeXLive, include
         | TeX, LaTeX, a large number of these packages, fonts, and much
         | else.
        
         | choeger wrote:
         | Important distinction: TeX is an interpreter, not a compiler.
        
           | code_biologist wrote:
           | Why is that an important distinction? It's not clear to me
           | that it is.
        
           | Koshkin wrote:
           | Doesn't TeX emit a file like a compiler? You could say that a
           | PDF reader, say, is an "interpreter," but TeX itself? I don't
           | think so.
        
             | bonzini wrote:
             | A compiler generates a file that can be executed to produce
             | the same computation performed by the original source.
             | Since loops and conditionals(*) aren't apparent in TeX's
             | output, but are performed entirely while TeX runs, it is an
             | interpreter.
             | 
             | Regarding the output, TeX is an interpreter that produces
             | its output as a DVI or PDF file rather than as text on a
             | console or showing it in a GUI.
             | 
             | (*) Both those in the source, and those implicit in TeX's
             | typesetting algorithm.
        
               | SkeuomorphicBee wrote:
               | You mentioned loops in the output as a requirement for a
               | compiler, but in this case all compilers that unroll
               | loops are not compilers. Early shader compilers come to
               | mind.
        
           | ketralnis wrote:
           | I don't think that's either important or correct.
           | 
           | The biggest distinction I'd make between an interpreter and
           | compiler is whether you need it to be present at execution
           | time+, and if you use TeX to output PDF then obviously you
           | don't need TeX to be present to view the PDF.
           | 
           | Correctness aside here's what you're replying to:
           | 
           | > TeX is the macro language and the compiler, LaTeX is the
           | library.
           | 
           | What exactly here makes any compiler/interpreter distinction
           | important? How is the communication improved by this
           | "important" "distinction"?
           | 
           | +: Yes yes there are so many nitpicky "corrections" to make
           | here that, indeed, any "distinction" can approach
           | meaninglessness. If you're about to reply with that, read the
           | second bit.
        
           | PaulDavisThe1st wrote:
           | Some old sage: when you're relatively new to computers,
           | you don't understand the difference between an interpreter
           | and a compiler. After a while, you see the difference, and
           | it's substantial and important. After a while longer, you
           | don't see the difference anymore, though for different
           | reasons than at first.
        
             | thrown_22 wrote:
             | I once built an interpreter for x86_64 to see what my
             | machine code was doing line by line. Then I read the GDB
             | manual.
        
             | lapinot wrote:
             | To make things a bit more clear (hope that's what you
             | meant!):
             | 
             | One concrete angle is that interpreters and compilers are
             | usually intertwined: interpreters commonly have an abstract
             | machine which is some language a bit simpler to execute
             | than the full language, thus they have a compilation step
             | at first. And compilers do stuff like constant propagation,
             | inlining and other flavours of partial evaluation. Thus
             | they contain some sorts of interpreters.
             | 
             | Another more abstract angle is to consider interpreters as
             | compilers into a "trivial" language consisting only of
             | values and side effects or the flip side: considering
             | compilers as interpreters with non-standard semantics.
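
A minimal Python sketch of the first angle above, for a hypothetical toy expression language (the tuple encoding, the stack machine, and every function name here are assumptions made for illustration, not anything from TeX). The "compiler" folds constants by evaluating them, so it contains a small interpreter; the stack machine that runs the compiled code is itself an interpreter for a simpler language.

```python
import operator

# Binary operators of the toy language (hypothetical, for illustration only).
OPS = {"add": operator.add, "mul": operator.mul}


def evaluate(expr, env):
    """Tree-walking interpreter: compute the value of expr directly."""
    kind = expr[0]
    if kind == "num":
        return expr[1]
    if kind == "var":
        return env[expr[1]]
    return OPS[kind](evaluate(expr[1], env), evaluate(expr[2], env))


def fold(expr):
    """Constant folding: evaluate variable-free subtrees at compile time."""
    kind = expr[0]
    if kind in ("num", "var"):
        return expr
    left, right = fold(expr[1]), fold(expr[2])
    if left[0] == "num" and right[0] == "num":
        # This is the small interpreter hiding inside the compiler.
        return ("num", OPS[kind](left[1], right[1]))
    return (kind, left, right)


def emit(expr):
    """Translate an expression tree into stack-machine instructions."""
    kind = expr[0]
    if kind == "num":
        return [("push", expr[1])]
    if kind == "var":
        return [("load", expr[1])]
    return emit(expr[1]) + emit(expr[2]) + [(kind,)]


def compile_expr(expr):
    """Compiler: fold constants, then emit code for the stack machine."""
    return emit(fold(expr))


def run(code, env):
    """Abstract machine: an interpreter for the simpler, compiled language."""
    stack = []
    for instr in code:
        op = instr[0]
        if op == "push":
            stack.append(instr[1])
        elif op == "load":
            stack.append(env[instr[1]])
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[op](left, right))
    return stack.pop()
```

For `x + 2 * 3`, encoded as `("add", ("var", "x"), ("mul", ("num", 2), ("num", 3)))`, `compile_expr` emits a load, a push of the already-folded constant 6, and an add. Both `evaluate` and `run` then give the same result for any `x`.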
        
       | PaulHoule wrote:
       | nroff is probably a better comparison than LaTeX for this
       | 
       | https://en.wikipedia.org/wiki/Nroff
        
         | AlanYx wrote:
         | He's doing something inspired by Knuth-Plass linebreaking, so
         | either Heirloom Troff or Neatroff would be the better
         | comparison.
        
       | mhd wrote:
       | It's interesting to look at the early issues of the official TeX
       | publication Tugboat[1]. The first version of TeX, TeX78, written
       | in the SAIL language was just out and there was a concerted
       | effort to produce a more portable version in Pascal, TeX82 (IIRC
       | already known under that name during production).
       | 
       | The first issues of Tugboat then contained several reports about
       | early TeX being ported to various architectures. That often meant
       | rewriting the whole program, as basically no other system besides
       | Knuth's "native" time-sharing OS had SAIL. So there was a port to
       | a (Z80-based) Unix, someone rewriting parts in Fortran, etc. If
       | they were tied to a specific backend/printer, there wasn't even a
       | need to port Metafont.
       | 
       | TeX was relatively small. That often gets lost in contemporary
       | huge LaTeX distributions.
       | 
       | [1]: https://www.tug.org/tugboat/
        
       | seanwilson wrote:
       | > As you can easily tell, line breaks made at the beginning of
       | the chapter affect the potential line breaks you can do later.
       | Sometimes it is worth it to make a locally non-optimal choice at
       | the beginning to get a better line break possibility much later.
       | Evaluating a global metric like this can be potentially slow,
       | which is why interactive programs like LibreOffice do not use
       | this method.
       | 
       | Is there not a fast "good enough" algorithm for justifying text
       | vs a slow optimal one? Is it really infeasible for HTML to have
       | justified text that looks something like LaTeX? It's sad that
       | justified text still looks bad on the web with no signs of it
       | getting better even with all the computing power we have now.
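
To make the local-versus-global trade-off in the quoted passage concrete, here is a rough Python sketch comparing greedy first-fit breaking with a small dynamic program that minimizes total squared slack. It is a simplification, not the Knuth-Plass algorithm itself: there is no stretchable glue, no hyphenation, and no penalties, widths are counted in characters, and the cost function and all names are assumptions chosen for illustration.

```python
import math


def greedy_breaks(words, width):
    """First-fit: put as many words on each line as will fit."""
    lines, current = [], []
    for word in words:
        candidate = current + [word]
        if current and len(" ".join(candidate)) > width:
            lines.append(current)
            current = [word]
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


def badness(line, width, is_last):
    """Squared leftover space; the last line is free, overfull is forbidden."""
    slack = width - len(" ".join(line))
    if slack < 0:
        return math.inf
    return 0 if is_last else slack ** 2


def total_badness(lines, width):
    """Sum of per-line badness for a finished layout."""
    last = len(lines) - 1
    return sum(badness(line, width, i == last) for i, line in enumerate(lines))


def optimal_breaks(words, width):
    """Dynamic program over break points, minimizing total badness.

    Assumes every single word fits within the given width.
    """
    if any(len(word) > width for word in words):
        raise ValueError("a word is wider than the line")
    n = len(words)
    # best[i] = (minimal cost of setting words[i:], index of the next break)
    best = [None] * (n + 1)
    best[n] = (0, n)
    for i in range(n - 1, -1, -1):
        choices = []
        for j in range(i + 1, n + 1):
            cost = badness(words[i:j], width, j == n)
            if cost == math.inf:
                break  # adding more words only makes the line longer
            choices.append((cost + best[j][0], j))
        best[i] = min(choices)
    # Follow the stored break points to rebuild the lines.
    lines, i = [], 0
    while i < n:
        j = best[i][1]
        lines.append(words[i:j])
        i = j
    return lines
```

With the words `aaa bb cc ddddd` and a width of 6, the greedy pass fills the first line completely and strands `cc` on a line of its own, while the dynamic program accepts a looser first line so that `bb cc` can share the second. The dynamic program is quadratic in the number of words in the worst case, which is the cost the quoted passage is alluding to.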
        
         | chucky wrote:
         | I once discussed this with a coworker who had worked a lot on
         | various things related to typesetting, and he claimed the main
         | issue with doing good text justification for the web is the
         | lack of good, free hyphenation dictionaries (or rules) for most
         | languages. In order to do full paragraph justification in a way
         | that makes sense and always looks good you need to be able to
         | hyphenate words dynamically (and fairly aggressively).
         | 
         | LaTeX does come with hyphenation libraries for some languages,
         | but web browsers would need wider support. There's also the
         | question of how to standardize this, because you would have to
         | ship hyphenation dictionaries for every language, and you would
         | need to standardize this across browsers so they render pages
         | the same.
         | 
         | LaTeX does ship with hyphenation libraries for a bunch of
         | languages, but if you try to use it for any of the minor
         | languages, you'll find the results are so-so and you need to
         | manually hyphenate words to get decent results (this was my
         | experience a decade ago at least).
         | 
         | All of the above could be solved, but sadly I suspect it
         | wouldn't be worth the cost for the players that would have to
         | be involved in solving it.
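
For readers wondering what hyphenation "rules" look like in practice, below is a hedged Python sketch of pattern-based hyphenation in the style of Liang's algorithm, the approach TeX uses. The eight patterns are a toy set chosen for illustration and should not be taken as TeX's actual pattern files; the function names and the `left_min`/`right_min` parameters are likewise assumptions. Digits in a pattern rate the gap at that position, the highest rating at each gap wins, and odd ratings permit a break.

```python
import re


def parse_pattern(pattern):
    """Split e.g. 'hy3ph' into letters 'hyph' and gap levels [0, 0, 3, 0, 0]."""
    letters = re.sub(r"\d", "", pattern)
    levels = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            levels[pos] = int(ch)  # rating for the gap before letters[pos]
        else:
            pos += 1
    return letters, levels


# Toy pattern set (hypothetical, for illustration only).
TOY_PATTERNS = dict(
    parse_pattern(p)
    for p in ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io"]
)


def hyphenate(word, patterns=TOY_PATTERNS, left_min=2, right_min=3):
    """Return the pieces of word, split at the allowed break points."""
    padded = "." + word.lower() + "."  # dots mark the word boundaries
    levels = [0] * (len(padded) + 1)   # levels[i] rates the gap before padded[i]
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            pattern_levels = patterns.get(padded[start:end])
            if pattern_levels is None:
                continue
            for k, level in enumerate(pattern_levels):
                levels[start + k] = max(levels[start + k], level)
    pieces, last = [], 0
    # The gap before word[j] is levels[j + 1] because of the leading dot.
    for j in range(left_min, len(word) - right_min + 1):
        if levels[j + 1] % 2 == 1:  # odd rating: a break is allowed here
            pieces.append(word[last:j])
            last = j
    pieces.append(word[last:])
    return pieces
```

With this toy set, `"-".join(hyphenate("hyphenation"))` gives `hy-phen-ation`. A production pattern file holds thousands of such patterns per language, which is the per-language data a browser would have to ship.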
        
           | bombcar wrote:
           | The biggest advantage LaTeX has is that the page width stays
           | the same between various versions of your document, so you
           | can improve it on the fly when you notice issues.
           | 
           | The web tries to support every possible page width so some
           | won't look great.
        
         | tannhaeuser wrote:
         | According to previous discussions [1], none that'd fit the
         | constraints of text rendering in browsers AIUI (e.g. dynamic
         | programming approaches considered infeasible on the web,
         | hyphenation not really a thing). But who knows, CSS is
         | incredibly bloated already ...
         | 
         | [1]: https://news.ycombinator.com/item?id=28537923
        
         | tomsmeding wrote:
         | > > [...] Evaluating a global metric like this can be
         | potentially slow, which is why interactive programs like
         | LibreOffice do not use this method.
         | 
         | I suspect that another reason for a WYSIWYG office suite to not
         | use such global optimisation, is to avoid unpredictable changes
         | to layout in response to seemingly unrelated changes by the
         | user. If you get full reflows with figures moving around and
         | headings flipping between the previous and the next page while
         | you're typing a sentence, users will probably quickly punch a
         | hole in their monitor.
         | 
         | In (La)TeX this is less of an issue because of the delayed
         | compilation, I guess. Also *TeX users are probably more
         | tolerant to that kind of mess.
        
         | drfuchs wrote:
         | On my 2015 Mac, using just a single core, TeX composes the
         | entire 495-page TeXbook in under 0.2 seconds, or about 0.4
         | milliseconds per page. That includes reading the 1.3 MB input
         | file, evaluating all macros and such, doing all line-breaking
         | and page-breaking, and writing the entire 1.9 MB DVI file.
         | 
         | So, I don't know where the "too slow" meme is coming from. Is
         | there a browser that can open and scroll to the end of an HTML
         | page that's 500 screen-fulls long in 0.2 seconds?
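
The per-page figure follows directly from the numbers quoted in the comment. A small Python check, using only those figures (the function names are just for illustration):

```python
PAGES = 495          # pages in the TeXbook, as stated in the comment
TOTAL_SECONDS = 0.2  # upper bound on the run time quoted in the comment


def ms_per_page(total_seconds, pages):
    """Average typesetting time per page, in milliseconds."""
    return total_seconds * 1000.0 / pages


def pages_per_second(total_seconds, pages):
    """Average throughput, in pages per second."""
    return pages / total_seconds
```

`ms_per_page(TOTAL_SECONDS, PAGES)` comes out at roughly 0.4 ms, matching the comment, and the same figures imply a throughput of about 2,500 pages per second.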
        
       | marcodiego wrote:
       | Posted 3 days ago: https://news.ycombinator.com/item?id=32307185
        
       | groby_b wrote:
       | "The output contains some typographical niceties, so we can
       | reasonably say that the code does contain an implementation of a
       | very limited and basic form of LaTeX" has to be on the more
       | optimistic side of overstatements.
       | 
       | It's roughly akin to "cat can actually display content quite
       | similar to a <pre> block, so it contains a very limited and basic
       | form of a web browser".
       | 
       | It's not entirely wrong. But mostly because it left "wrong"
       | behind long ago and is in a different country now.
        
       | _gabe_ wrote:
       | I thought one of the reasons Knuth made TeX was because there
       | were no good math typesetters available. I have no idea if this
       | assumption is correct, but one of the main reasons I've resorted
       | to using LaTeX in the past is for the math typesetting. I haven't
       | found any good alternatives, and it seems incredibly complex
       | compared to basic kerning + line breaks and/or justification for
       | regular English-like text layout.
       | 
       | So if this implementation lacks math typesetting, and that was
       | one of the motivating factors behind the original TeX
       | implementations, then I think this code is missing a pretty
       | complex core feature.
        
         | cpach wrote:
         | That was indeed one of Knuth's primary goals. But perhaps it
         | wasn't one of Pakkanen's goals.
        
           | _gabe_ wrote:
           | From the author's original post, it looks like you're right in
           | saying that was not one of his goals.
           | 
           | This article says:
           | 
           | > Thus we can reasonably say that the code does contain an
           | implementation of a very limited and basic form of LaTeX.
           | 
           | Which is what I'm primarily debating. I feel like a basic
           | LaTeX implementation would at least attempt mathematical
           | typesetting, but that could be debated :)
        
         | mhd wrote:
         | I think TeX is just used eponymously for typesetters, as
         | runoff, scribe, lout etc. aren't as popular anymore, to put it
         | politely.
        
       ___________________________________________________________________
       (page generated 2022-08-04 23:00 UTC)