Typewriter typography
=====================

In plain text documents, I like to use what I call "typewriter typography". By that I mean that I try to use ASCII characters wherever I can, even if I have to replace typographically correct characters. For example, I try to stick to straight quotes (") and apostrophes ('), double hyphens (--) instead of en or em dashes, three dots (...) for ellipses, a caret (^) for superscripts, and so on.

But I usually don't replace letters. For example, I usually keep umlauts and vowels with accent marks, and encode the document in UTF-8.

Benefits
--------

1. Some characters needed for fine typography are hard to enter into plain text documents. For example, if you write French or use numbers with unit symbols, you would actually need thin non-breaking spaces. Since I already make a compromise there, it feels more consistent to go all-in on "typewriter typography".
2. It saves bytes (at least in UTF-8).
3. It ensures consistency: you no longer have to distinguish between "66" and "99" quotes (or guillemets) if all quotes are straight quotes.
4. It makes for easier grepping.
5. It makes for safer copying and higher compatibility with fonts and programs (especially monospaced terminals).

Theoretical background: Unicode equivalence
-------------------------------------------

I recommend the following 2 pages:

    https://en.wikipedia.org/wiki/Unicode_equivalence
    https://www.unicode.org/reports/tr15/

In essence, there are sequences of Unicode characters that always have exactly the same meaning and appearance. They are called "canonically equivalent". For example, the single LATIN SMALL LETTER E WITH ACUTE (U+00E9) is canonically equivalent to a regular LATIN SMALL LETTER E (U+0065) followed by a COMBINING ACUTE ACCENT (U+0301).

Other sequences of Unicode characters might have differences in meaning and appearance, but in some cases where they represent the same "abstract character(s)", their semantic difference becomes negligible.
They are called "compatible" or "compatibility equivalent". For example, the single VULGAR FRACTION ONE QUARTER (U+00BC) is compatible with the sequence "1/4": they appear differently, and they have different meanings in some cases (e.g. where "1/4" is supposed to mean "one out of four", or some kind of reference number), but in cases where they only convey the meaning of "one quarter", the difference between them becomes negligible. (Strictly speaking, the compatibility decomposition of U+00BC uses a FRACTION SLASH (U+2044) rather than the ASCII slash, but the idea is the same.)

This "compatibility equivalence" is the essence of "typewriter typography".

The case of the man page title dash
-----------------------------------

The title of a man page traditionally consists of the name of the program it documents, a dash surrounded by spaces, and a single-line description that starts with a lowercase letter. As man-pages(7) states:

    See man(7) for important details of the line(s) that should follow the .SH NAME command. All words in this line (including the word immediately following the "\-") should be in lowercase, except where English or technical terminological convention dictates otherwise.

I have adopted this format for headings of README documents and for the first line of comment headers in scripts.

My problem is that I am unsure how to represent the man page title dash in "typewriter typography". If it were an en or em dash, I would use two hyphens (--) like always. But it isn't. Instead, it appears as a single hyphen in monospaced terminals, which looks fine there. But in contexts without a monospaced typeface, single hyphens look ugly when (mis-)used as dashes -- they are too long to be seen as a mid-dot, but too short to be seen as a dash. That's why the man page title dash is replaced with a minus character in PDF output.
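
To make the characters in question concrete, here is a minimal sketch in Python (chosen only for illustration; the names DASH_LIKE, describe and folds_to_ascii are hypothetical and not part of any man page tooling). It prints the code point and Unicode name of the hyphen, the minus sign and the two dashes, and checks whether NFKC normalization would reduce each of them to ASCII.

```python
import unicodedata

# Dash-like characters mentioned in this document, written as escapes
# so that the source itself stays pure ASCII.
DASH_LIKE = ["\u002d", "\u2212", "\u2013", "\u2014"]

def describe(ch):
    """Return a 'U+XXXX NAME' label for a single character."""
    return "U+{:04X} {}".format(ord(ch), unicodedata.name(ch))

def folds_to_ascii(ch):
    """Report whether NFKC normalization turns ch into pure ASCII."""
    return unicodedata.normalize("NFKC", ch).isascii()

for ch in DASH_LIKE:
    print(describe(ch), "| NFKC gives ASCII:", folds_to_ascii(ch))
```

In this sketch, only U+002D is reported as ASCII. Unicode normalization does not fold the minus sign or the dashes into a hyphen, so in plain text the substitution has to be chosen by hand.
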
As man-pages(7) states further down:

    Where a real minus character is required (e.g., for numbers such as -1, for man page cross references such as utf-8(7), or when writing options that have a leading dash, such as in ls -l), use the following form in the man page source:

        \-

    This guideline applies also to code examples. The use of real minus signs serves the following purposes:

    * To provide better renderings on various targets other than ASCII terminals, notably in PDF and on Unicode/UTF-8-capable terminals.
    * To generate glyphs that when copied from rendered pages will produce real minus signs when pasted into a terminal.

I don't write roff code, but plain text. Plain text is meant to be read in the same format that it was written. So I don't get to choose a character depending on the rendering device. I have to make a choice. And for the reasons stated above -- the man page title dash isn't an en or em dash and it isn't rendered as two hyphens, but as a single hyphen in monospaced terminals -- I have decided to stick with a single hyphen.

Of course, people will mistake me for a typographic illiterate now if they see my program title lines rendered without a monospaced typeface. But I try to take consolation in the fact that I have already seen HTML renderings of man pages make the same choice, and in the attempt to see the single hyphen not as a hideous bastard between a mid-dot and a proper dash, but as a welcome middle way.

Copyright (C) 2023 Daniel Kalak
Licensed under CC-BY-ND-4.0.