* <> Good lord, where have my meditations brought me? Reading the subtext of my last couple entries, it's becoming clear to me that I am actually suggesting that semantically tagged text, with those tags interleaved in the text stream, is preferable to the semantically barren chaos that is plain-text. That is to say, I believe something like HTML is *better* for the recording of human language than plain-text. A shocking turn of events, since I have cited Nelson's "Embedded Markup Considered Harmful" more than a few times, and been in support of it.

The fact is, plain-text is full of explicit semantic markup already in the form of punctuation, and it's full of implicit semantic markup in the form of space. One of the troubles with punctuation is that many important marks have been tasked with multiple semantic functions over the centuries. The period, for example, in the context of a sentence may be an end-of-sentence signifier or an abbreviation marker, and when embedded inside a number it indicates the decimal point; the single quotation mark is infamously double-tasked with the function of an apostrophe in addition to its nominal purpose; numerous characters are used for different functions when part of a mathematical expression, and so on.

Despite these problems with the explicit semantics of punctuation, it is a far greater problem to deal with the implicit semantics of space. In the absence of explicit markers for them(1), paragraphs must be implicitly indicated by the presence of empty space. This may be blank lines above and below the paragraph, or it may be an indentation of the first line. Likewise, section headings and titles are indicated by the presence of empty visual space. When writing in plain-text, authors are constantly devising their own idiosyncratic conventions of heading placement, section breaks, asterisms, and so on. Human readers can, of course, work out the meanings of these inventions, but software isn't necessarily so clever.
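
To make the point about implicit whitespace concrete, here is a minimal Python sketch. The function names and the two sample strings are illustrative assumptions, not any standard or anyone's actual tool. It shows the two paragraph conventions mentioned above (blank lines versus first-line indentation) and how a splitter written for one convention silently fails on the other.

```python
import re


def split_blank_line_paragraphs(text: str) -> list[str]:
    """Convention A: paragraphs are separated by one or more blank lines."""
    blocks = re.split(r"\n\s*\n", text.strip())
    # Collapse the internal line breaks and runs of spaces in each block.
    return [" ".join(block.split()) for block in blocks]


def split_indented_paragraphs(text: str) -> list[str]:
    """Convention B: a paragraph starts at any line that begins with whitespace."""
    paragraphs: list[list[str]] = []
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines carry no meaning under this convention
        if line[0].isspace() or not paragraphs:
            paragraphs.append([])  # an indented line opens a new paragraph
        paragraphs[-1].append(line.strip())
    return [" ".join(p) for p in paragraphs]


# Hypothetical samples: the same two paragraphs written under each convention.
BLANK_LINE_STYLE = "First paragraph,\nsecond line.\n\nSecond paragraph."
INDENT_STYLE = "    First paragraph,\nsecond line.\n    Second paragraph."

if __name__ == "__main__":
    print(split_blank_line_paragraphs(BLANK_LINE_STYLE))  # matching convention
    print(split_indented_paragraphs(INDENT_STYLE))        # matching convention
    print(split_blank_line_paragraphs(INDENT_STYLE))      # mismatched convention
    print(split_indented_paragraphs(BLANK_LINE_STYLE))    # mismatched convention
```

With the matching splitter, each sample yields two paragraphs; with the mismatched splitter, both samples come back as one merged paragraph and no error is raised. Nothing in the text itself says which convention the author had in mind, which is the trouble with leaving structure implicit.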
Is it important to the author that his asterism is one asterisk in the center of the page, or three asterisks separated by spaces at the beginning of a line? Or is it simply important that there be an asterism? If the latter, shouldn't there be a code?

To some extent, ambiguities could be alleviated by the use of typographical marks to make the implicit explicit: stick a pilcrow at the beginning of every paragraph, a section-sign at the beginning of every heading, use an ellipsis character instead of three periods, use an asterism character instead of asterisks, and so on. If one were to do this, the whitespace in a plain-text document could be collapsed, and the structure of the document would be retained – unless, that is, the author wanted to put a pilcrow in the middle of a paragraph illustratively: ¶. What about that, then? That is why you'd need control codes, or markers, or tokens, or whatever you want to call them: "characters" that are not meant to be printed as such, but which are intended to trigger behaviours in the software rendering/processing the document.

The thrust of what needs to be said about "plain" text is this: in order to have true plaintext, in order to escape the reality of plain-text as a teletypewriter operation language, we need to eradicate any assumption of a PAGE. Text is a STREAM; it is one-dimensional. Plaintext needs to be readable on a single-line LCD display, it needs to work on screen readers for the blind or deaf-blind,…

On the space character itself: its most common and important function is as word-separator. But spaces also separate sentences, clauses, and other structures, despite this being the explicit function of other characters such as comma, parenthesis, semicolon, and terminal punctuation.

-- Excerpted from: PUBLIC NOTES (G) http://alph.laemeur.com/txt/PUBNOTES-G ©2016 Adam C. Moore (LÆMEUR)