* <<G15.1092>> Good lord, where have my meditations brought me?
Reading the subtext of my last couple entries, it's becoming clear to 
me that I am actually suggesting that semantically tagged text, with 
those tags interleaved in the text stream, is preferable to the 
semantically barren chaos that is plain-text. That is to say, I 
believe something like HTML is *better* for the recording of human 
language than plain-text. A shocking turn of events since I have 
cited Nelson's "Embedded Markup Considered Harmful" more than a few 
times, and been in support of it.

The fact is, plain-text is full of explicit semantic markup already 
in the form of punctuation, and it's full of implicit semantic markup 
in the form of space.

One of the troubles with punctuation is that many important marks 
have been tasked with multiple semantic functions over the centuries. 
 The period, for example, in the context of a sentence may be an 
end-of-sentence signifier or an abbreviation marker, and when 
embedded inside a number it indicates the decimal point; the single 
quotation mark is infamously double-tasked with the function of an 
apostrophe in addition to its nominal purpose; numerous characters 
are used for different functions when part of a mathematical 
expression, and so on.
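
To make the overloading of the period concrete, here is a minimal 
sketch in Python of what software has to do just to guess which job 
each period is doing.  The function name, the role labels, and the 
tiny abbreviation list are all hypothetical, invented for this 
illustration; the heuristics are deliberately crude.

```python
import re

# Hypothetical, deliberately tiny abbreviation list (an assumption for
# this sketch); a usable list would be far longer.
ABBREVIATIONS = {"dr", "mr", "mrs", "etc"}


def classify_periods(text):
    """Return (index, role) for every '.' in text, using crude heuristics."""
    roles = []
    for match in re.finditer(r"\.", text):
        i = match.start()
        before = text[i - 1] if i > 0 else ""
        after = text[i + 1] if i + 1 < len(text) else ""
        # The run of letters immediately preceding this period, if any.
        word = re.search(r"[A-Za-z]*$", text[:i]).group(0).lower()
        if before.isdigit() and after.isdigit():
            roles.append((i, "decimal point"))
        elif word in ABBREVIATIONS:
            roles.append((i, "abbreviation"))
        else:
            roles.append((i, "end of sentence"))
    return roles


print(classify_periods("Dr. Lee paid 3.50 today."))
# → [(2, 'abbreviation'), (14, 'decimal point'), (23, 'end of sentence')]
```

Note that a sentence ending in "etc." defeats this sketch: that one 
period is both abbreviation marker and end-of-sentence signifier, and 
the code can only report one of the two.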

Despite these problems with the explicit semantics of punctuation, it 
is a far greater problem to deal with the implicit semantics of 
space.  In the absence of explicit markers for them(1), paragraphs 
must be implicitly indicated by the presence of empty space. This may 
be blank lines above and below the paragraph, or it may be an 
indentation of the first line. Likewise, section headings and titles 
are indicated by the presence of empty visual space.

When writing in plain-text, authors are constantly devising their own 
idiosyncratic conventions of heading placement, section breaks, 
asterisms, and so on.  Human readers can, of course, work out the 
meanings of these inventions, but software isn't necessarily so 
clever.  Is it important to the author that his asterism is one 
asterisk in the center of the page, or three asterisks separated by 
spaces at the beginning of a line?  Or is it simply important that 
there be an asterism?  If the latter, shouldn't there be a code?
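
If it is only the presence of the asterism that matters, the two 
conventions mentioned above can be folded into a single code.  The 
following Python sketch assumes that a line containing nothing but 
one asterisk, or three asterisks with optional spaces between them, 
is meant as an asterism, and replaces it with the Unicode ASTERISM 
character (U+2042).  The function name and the matching rule are 
assumptions made for illustration, not an established convention.

```python
import re

ASTERISM = "\u2042"  # U+2042 ASTERISM

# A line holding only one asterisk, or three asterisks optionally
# separated by spaces or tabs, with any amount of indentation.
_ASTERISM_LINE = re.compile(r"^[ \t]*(?:\*|\*[ \t]*\*[ \t]*\*)[ \t]*$")


def normalize_asterisms(text):
    """Replace each asterisk-only line with a single ASTERISM character."""
    lines = text.split("\n")
    return "\n".join(
        ASTERISM if _ASTERISM_LINE.match(line) else line for line in lines
    )


sample = "Part one.\n\n          *\n\nPart two.\n\n* * *\n\nPart three."
print(normalize_asterisms(sample))
```

The centering of the lone asterisk is thrown away here, which is 
exactly the question: if the author cared about the placement and not 
just the break, this normalization destroys information.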

To some extent, ambiguities could be alleviated by the use of 
typographical marks to make the implicit explicit: stick a pilcrow at 
the beginning of every paragraph, a section-sign at the beginning of 
every heading, use an ellipsis character instead of three periods, 
use an asterism character instead of asterisks, and so on.  If one 
were to do this, the whitespace in a plain-text document could be 
collapsed, and the structure of the document would be retained – 
unless, that is, the author wanted to put a pilcrow in the middle of 
a paragraph illustratively: ¶.  What about that, then?  That is why 
you'd need control codes, or markers, or tokens, or whatever you want 
to call them: "characters" that are not meant to be printed as such, 
but which are intended to trigger behaviours in the software 
rendering/processing the document.
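
As a rough sketch of that idea in Python: the paragraph marker below 
is a non-printing control code rather than a pilcrow, so a literal ¶ 
in the text survives untouched while all structural whitespace is 
collapsed.  The choice of the ASCII record separator (0x1E) as the 
marker and the function names are hypothetical, picked only for this 
illustration.

```python
import re

# Hypothetical choice for this sketch: the ASCII record separator,
# used here to mean "a paragraph starts here".
PARA = "\x1e"


def to_stream(plain_text):
    """Turn blank-line-separated paragraphs into one marked stream."""
    paragraphs = re.split(r"\n[ \t]*\n", plain_text.strip())
    # Collapse every run of whitespace to one space; whitespace no
    # longer carries any structure once the markers are in place.
    return "".join(
        PARA + " ".join(p.split()) for p in paragraphs if p.strip()
    )


def from_stream(stream):
    """Recover the list of paragraphs from a marked stream."""
    return [p for p in stream.split(PARA) if p]


text = "A pilcrow looks like this: ¶.\nIt is printed.\n\n   Second   paragraph."
stream = to_stream(text)
print(from_stream(stream))
# → ['A pilcrow looks like this: ¶. It is printed.', 'Second paragraph.']
```

The stream contains no line breaks at all, yet the paragraph 
structure comes back intact, and the illustrative ¶ is never mistaken 
for a marker.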

The thrust of what needs to be said about "plain" text is this: to 
have true plaintext, to escape the reality of plain-text as a 
teletypewriter operation language, we need to eradicate any 
assumption of a PAGE.  Text is a STREAM; it is one-dimensional.  
Plaintext needs to be readable on a single-line LCD display, it needs 
to work on screen readers for the blind or deaf-blind,…

On the space character itself: its most common and important function 
is as word-separator. But spaces also separate sentences, clauses, 
and other structures, despite this being the explicit function of 
other characters such as comma, parenthesis, semicolon, and terminal 
punctuation.
  
--
Excerpted from:

PUBLIC NOTES (G) 
http://alph.laemeur.com/txt/PUBNOTES-G 
©2016 Adam C. Moore (LÆMEUR) <adam@laemeur.com>