Subj : BBS Promotion
To   : Nicholas Boel
From : mark lewis
Date : Fri Feb 10 2017 08:35 pm


 On 2017 Feb 10 07:32:52, you wrote to me:

 NB>>> TimEd is probably trying to convert the UTF-8 Russian characters to
 NB>>> IBMPC, which won't happen.

 ml>> FWIW: there is no "conversion"... it is simply displaying the
 ml>> glyphs represented by those raw bytes in their CP437 codepage
 ml>> positions... CP437 and other old-school codepage characters are only
 ml>> one byte wide... any "conversion" might come from translating
 ml>> between single byte codepages where the character glyph is
 ml>> transliterated from one position in the first codepage to another
 ml>> position in the second codepage where its glyph is stored... in that
 ml>> case, the raw byte changes because the position in the codepage
 ml>> changed and the byte is the position...

 NB> You say potato, etc..

yes and no... it is really easy to understand though...

 NB> Fact of the matter is CP437/IBMPC will not display Russian characters
 NB> properly,

of course not... their glyphs are different from latin glyphs... this is really
simple when looking at it the old school way... there are numerous tables of
256 bytes... each byte represents one character, a glyph... some are actually
control characters (eg: CR, LF) and others are just language characters aka
glyphs... in one table, the space character is held in position 32decimal (aka
20hex)... another table also has the space in position 32decimal (aka 20hex)...
great! no "conversion" is needed for the space character... now, if the
capital letter 'A' is held in the first table at position 65decimal (aka 41hex)
and the capital letter 'A' is held in position 25decimal (aka 19hex) in the
second table then some "conversion" is needed or you will see the wrong
character when using one of the two pages... one will be right and the other
just won't be... this is actually transliteration... there are mapping files
created to point to the proper position for the 'A' when using the second table
(aka codepage)... this is easily seen when overlaying CP855 on top of CP437...
most characters will align in the same cells of the table but some are
different... they are generally up in the higher-than-127 range where the line
drawing and box characters reside in CP437...
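
a minimal sketch of that overlay, assuming python 3 and the cp437, cp855 and
cp866 codecs that ship with it... the names build_map and cp855_to_cp866 are
made up for illustration, they are not from any BBS package or mapping file
format... it lists which cells differ between CP437 and CP855, then builds a
256 byte mapping table of the kind described above, here going from CP855 to
CP866...

```python
# decode every possible byte value through both codepages
raw = bytes(range(256))
cp437 = raw.decode("cp437")
cp855 = raw.decode("cp855")

# cells holding the same glyph in both tables, and cells that do not
same = [pos for pos in range(256) if cp437[pos] == cp855[pos]]
diff = [pos for pos in range(256) if cp437[pos] != cp855[pos]]

print(len(same), "cells match,", len(diff), "cells differ")
print("lowest differing cell:", min(diff))


def build_map(src, dst):
    """Build a 256 byte table: index = position in src, value = position in dst.

    Glyphs that do not exist in dst fall back to '?' (an arbitrary choice).
    """
    table = bytearray(256)
    for pos in range(256):
        glyph = bytes([pos]).decode(src)
        table[pos] = glyph.encode(dst, errors="replace")[0]
    return bytes(table)


# hypothetical mapping table: same glyphs, different positions
cp855_to_cp866 = build_map("cp855", "cp866")

text_855 = "Привет".encode("cp855")
text_866 = text_855.translate(cp855_to_cp866)
print(text_855.hex(), "->", text_866.hex())
```

the glyphs are the same before and after, only the raw bytes change because
the positions changed...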

then someone came along and said "hey! we can do better" so unicode and its
UTF-8, UTF-16 and UTF-32 encodings were born... UTF-8 works in 8bit units, is
lossless and reaches all 1112064 positions in the unicode table instead of the
original 256... converting from codepages to UTF-8 is easy because every
character exists in its huge table... going the other way is not guaranteed
because a 256 position codepage just cannot hold them all, so the glyphs don't
all map over... in some languages, they have used "double characters" like
"ae" to indicate the single ae character which i don't know how to make on
this OS... other languages may also have an "ae" character but in them you
cannot use "a" and "e" side by side to indicate the single "ae" character... i
don't know why, that's just the way it is...
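
a rough sketch of both directions, assuming python 3 and its bundled cp437 and
cp866 codecs... to_codepage is a made up helper name and the '?' fallback is
just one arbitrary choice for glyphs that have no home in the target table...

```python
# number of positions UTF-8 can address: every code point up to U+10FFFF
# minus the 2048 surrogates reserved for UTF-16
positions = 0x110000 - 0x800
print(positions)

# codepage -> UTF-8: every one of the 256 CP437 cells survives the trip
for pos in range(256):
    glyph = bytes([pos]).decode("cp437")
    assert glyph.encode("utf-8").decode("utf-8") == glyph


def to_codepage(text, codepage):
    """Encode text into a codepage; return the bytes and the glyphs lost."""
    lost = []
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(codepage)
        except UnicodeEncodeError:
            # no cell for this glyph in the target table
            lost.append(ch)
            out += b"?"
    return bytes(out), lost


# UTF-8 -> codepage: what survives depends on the table you pick
as_437, lost_437 = to_codepage("Привет æ", "cp437")
as_866, lost_866 = to_codepage("Привет æ", "cp866")
print(as_437, lost_437)
print(as_866, lost_866)
```

CP437 keeps the ae character and drops the russian letters, CP866 does the
opposite...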

anyway, i'm just trying to help you understand why there's no "conversion" as
such in the old school code pages... there is transliteration where one glyph
lives in one spot in this table and another spot in that table... UTF stuff
just greatly expands the size of the tables which means that the glyphs are now
represented by one or more bytes... only the first 128 positions (plain ASCII)
are still a single byte that matches the old table position numbers in the old
school code pages...
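
here is a small sketch of what a CP437-only viewer ends up showing, assuming
python 3 and assuming the viewer simply paints one CP437 glyph per raw byte...
the sample word is mine, not from the original message...

```python
text = "Привет"

# UTF-8 needs two bytes for each of these cyrillic letters
raw = text.encode("utf-8")
for ch in text:
    print(ch, ch.encode("utf-8").hex())

# no conversion happens: each raw byte is shown as the glyph sitting in
# that position of the CP437 table, so six letters become twelve glyphs
shown = raw.decode("cp437")
print(shown)

# plain ASCII is still one byte with the same position number as before
ascii_raw = "BBS".encode("utf-8")
print(ascii_raw == "BBS".encode("cp437"))
```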

 NB> whether they're UTF-8 or not.

true...

 NB> The only somewhat possible way for him to read it properly would be to
 NB> change his default encoding to CP866 or KOI8-R,

exactly...

 NB> and even then there is no guarantee that the translation from UTF-8
 NB> will work as expected.

because it depends also on what his OS can display... what i mean by this is 
that he has to be able to load the OS with the needed code page to view them 
correctly but if he does that, he'll lose all the normal latin glyphs... 
switching to UTF-8 on the OS will alleviate this but it requires that the 
software is also able to transliterate the characters to their new positions in 
the UTF-8 table so they can be rendered properly... we've seen this with the 
box and line drawing characters... there's one or two BBS related packages out 
there that do properly transliterate them to their new positions in the UTF-8 
table... i don't recall who did them or what packages they are/were but they 
are or have been participants in AGORANET and at least one of them was either a 
BBS or a terminal program...
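
for the box and line drawing case, a minimal sketch assuming python 3 and its
cp437 codec... this is not how those packages do it, i don't know their code,
it only shows the glyphs moving to their new positions...

```python
# a small box drawn with CP437 line drawing bytes
box_437 = b"\xda\xc4\xbf\r\n\xc0\xc4\xd9"

# find each glyph's position in the unicode table, then store it as UTF-8
box_utf8 = box_437.decode("cp437").encode("utf-8")

print(box_utf8.decode("utf-8"))
print(len(box_437), "bytes in CP437,", len(box_utf8), "bytes in UTF-8")
```

each single byte box character turns into three bytes in UTF-8 while the CR
and LF stay one byte each...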

so, ok... too long a day... only 20:30 here and i'm already going to call it a 
night... on a friday damned night at that :(

)\/(ark

Always Mount a Scratch Monkey
Do you manage your own servers? If you are not running an IDS/IPS yer doin' it 
wrong...
.... Well done! is better than well said!
---
 * Origin:  (1:3634/12.73)
