Subj : BBS Promotion
To   : Nicholas Boel
From : mark lewis
Date : Fri Feb 10 2017 08:35 pm

On 2017 Feb 10 07:32:52, you wrote to me:

 NB>>> TimEd is probably trying to convert the UTF-8 Russian characters to
 NB>>> IBMPC, which won't happen.

 ml>> FWIW: there is no "conversion"... it is simply displaying the
 ml>> glyphs represented by those raw bytes in their CP437 codepage
 ml>> positions... CP437 and other old-school codepage characters are only
 ml>> one byte wide... any "conversion" might come from translating
 ml>> between single byte codepages where the character glyph is
 ml>> transliterated from one position in the first codepage to another
 ml>> position in the second codepage where its glyph is stored... in that
 ml>> case, the raw byte changes because the position in the codepage
 ml>> changed and the byte is the position...

 NB> You say potato, etc..

yes and no... it is really easy to understand though...

 NB> Fact of the matter is CP437/IBMPC will not display Russian characters
 NB> properly,

of course not... their glyphs are different than latin glyphs...

this is really simple when looking at the old school way... there are
numerous tables of 256 bytes... each byte represents one character, a
glyph... some are actually control characters (eg: CR, LF) and others are
just language characters aka glyphs... in one table, the space character is
held in position 32decimal (aka 20hex)... another table also has the space
in position 32decimal (aka 20hex)... great! no "conversion" is needed for
the space character... now, if the capital letter 'A' is held in the first
table at position 65decimal (aka 41hex) and the capital letter 'A' is held
in position 25decimal (aka 19hex) in the second table then some
"conversion" is needed or you will see the wrong character when using one
of the two pages... one will be right and the other just won't be... this
is actually transliteration...
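A minimal Python sketch of the table lookup described above, assuming Python's built-in `cp437`, `cp855`, `cp866` and `koi8_r` codecs as stand-ins for the codepage tables. The `transliterate` helper and the sample bytes are illustrative assumptions, not anything taken from TimEd or the quoted messages. It shows the same raw bytes picking up different glyphs in each table, and a glyph moving to a new byte position when going between two Cyrillic codepages:

```python
# The same four raw bytes, looked up in three different 256-entry tables.
RAW = bytes([0x20, 0x41, 0x80, 0xC4])


def glyphs_in(codepage: str, data: bytes = RAW) -> str:
    """Return the glyphs stored at each byte's position in the given codepage."""
    return data.decode(codepage)


def transliterate(data: bytes, src: str, dst: str) -> bytes:
    """Hypothetical helper: move each glyph from its src position to its dst position.

    Raises UnicodeEncodeError if a glyph has no cell in the dst table.
    """
    return data.decode(src).encode(dst)


if __name__ == "__main__":
    # Positions 20hex and 41hex agree everywhere; position 80hex does not.
    for cp in ("cp437", "cp855", "cp866"):
        print(cp, [f"{b:02X}={g!r}" for b, g in zip(RAW, glyphs_in(cp))])

    # Same Russian word, two codepages: the glyphs stay, the bytes change.
    koi8 = "Привет".encode("koi8_r")
    dos = transliterate(koi8, "koi8_r", "cp866")
    print(koi8.hex(" "), "->", dos.hex(" "))
```

Bytes 20hex and 41hex land on the same glyph in all three tables, so nothing needs translating for them; byte 80hex is where the tables part ways.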
there are mapping files created to point to the proper position for the 'A'
when using the second table (aka codepage)... this is easily seen when
overlaying CP855 on top of CP437... most characters will align in the same
cells of the table but some are different... they are generally up in the
higher-than-127 range where the line drawing and box characters reside in
CP437...

then someone came along and said "hey! we can do better" so UTF-8, UTF-16
and UTF-32 were born... UTF-8 is 8bit lossless and can address 1112064
positions (the unicode code points) in its table instead of the original
256... converting from codepages to UTF-8 is easy because every character
exists in its huge table... going the other way is not guaranteed because
the glyphs just don't all map over... in some languages, they have used
"double characters" like "ae" to indicate the single ae character which i
don't know how to make on this OS... other languages may also have an "ae"
character but in them you cannot use "a" and "e" side by side to indicate
the single "ae" character... i don't know why, that's just the way it is...

anyway, i'm just trying to help you understand why there's no "conversion"
as such in the old school code pages... there is transliteration where one
glyph lives in one spot in this table and another spot in that table... UTF
stuff just greatly expands the size of the tables which means that the
glyphs are now represented by one or more bytes which are/were the old
table position numbers in the old school code pages...

 NB> whether they're UTF-8 or not.

true...

 NB> The only somewhat possible way for him to read it properly would be to
 NB> change his default encoding to CP866 or KOI8-R,

exactly...

 NB> and even then there is no guarantee that the translation from UTF-8
 NB> will work as expected.

because it depends also on what his OS can display...
what i mean by this is that he has to be able to load the OS with the
needed code page to view them correctly but if he does that, he'll lose all
the normal latin glyphs... switching to UTF-8 on the OS will alleviate this
but it requires that the software is also able to transliterate the
characters to their new positions in the UTF-8 table so they can be
rendered properly... we've seen this with the box and line drawing
characters... there's one or two BBS related packages out there that do
properly transliterate them to their new positions in the UTF-8 table... i
don't recall who did them or what packages they are/were but they are or
have been participants in AGORANET and at least one of them was either a
BBS or a terminal program...

so, ok... too long a day... only 20:30 here and i'm already going to call
it a night... on a friday damned night at that :(

)\/(ark

Always Mount a Scratch Monkey
Do you manage your own servers? If you are not running an IDS/IPS yer doin' it wrong...
... Well done! is better than well said!
---
 * Origin: (1:3634/12.73)
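
A rough Python sketch of the UTF-8 side of the discussion above, again assuming Python's built-in codecs stand in for the codepage tables. The helper names `seen_through` and `to_codepage` and the sample strings are made up for illustration; nothing here reflects how TimEd or any BBS package actually does it. It shows UTF-8 bytes being read one byte at a time through CP437, the one-way nature of the codepage-to-UTF-8 mapping, and CP437 box characters moving to their UTF-8 positions:

```python
def seen_through(codepage: str, text: str) -> str:
    """What a single-byte viewer shows when handed the UTF-8 bytes of text."""
    return text.encode("utf-8").decode(codepage)


def to_codepage(text: str, codepage: str) -> bytes:
    """Try to place every glyph in the codepage; '?' marks glyphs with no cell."""
    return text.encode(codepage, errors="replace")


if __name__ == "__main__":
    russian = "Привет"
    print(seen_through("cp437", russian))   # 12 unrelated CP437 glyphs, not 6 letters
    print(to_codepage(russian, "cp437"))    # no Cyrillic cells in CP437
    print(to_codepage(russian, "cp866"))    # fits: one byte per letter

    box = bytes([0xC9, 0xCD, 0xBB])         # top edge of a double-line box in CP437
    box_text = box.decode("cp437")          # codepage -> Unicode always has a target
    print(box_text, box_text.encode("utf-8").hex(" "))  # three UTF-8 bytes per glyph
```

Going from a codepage to UTF-8 never fails here because every cell has a Unicode home; going back only works when the target table happens to hold the glyph.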