Subj : Re: Changes in golded+ sources
To   : Nicholas Boel
From : Vitaliy Aksyonov
Date : Wed Nov 08 2023 11:50 am

Hello Nicholas.

07 Nov 23 16:31, you wrote to me:

 VA>> Git is somewhat complex, but once you got the idea - you may do a
 VA>> lot of cool things which other code versioning systems can't. Like
 VA>> reorder commits, move one branch on top of another, etc.
 NB> I probably won't ever be reordering or anything super technical, but I
 NB> googled the easiest way, and 'checkout' seemed to do the trick. Then I
 NB> just went back to the master branch using the same option.

You just don't need it, which is totally fine. :) I'm just saying that git is a very powerful tool once you know how to use it.

 VA>> If you need some help with that - I'll be glad to do it. It's
 VA>> just not right echo area for those questions. You may shoot me
 VA>> netmail too.
 NB> I may have to take you up on that offer some day. Thank you! ;)

Sure.

 VA>> You may convert any charset to UTF-8 actually. And it would be
 VA>> really cool to have UTF-8 support in GoldEd. I'm still learning
 VA>> the code. Will try to improve things in that area.

 NB> So here's a question or two for you. As of right now I'm using:

 NB> xlatimport cp437 (because most incoming messages missing a CHRS kludge
 NB> falls under this)

 NB> xlatexport utf-8 (because that's what I can write with)
 NB> xlatlocalset utf-8 (because this is my local setup, 'locale' gives
 NB> en_US.UTF-8 for everything)

 NB> Basically I'm forcing the use of utf-8 when exporting messages, but we
 NB> have already witnessed that that doesn't work when you write to me
 NB> using CP866 and I reply back to you.

 NB> 1) Is there a way for me to reply to you with the same charset (or
 NB> closest translation) that you're using automatically? Or would I have
 NB> to change my config file every time I reply to a different CHRS
 NB> kludge?

Not automatically, though there is a feature request for this. I may work on it once I finish refactoring the charset conversion.

But! You can do it manually. Create a separate message template and add @XLatExport CP866 (or another charset) at the beginning. It's rather inconvenient, but it works. This is how I use US-ASCII while my standard export charset is CP866 (I mostly write to Russian echoes). You can switch templates when you write a new message or reply to somebody.

So if you only need a few charsets, this will work for you.

 NB> 2) Is this where iconv support would be beneficial, when an incoming
 NB> message has a CP866 kludge, iconv would translate it to UTF-8 on my
 NB> end so that it is readable and writable, and then translate it back to
 NB> CP866 when the reply is sent? Or would it still stay UTF-8 because I'm
 NB> forcing it on export?

Let me explain how this machinery works, and then you'll have fewer questions, if any. :)

For charset translation GoldEd uses three main keywords:
XlatImport - which charset to assume if a message doesn't have a CHRS kludge.
XlatLocalSet - your local charset.
XlatExport - which charset to use when writing (!) a message. This is important.

To convert between each pair you need to have translation tables configured.

Now let's go through a scenario.

XlatImport cp437
XlatLocalSet utf-8
XlatExport utf-8

1. You receive a message with the kludge CHRS CP866 2.
2. XlatImport is ignored because the message has a CHRS kludge. CP866 is used.
3. GoldEd looks up a conversion table from CP866 to UTF-8. If it exists, great: it is used to convert the message to UTF-8 (and you won't even lose any letters, because every symbol in CP866 can be converted to UTF-8).
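
The three reading steps above can be sketched roughly like this. It is a Python illustration, not GoldEd's actual code: the function names are made up, and Python's built-in codecs stand in for GoldEd's translation tables.

```python
def pick_source_charset(chrs_kludge, xlat_import="cp437"):
    """Return the charset used to decode an incoming message.

    chrs_kludge is the CHRS kludge value, e.g. "CP866 2", or None
    when the message has no such kludge (hypothetical helper).
    """
    if chrs_kludge:
        # The kludge holds a charset name followed by a level number.
        return chrs_kludge.split()[0].lower()
    # No kludge: fall back to the XlatImport setting.
    return xlat_import


def read_message(body, chrs_kludge, xlat_local_set="utf-8"):
    """Convert raw message bytes from the source charset to the local one."""
    source = pick_source_charset(chrs_kludge)
    text = body.decode(source)
    return text.encode(xlat_local_set)


raw = "Привет".encode("cp866")          # message as it arrives
local = read_message(raw, "CP866 2")    # message as the editor sees it
print(local.decode("utf-8"))
```

With no kludge at all, pick_source_charset(None) returns the XlatImport value, cp437 in this scenario.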

That's the reading part. Now you decide to answer.

The editor will work in UTF-8, and the converted message will be displayed fine (there are some issues with UTF-8 there, but let's skip them for now).

1. You write some stuff using English, Russian and, let's say, Arabic letters.
2. When you save it to the message base, GoldEd will look up a conversion table from UTF-8 (XlatLocalSet) to UTF-8 (XlatExport). As long as you have one, even a fake one, this works, because the conversion is one-to-one: GoldEd will save the message perfectly fine in UTF-8.

The problem with this approach is that people who have a different local charset (like me) won't see the non-English characters.
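
To illustrate what such a reader ends up with, here is a small Python sketch. It assumes the reader's setup shows the UTF-8 bytes as if they were CP866, with no conversion in between; the sample text is made up.

```python
text = "Привет, Nicholas"

# What gets saved to the message base with XlatExport utf-8.
saved = text.encode("utf-8")

# What a reader sees if those bytes are displayed as CP866.
seen_in_cp866 = saved.decode("cp866")

print(seen_in_cp866)
```

The ASCII part survives because it is encoded the same way in both charsets, while each Cyrillic letter turns into two unrelated symbols.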

Now let's imagine that you have XlatExport set to cp437, but XlatLocalSet to utf-8.

When the message is being saved, GoldEd would try to find a conversion table from utf-8 to cp437. Such a table is just impossible in GoldEd, because it cannot convert from multibyte charsets to other charsets. So it will just save the text as UTF-8. I'm not sure which CHRS kludge will be used, but that's very easy to check. :)

What's more, when it converts from a one-byte to a multibyte charset, it can only use up to three bytes per symbol. That is enough for most languages, but it is still a limitation.
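
A rough Python model of a table-driven converter may make both limits clearer. This is only a sketch of the idea, not GoldEd's real table format; the names build_table and convert are hypothetical.

```python
MAX_BYTES_PER_SYMBOL = 3  # the limit mentioned above


def build_table(source_charset):
    """Build a 256-entry table: one input byte -> its UTF-8 byte sequence."""
    table = []
    for value in range(256):
        try:
            encoded = bytes([value]).decode(source_charset).encode("utf-8")
        except UnicodeDecodeError:
            encoded = b"?"  # byte is not defined in the source charset
        # Every entry has to fit into the fixed per-symbol budget.
        assert len(encoded) <= MAX_BYTES_PER_SYMBOL
        table.append(encoded)
    return table


def convert(data, table):
    """Translate byte by byte; each input byte is looked up on its own."""
    return b"".join(table[byte] for byte in data)


table = build_table("cp866")
utf8 = convert("Привет".encode("cp866"), table)
print(utf8.decode("utf-8"))
```

The lookup index is a single input byte, so the table has exactly 256 rows. A UTF-8 symbol spans several bytes, so a per-byte lookup never sees it as a whole, which is why the reverse direction cannot be expressed this way.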

Would iconv improve this situation? Yes! You'd be able to convert from multibyte charsets. Of course, some symbols may be lost if they don't exist in the target charset. For example, if I convert from UTF-8 to KOI8-R, I'd lose German umlauts. :)
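
The kind of loss I mean can be shown with a few lines of Python. Its built-in codecs are used here in place of iconv, and convert_lossy is a made-up helper, so treat it as an illustration only.

```python
def convert_lossy(data, source, target):
    """Convert bytes between charsets; unmappable symbols become '?'."""
    return data.decode(source).encode(target, errors="replace")


original = "Grüße и привет".encode("utf-8")
koi8 = convert_lossy(original, "utf-8", "koi8_r")

# Cyrillic survives the trip, the German letters do not.
print(koi8.decode("koi8_r"))
```

Here the multibyte input is handled without trouble, and only the symbols missing from KOI8-R are replaced.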

Hope that answers your question. If you have more - I'll try to answer them.

 VA>>>> I see. And you use that fake conversion table from/to UTF-8,
 VA>>>> right? If you do - my next small fix will help you for sure.

 NB> We can consider this message my first test using the latest commit. ;)

 NB> Hope it works!

It works. Now I see "UTF-8 4".

Vitaliy

--- GoldED+/LNX 1.1.5-b20231030
 * Origin: Aurora, Colorado (1:104/117)
