Subj : Re: Swedish characters
To   : borland.public.cpp.borlandcpp
From : maeder@glue.ch (Thomas Maeder [TeamB])
Date : Wed Oct 08 2003 10:26 pm

"Taras Kentrschynskyj" <taras@syd.kth.se> writes:

> Here's the result:
> 
> cin >> s;                            // input: åäö
> cout << s[0] << s[1] << s[2] << endl;
> cout << (int)s[0] << (int)s[1] << (int)s[2] << endl;
> cout << 'å' << 'ä' << 'ö' << endl;
> cout << (int)'å' << (int)'ä' << (int)'ö' << endl;
> 
> 
> output(gcc): 
> åäö
> -27-28-10
> åäö
> -27-28-10

The Latin-1¹ codes of the three characters are 229, 228 and 246 respectively.
When these codes are converted to (signed) 8 bit char, the result is
-27 -28 -10. Programs created by your gcc installation seem to encode
characters read from standard input in that encoding; your text editor seems
to use this encoding as well.


> output(bcc): 
> åäö
> -122-124-108
> Õõ÷
> -27-28-10

The Codepage 850² codes of the three characters are 134, 132 and 148
respectively. When these codes are converted to (signed) 8 bit char, the
result is -122 -124 -108.

The codes 229, 228 and 246 encode the characters Õ, õ and ÷ respectively
in the Codepage 850. When these codes are converted to (signed) 8 bit char,
the result is -27 -28 -10.


You are working with three tools:
- gcc
- bcc
- a text editor

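The text editor and the program generated by gcc seem to use Latin-1, while
the program generated by bcc uses Codepage 850. Since the character literals
in your source file are Latin-1 encoded, the bcc program writes Latin-1 bytes
that are then displayed as Codepage 850 characters, which is why you see Õõ÷.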

As an easy test for this reasoning, you could edit the source file using a
hex editor. Change the byte representing the character literal ä from
0xE4 to 0x84. The program compiled from this modified source should then
write the character literal ä as ä.


¹ http://www.utoronto.ca/webdocs/HTMLdocs/NewHTML/iso_table.html
² http://www.kostis.net/charsets/cp850.htm
