Subj : Re: Swedish characters
To   : borland.public.cpp.borlandcpp
From : maeder@glue.ch (Thomas Maeder [TeamB])
Date : Wed Oct 08 2003 10:26 pm

"Taras Kentrschynskyj" writes:

> Here's the result:
>
> cin >> s; // input: åäö
> cout << s[0] << s[1] << s[2] << endl;
> cout << (int)s[0] << (int)s[1] << (int)s[2] << endl;
> cout << 'å' << 'ä' << 'ö' << endl;
> cout << (int)'å' << (int)'ä' << (int)'ö' << endl;
>
> output(gcc):
> åäö
> -27-28-10
> åäö
> -27-28-10

The Latin-1¹ codes of the three characters are 229, 228 and 246, respectively. Converted to (signed) 8-bit char, these codes become -27, -28 and -10 (each code minus 256).

Programs built with your gcc installation appear to receive characters from standard input in that encoding, and your text editor appears to use it as well.

> output(bcc):
> åäö
> -122-124-108
> Õõ÷
> -27-28-10

The Codepage 850² codes of the three characters are 134, 132 and 148, respectively. Converted to (signed) 8-bit char, these codes become -122, -124 and -108.

In Codepage 850, the codes 229, 228 and 246 encode the characters Õ, õ and ÷, respectively. Converted to (signed) 8-bit char, these codes become -27, -28 and -10.

You are working with three tools:

- gcc
- bcc
- a text editor

The text editor and the program generated by gcc seem to use Latin-1, while the program generated by bcc uses Codepage 850. Since the character literals in your source file are Latin-1 encoded, the bcc-generated program outputs their Latin-1 bytes, and a Codepage 850 console displays those bytes as Õ, õ and ÷.

As an easy test of this reasoning, you could edit the source file with a hex editor and change the byte representing the character literal ä from 0xE4 to 0x84. The program that bcc compiles from this modified source should then print that character literal as ä.

¹ http://www.utoronto.ca/webdocs/HTMLdocs/NewHTML/iso_table.html
² http://www.kostis.net/charsets/cp850.htm