Subj : Re: SpiderMonkey: JS_InitStandardClasses allways fails
To   : Georg Maaß
From : Brendan Eich
Date : Tue May 13 2003 02:28 pm

[snip]

> How can I examine whether there are Unicode characters inside the
> JSString that use a high byte? Is there an API call to test this, or
> should I look at each character to determine whether I can use the
> implementation above without information loss, or should I prefer a
> std::wstring as container to prevent the high bytes from getting lost?

The last sounds best to me -- why risk information loss? Whether to
waste cycles trying to optimize for ISO-Latin-1 or ASCII is something
you'll have to consider, but it seems best to me, especially if you are
writing for a worldwide audience, to use Unicode always and take the
space hit. The engine has already taken that hit.

> How can I fill a std::wstring? A jschar is only 2 bytes, whereas a
> wchar_t is 4 bytes or more.

Not always. Some gcc's let you use 2-byte wchars. The question is how
to transcode the ECMA-262-specified code point in each jschar into a
wchar_t. ECMA and SpiderMonkey do not do anything special about Unicode
characters that don't fit in the first plane; such characters require
more than one jschar, and so inflate the reported string length.

> So a jschar* is not binary compatible with a wchar_t*. Do I have to
> feed a std::wstring jschar by jschar, as done in my implementation
> above?

I don't know; what kind of operating system are you using? wchar_t
varies by OS.

> I guess that there is no API function to find out whether
> JS_GetStringBytes results in an information loss or not, because I see
> no internal flags inside JSString which might provide this information
> in a cheap way. Getting this information by looking at each jschar is
> very expensive.

Why bother?

> This knowledge is necessary when my Wert class instance is of type
> "undefined", which means: autocast the next assigned value to the type
> that fits best. A JSString is ambiguous for this. If it contains
> characters larger than 255, then the resulting type must be
> Wert_wstring. If not, then a temporary std::string is to be created and
> inspected to see whether it can be represented as int, unsigned int,
> long, unsigned long, bool, or date, or otherwise must be stored as a
> std::string. This autocast might be very expensive if there is no cheap
> test for whether the JSString contains characters greater than 255.

You are defining a type system with ambiguities. Why not always use
wstring?

> What about byte order? Are there any situations where I have to change
> the byte order when I assign a jschar to a wchar_t, or does the byte
> order of jschar always match the byte order of wchar_t? On my test
> system (x86) it matches, but does it also match on other systems such
> as PowerPC without changing the byte order?

Byte order among integral types in the same process does not differ on
any architecture I know of (PDP-11 not included). The byte order of a
short (jschar is an unsigned short on all platforms) is the same as the
byte order of a long (wchar_t may be an unsigned long -- but not on all
OSes and with all compilation flags!).

/be
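
For reference, a minimal sketch of the jschar-by-jschar copy discussed
above. It assumes the classic JS_GetStringChars and JS_GetStringLength
accessors from jsapi.h; the helper name ToWString is invented for
illustration. Each jschar is zero-extended into a wchar_t, so nothing is
lost, but surrogate pairs for characters outside the first plane are
carried over as two separate units rather than combined.

  #include <string>
  #include "jsapi.h"

  // Widen a JSString's 16-bit code units into a std::wstring, one jschar
  // at a time. Each jschar is zero-extended, so no information is lost;
  // surrogate pairs for characters beyond the first plane remain as two
  // wchar_t units.
  std::wstring ToWString(JSString *str)
  {
      const jschar *chars = JS_GetStringChars(str);
      size_t length = JS_GetStringLength(str);

      std::wstring result;
      result.reserve(length);
      for (size_t i = 0; i < length; ++i)
          result.push_back(static_cast<wchar_t>(chars[i]));
      return result;
  }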
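
And if you do decide to optimize for Latin-1 despite the advice above,
the only test is a linear scan of the code units; there is no flag on
JSString for it. A hypothetical helper (FitsInLatin1 is not an engine
API) might look like this:

  #include "jsapi.h"

  // Linear scan: returns true only if every code unit fits in one byte,
  // i.e. the string could be stored as Latin-1 without information loss.
  // Bails out at the first character above 255.
  bool FitsInLatin1(JSString *str)
  {
      const jschar *chars = JS_GetStringChars(str);
      size_t length = JS_GetStringLength(str);

      for (size_t i = 0; i < length; ++i) {
          if (chars[i] > 0xFF)
              return false;
      }
      return true;
  }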