Subj : About Unicode
To   : netscape.public.mozilla.jseng
From : Jun Kim
Date : Mon Nov 22 2004 11:37 am

Hi. I have a question about Unicode. I'm not sure whether this is the right place to ask, but I'll ask anyway; any guidance would be appreciated.

I'm working with Unicode text, specifically Korean. I thought the JS engine supported Unicode strings, but apparently it does not (or maybe I'm doing something wrong). Here is the piece of code:

js> var a = "가나다";   // the string is Korean (Unicode)
js> a.charAt(1);        // this returns a garbage character

The problem I ran into is calling charAt() on a Unicode string. I traced through the engine, and when charAt() is called, the JSString is converted to a JSDependentString. (Am I right?) The function that returns the char * from the JSString goes through all the #define macros and gets the address of the actual string memory, and the memory location it ends up pointing at is one byte before where the character actually sits, going by ASCII.

For instance, if the string is "abc", the allocated memory looks like this:

  61 00 62 00 63 00 00 00    // a.b.c..

charAt() is called with 1 as the index, and when it returns, the data points to the '00' right before 'b':

  61 00 62 00 63 00 00 00    // a.b.c..
        ^

and so 'b' prints out.

With Unicode (Korean), the memory is allocated so that four bytes make up one character, like this:

  B1 00 C1 00 8F 00 DA 00 00 00

where B1C1 is one Unicode character and 8FDA is another one. (I just made these values up, so I don't know what they actually represent. :) )

Here comes the problem. When charAt(1) is called and the conversions take place, the string ends up pointing to the '00' right before C1:

  B1 00 C1 00 8F 00 DA 00 00 00
        ^

so the output prints the C1 code, or maybe the C18F code (I don't know exactly).

So this is where I call for help. HELP!!! :) Am I doing something wrong?
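
In case it helps to see what I mean, here is a small shell snippet I can run to look at what the engine actually stores for the string (just a sketch; it assumes the shell's print() function and the same string literal as above):

js> var a = "가나다";   // same Korean string as above
js> a.length;           // 3 if each Korean character is one 16-bit unit, 6 if each byte became its own unit
js> for (var i = 0; i < a.length; i++)
      print(i + ": " + a.charCodeAt(i).toString(16));   // dump each 16-bit code unit in hex

I'm not sure what values this will print on my setup, but it at least shows exactly which units charAt() is indexing into.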