From sa2c@us.and.or.jp  Sat Jun 30 13:15:04 2001
Return-Path: <sa2c@us.and.or.jp>
Received: from berkeley.us.and.or.jp (berkeley.us.and.or.jp [210.136.4.34])
	by hub.freebsd.org (Postfix) with ESMTP id A15F237B403
	for <FreeBSD-gnats-submit@freebsd.org>; Sat, 30 Jun 2001 13:15:02 -0700 (PDT)
	(envelope-from sa2c@us.and.or.jp)
Received: by berkeley.us.and.or.jp (Postfix, from userid 3104)
	id D4C8F3E32; Sun,  1 Jul 2001 05:14:48 +0900 (JST)
Message-Id: <20010630201448.D4C8F3E32@berkeley.us.and.or.jp>
Date: Sun,  1 Jul 2001 05:14:48 +0900 (JST)
From: sa2c@and.or.jp
Reply-To: sa2c@and.or.jp
To: FreeBSD-gnats-submit@freebsd.org
Cc:
Subject: EUC support of wcstombs(3) is broken for codeset 3 and 4
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         28552
>Category:       bin
>Synopsis:       EUC support of wcstombs(3) is broken for codeset 3 and 4
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sat Jun 30 13:20:01 PDT 2001
>Closed-Date:    Mon Oct 07 13:39:18 PDT 2002
>Last-Modified:  Mon Oct 07 13:39:18 PDT 2002
>Originator:     NIIMI Satoshi
>Release:        FreeBSD 4.3-STABLE i386
>Organization:
>Environment:
System: FreeBSD berkeley.us.and.or.jp 4.3-STABLE FreeBSD 4.3-STABLE #2: Thu Jun 21 18:28:33 JST 2001     sa2c@berkeley.us.and.or.jp:/usr/obj/usr/src/sys/BERKELEY  i386

	
>Description:

wcstombs(3) converts wide characters to multibyte characters
incorrectly when the character belongs to codeset 3 or codeset 4 of
EUC, i.e. the codesets introduced by the SS2 and SS3 single-shift
bytes.  The multibyte sequences it produces do not conform to the EUC
specification.
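
For reference, conformance can be checked mechanically.  The following
is only a minimal sketch and not part of libc: euc_conforms() and
is_gr() are hypothetical helpers, and the layout they check (bytes
0xa1-0xfe in GR, one GR byte after SS2, two GR bytes after SS3, two GR
bytes otherwise) is an assumption matching the ja_JP.EUC test string
used under How-To-Repeat.

```c
#include <stddef.h>

#define SS2	0x8e
#define SS3	0x8f

/* Return 1 if b lies in the GR range assumed here (0xa1-0xfe). */
static int
is_gr(unsigned char b)
{
	return (b >= 0xa1 && b <= 0xfe);
}

/*
 * Hypothetical helper: return 1 if the NUL-terminated string s is
 * well formed under the assumed ja_JP.EUC layout, 0 otherwise.
 */
int
euc_conforms(const unsigned char *s)
{
	size_t need;

	while (*s != 0) {
		if (*s < 0x80) {	/* ASCII: single byte */
			s++;
			continue;
		}
		if (*s == SS2) {	/* SS2 + 1 GR byte */
			need = 1;
			s++;
		} else if (*s == SS3) {	/* SS3 + 2 GR bytes */
			need = 2;
			s++;
		} else {
			need = 2;	/* 2 GR bytes, no prefix */
		}
		/* A NUL or non-GR byte here ends the scan with failure. */
		while (need-- > 0) {
			if (!is_gr(*s))
				return (0);
			s++;
		}
	}
	return (1);
}
```

Under these assumptions the sequence 0x8e 0xb1 passes, while a sequence
such as 0x8e 0x31, where the byte after SS2 has lost its high bit, is
rejected.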

	
>How-To-Repeat:

#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

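/*
 * Multibyte Japanese EUC test string:
 *   0x41            ASCII 'A'
 *   0xa4 0xa2       two-byte character (no prefix)
 *   0x8e 0xb1       SS2-prefixed character
 *   0x8f 0xb0 0xa1  SS3-prefixed character
 */
const unsigned char teststr[] = { 0x41, 0xa4, 0xa2, 0x8e, 0xb1,
	0x8f, 0xb0, 0xa1, 0 };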

/* expected wide characters of above */
const wchar_t w_teststr[] = { 0x0041, 0xa4a2, 0x00b1, 0xb021, 0 };

void
dumpmbs(const char *prompt, const unsigned char *p)
{
	int c;

	printf("%s: ", prompt);
	do {
		c = *p++;
		printf("[%02x]", c);
	} while (c != 0);
	putchar('\n');
}

void
dumpwcs(const char *prompt, const wchar_t *wp)
{
	wchar_t wc;

	printf("%s: ", prompt);
	do {
		wc = *wp++;
		printf("[%04x]", (unsigned int)wc);
	} while (wc != 0);
	putchar('\n');
}
	
int
main(void)
{
	unsigned char buf[BUFSIZ];
	wchar_t wbuf[BUFSIZ];

	if (setlocale(LC_CTYPE, "ja_JP.EUC") == NULL) {
		fprintf(stderr, "cannot set locale ja_JP.EUC\n");
		return 1;
	}

	strncpy((char *)buf, (const char *)teststr, sizeof(buf) - 1);
	buf[sizeof(buf) - 1] = '\0';
	
	dumpmbs("mbs", teststr);
	dumpwcs("wcs", w_teststr);

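	/* multibyte -> wide: should match w_teststr */
	mbstowcs(wbuf, (const char *)teststr, BUFSIZ);
	dumpwcs("mbs->wcs", wbuf);

	/* wide -> multibyte: should match teststr */
	wcstombs((char *)buf, w_teststr, BUFSIZ);
	dumpmbs("wcs->mbs", buf);

	/* round trip starting from the multibyte string */
	mbstowcs(wbuf, (const char *)teststr, BUFSIZ);
	wcstombs((char *)buf, wbuf, BUFSIZ);
	dumpmbs("mbs->wcs->mbs", buf);

	/* round trip starting from the wide string */
	wcstombs((char *)buf, w_teststr, BUFSIZ);
	mbstowcs(wbuf, (const char *)buf, BUFSIZ);
	dumpwcs("wcs->mbs->wcs", wbuf);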

	return 0;
}

	
>Fix:

Index: euc.c
===================================================================
RCS file: /home/ncvs/src/lib/libc/locale/euc.c,v
retrieving revision 1.3.6.1
diff -u -u -r1.3.6.1 euc.c
--- euc.c	2000/06/04 21:47:39	1.3.6.1
+++ euc.c	2001/06/30 19:47:16
@@ -123,6 +123,8 @@
 #define	_SS2	0x008e
 #define	_SS3	0x008f
 
+#define GR_BITS	0x80808080 /* XXX: to be fixed */
+
 static inline int
 _euc_set(c)
 	u_int c;
@@ -202,6 +204,8 @@
 				}
 				*string++ = _SS2;
 				--i;
+				/* SS2 designates G2 into GR */
+				nm |= GR_BITS;
 			} else
 				if (m == CEI->bits[3]) {
 					i = len = CEI->count[3];
@@ -212,6 +216,8 @@
 					}
 					*string++ = _SS3;
 					--i;
+					/* SS3 designates G3 into GR */
+					nm |= GR_BITS;
 				} else
 					goto CodeSet1;	/* Bletch */
 		while (i-- > 0)
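
The effect of the two added lines can be seen in isolation.  The sketch
below is not the libc code: emit_tail() is a hypothetical stand-in for
the trailing byte loop, assumed here to write the remaining bytes of nm
from most to least significant, and GR_BITS is copied from the patch.
It also assumes, as the patch context suggests, that nm holds the
character value with the codeset-identifying bits masked off.

```c
#include <stddef.h>

#define GR_BITS	0x80808080u	/* same value as in the patch */

/*
 * Hypothetical stand-in for the trailing byte loop: write the low
 * `count` bytes of nm to out, most significant first.  Returns the
 * number of bytes written.
 */
size_t
emit_tail(unsigned long nm, int count, unsigned char *out)
{
	size_t n = 0;
	int i = count;

	while (i-- > 0)
		out[n++] = (unsigned char)((nm >> (i << 3)) & 0xff);
	return (n);
}
```

With nm = 0x31 and count = 1, emit_tail() writes 0x31; with
nm | GR_BITS it writes 0xb1, the byte that follows SS2 in the test
string.  Likewise 0x3021 | GR_BITS with count = 2 yields 0xb0 0xa1,
the bytes that follow SS3.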

	
>Release-Note:
>Audit-Trail:
State-Changed-From-To: open->feedback 
State-Changed-By: asmodai 
State-Changed-When: Sun Apr 7 09:37:40 PDT 2002 
State-Changed-Why:  
Satoshi-san,

do you have any pointers to the specifications for Japanese languages [in
English, please] so that I can get a better understanding of the issues?

I committed the fix in CURRENT.  Let's wait a while and see if anything goes
wrong with this in place.

Thanks. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=28552 

From: NIIMI Satoshi <sa2c@sa2c.net>
To: freebsd-gnats-submit@FreeBSD.org
Cc:  
Subject: Re: bin/28552: EUC support of wcstombs(3) is broken for codeset 3 and 4
Date: 11 Aug 2002 06:47:43 +0900

 Please close this PR.
 Already committed as euc.c:1.7.
 
State-Changed-From-To: feedback->closed 
State-Changed-By: dwmalone 
State-Changed-When: Mon Oct 7 13:38:30 PDT 2002 
State-Changed-Why:  
Closed at submitter's request. 

>Unformatted:
