From jr@opal.com  Fri Jan 18 20:12:38 2013
Return-Path: <jr@opal.com>
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
	by hub.freebsd.org (Postfix) with ESMTP id D3E34D13
	for <FreeBSD-gnats-submit@freebsd.org>; Fri, 18 Jan 2013 20:12:38 +0000 (UTC)
	(envelope-from jr@opal.com)
Received: from mho-02-ewr.mailhop.org (mho-04-ewr.mailhop.org [204.13.248.74])
	by mx1.freebsd.org (Postfix) with ESMTP id 9B531373
	for <FreeBSD-gnats-submit@freebsd.org>; Fri, 18 Jan 2013 20:12:38 +0000 (UTC)
Received: from pool-141-154-249-240.bos.east.verizon.net ([141.154.249.240] helo=homobox.opal.com)
	by mho-02-ewr.mailhop.org with esmtpsa (TLSv1:AES256-SHA:256)
	(Exim 4.72)
	(envelope-from <jr@opal.com>)
	id 1TwIIh-0008Hw-PX
	for FreeBSD-gnats-submit@freebsd.org; Fri, 18 Jan 2013 20:12:32 +0000
Received: from shibato.opal.com (shibato.opal.com [IPv6:2001:470:8cb8:4:221:63ff:fe5a:c9a7])
	(authenticated bits=0)
	by homobox.opal.com (8.14.4/8.14.4) with ESMTP id r0IKCUHi002879
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO)
	for <FreeBSD-gnats-submit@freebsd.org>; Fri, 18 Jan 2013 15:12:30 -0500 (EST)
	(envelope-from jr@opal.com)
Received: from shibato.opal.com (localhost [127.0.0.1])
	by shibato.opal.com (8.14.5/8.14.5) with ESMTP id r0IKCU0n077302
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO)
	for <FreeBSD-gnats-submit@freebsd.org>; Fri, 18 Jan 2013 15:12:30 -0500 (EST)
	(envelope-from jr@opal.com)
Received: (from jr@localhost)
	by shibato.opal.com (8.14.5/8.14.5/Submit) id r0IKCUiE077301;
	Fri, 18 Jan 2013 15:12:30 -0500 (EST)
	(envelope-from jr)
Message-Id: <201301182012.r0IKCUiE077301@shibato.opal.com>
Date: Fri, 18 Jan 2013 15:12:30 -0500 (EST)
From: "J.R. Oldroyd" <fbsd@opal.com>
Reply-To: "J.R. Oldroyd" <fbsd@opal.com>
To: FreeBSD-gnats-submit@freebsd.org
Cc:
Subject: update vis(3) and vis(1) to support multibyte characters
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         175418
>Category:       bin
>Synopsis:       [patch] update vis(3) and vis(1) to support multibyte characters
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    brooks
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Jan 18 20:20:00 UTC 2013
>Closed-Date:    Tue Apr 16 19:32:30 UTC 2013
>Last-Modified:  Tue Apr 16 19:32:30 UTC 2013
>Originator:     J.R. Oldroyd
>Release:        FreeBSD 9.1-RELEASE amd64
>Organization:
>Environment:
System: FreeBSD xx.opal.com 9.1-RELEASE FreeBSD 9.1-RELEASE #0 r244985: Tue Jan 8 10:51:13 EST 2013 xx@shibato.opal.com:/usr/src/sys/amd64/compile/GENERIC amd64
>Description:
The vis(3) library calls and the vis(1) program do not support multibyte
character sets.  As a result, many printable characters are not displayed
properly, and vice versa.  This patch enhances vis(3) to support multibyte
characters according to the setting of LC_CTYPE and also adjusts vis(1)
so that it reads input in a multibyte-aware manner.

Since vis(3) is also used by ps(1), this patch fixes ps(1) so that wide
characters in command arguments are displayed properly.
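
The general approach can be illustrated with a minimal, self-contained
sketch.  It is not the patch itself: encode_wide() is a hypothetical helper
invented for this illustration.  It converts the multibyte input to wide
characters with mbstowcs(3), classifies each one with iswgraph(3), and
writes anything non-graphic as an octal escape.  Unlike vis(3) with default
flags, this sketch also escapes space, tab and newline, and it only keeps
the low 8 bits of an escaped character.

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper, for illustration only (not part of vis(3)).
 * Encodes the multibyte string src into dst, which holds dstsize bytes.
 * Characters accepted by iswgraph() are copied; everything else becomes
 * a \ooo octal escape of its low 8 bits.  Returns the number of bytes
 * written, not counting the NUL, or -1 on conversion error or overflow.
 */
int
encode_wide(char *dst, size_t dstsize, const char *src)
{
	size_t i, nwide, used = 0;
	wchar_t *wide;
	int result = -1;

	if (dstsize == 0)
		return (-1);

	/* A multibyte string never has more characters than bytes. */
	wide = calloc(strlen(src) + 1, sizeof(*wide));
	if (wide == NULL)
		return (-1);

	nwide = mbstowcs(wide, src, strlen(src) + 1);
	if (nwide == (size_t)-1)
		goto out;	/* invalid sequence for the current locale */

	for (i = 0; i < nwide; i++) {
		char mb[MB_LEN_MAX];
		int len;

		if (iswgraph((wint_t)wide[i])) {
			/* Graphic: copy its multibyte form unchanged. */
			len = wctomb(mb, wide[i]);
			if (len < 0 || used + (size_t)len >= dstsize)
				goto out;
			memcpy(dst + used, mb, (size_t)len);
			used += (size_t)len;
		} else {
			/* Non-graphic: four bytes of escape plus the NUL. */
			if (used + 4 >= dstsize)
				goto out;
			snprintf(dst + used, 5, "\\%03o",
			    (unsigned)wide[i] & 0377);
			used += 4;
		}
	}
	dst[used] = '\0';
	result = (int)used;
out:
	free(wide);
	return (result);
}
```

Which characters count as graphic depends on LC_CTYPE, so a caller would
normally run setlocale(LC_CTYPE, "") first.  Without that call the C locale
applies, and non-ASCII input may be rejected or escaped depending on the
platform.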
>How-To-Repeat:
n/a
>Fix:
--- lib/libc/gen/vis.c.orig	2013-01-02 19:26:41.000000000 -0500
+++ lib/libc/gen/vis.c	2013-01-17 14:45:55.000000000 -0500
@@ -35,167 +35,233 @@
 
 #include <sys/types.h>
 #include <limits.h>
+#include <stdlib.h>
+#include <wchar.h>
+#include <wctype.h>
+#include <string.h>
 #include <ctype.h>
 #include <stdio.h>
 #include <vis.h>
 
-#define	isoctal(c)	(((u_char)(c)) >= '0' && ((u_char)(c)) <= '7')
+#define	iswoctal(c)	(((u_char)(c)) >= L'0' && ((u_char)(c)) <= L'7')
 
 /*
- * vis - visually encode characters
+ * _vis - visually encode wide characters
  */
-char *
-vis(dst, c, flag, nextc)
-	char *dst;
-	int c, nextc;
+wchar_t *
+_vis(dst, c, flag, nextc)
+	wchar_t *dst;
+	wint_t c, nextc;
 	int flag;
 {
-	c = (unsigned char)c;
-
 	if (flag & VIS_HTTPSTYLE) {
 		/* Described in RFC 1808 */
-		if (!(isalnum(c) /* alpha-numeric */
+		if (!(iswalnum(c) /* alpha-numeric */
 		    /* safe */
-		    || c == '$' || c == '-' || c == '_' || c == '.' || c == '+'
+		    || c == L'$' || c == L'-' || c == L'_' || c == L'.' || c == L'+'
 		    /* extra */
-		    || c == '!' || c == '*' || c == '\'' || c == '('
-		    || c == ')' || c == ',')) {
-			*dst++ = '%';
-			snprintf(dst, 4, (c < 16 ? "0%X" : "%X"), c);
+		    || c == L'!' || c == L'*' || c == L'\'' || c == L'('
+		    || c == L')' || c == L',')) {
+			*dst++ = L'%';
+			swprintf(dst, 4, (c < 16 ? L"0%X" : L"%X"), c);
 			dst += 2;
 			goto done;
 		}
 	}
 
 	if ((flag & VIS_GLOB) &&
-	    (c == '*' || c == '?' || c == '[' || c == '#'))
+	    (c == L'*' || c == L'?' || c == L'[' || c == L'#'))
 		;
-	else if (isgraph(c) ||
-	   ((flag & VIS_SP) == 0 && c == ' ') ||
-	   ((flag & VIS_TAB) == 0 && c == '\t') ||
-	   ((flag & VIS_NL) == 0 && c == '\n') ||
-	   ((flag & VIS_SAFE) && (c == '\b' || c == '\007' || c == '\r'))) {
+	else if (iswgraph(c) ||
+	   ((flag & VIS_SP) == 0 && c == L' ') ||
+	   ((flag & VIS_TAB) == 0 && c == L'\t') ||
+	   ((flag & VIS_NL) == 0 && c == L'\n') ||
+	   ((flag & VIS_SAFE) && (c == L'\b' || c == L'\007' || c == L'\r'))) {
 		*dst++ = c;
-		if (c == '\\' && (flag & VIS_NOSLASH) == 0)
-			*dst++ = '\\';
-		*dst = '\0';
-		return (dst);
+		if (c == L'\\' && (flag & VIS_NOSLASH) == 0)
+			*dst++ = L'\\';
+		goto done;
 	}
 
 	if (flag & VIS_CSTYLE) {
 		switch(c) {
-		case '\n':
-			*dst++ = '\\';
-			*dst++ = 'n';
-			goto done;
-		case '\r':
-			*dst++ = '\\';
-			*dst++ = 'r';
-			goto done;
-		case '\b':
-			*dst++ = '\\';
-			*dst++ = 'b';
-			goto done;
-		case '\a':
-			*dst++ = '\\';
-			*dst++ = 'a';
-			goto done;
-		case '\v':
-			*dst++ = '\\';
-			*dst++ = 'v';
-			goto done;
-		case '\t':
-			*dst++ = '\\';
-			*dst++ = 't';
-			goto done;
-		case '\f':
-			*dst++ = '\\';
-			*dst++ = 'f';
-			goto done;
-		case ' ':
-			*dst++ = '\\';
-			*dst++ = 's';
-			goto done;
-		case '\0':
-			*dst++ = '\\';
-			*dst++ = '0';
-			if (isoctal(nextc)) {
-				*dst++ = '0';
-				*dst++ = '0';
+		case L'\n':
+			*dst++ = L'\\';
+			*dst++ = L'n';
+			goto done;
+		case L'\r':
+			*dst++ = L'\\';
+			*dst++ = L'r';
+			goto done;
+		case L'\b':
+			*dst++ = L'\\';
+			*dst++ = L'b';
+			goto done;
+		case L'\a':
+			*dst++ = L'\\';
+			*dst++ = L'a';
+			goto done;
+		case L'\v':
+			*dst++ = L'\\';
+			*dst++ = L'v';
+			goto done;
+		case L'\t':
+			*dst++ = L'\\';
+			*dst++ = L't';
+			goto done;
+		case L'\f':
+			*dst++ = L'\\';
+			*dst++ = L'f';
+			goto done;
+		case L' ':
+			*dst++ = L'\\';
+			*dst++ = L's';
+			goto done;
+		case L'\0':
+			*dst++ = L'\\';
+			*dst++ = L'0';
+			if (iswoctal(nextc)) {
+				*dst++ = L'0';
+				*dst++ = L'0';
 			}
 			goto done;
 		}
 	}
-	if (((c & 0177) == ' ') || isgraph(c) || (flag & VIS_OCTAL)) {
-		*dst++ = '\\';
-		*dst++ = ((u_char)c >> 6 & 07) + '0';
-		*dst++ = ((u_char)c >> 3 & 07) + '0';
-		*dst++ = ((u_char)c & 07) + '0';
+	if (((c & 0177) == L' ') || (flag & VIS_OCTAL)) {
+		*dst++ = L'\\';
+		*dst++ = ((u_char)c >> 6 & 07) + L'0';
+		*dst++ = ((u_char)c >> 3 & 07) + L'0';
+		*dst++ = ((u_char)c & 07) + L'0';
 		goto done;
 	}
 	if ((flag & VIS_NOSLASH) == 0)
-		*dst++ = '\\';
+		*dst++ = L'\\';
 	if (c & 0200) {
 		c &= 0177;
-		*dst++ = 'M';
+		*dst++ = L'M';
 	}
-	if (iscntrl(c)) {
-		*dst++ = '^';
+	if (iswcntrl(c)) {
+		*dst++ = L'^';
 		if (c == 0177)
-			*dst++ = '?';
+			*dst++ = L'?';
 		else
-			*dst++ = c + '@';
+			*dst++ = c + L'@';
 	} else {
-		*dst++ = '-';
+		*dst++ = L'-';
 		*dst++ = c;
 	}
 done:
-	*dst = '\0';
+	*dst = L'\0';
 	return (dst);
 }
 
 /*
+ * vis - visually encode characters
+ */
+char *
+vis(dst, c, flag, nextc)
+	char *dst;
+	int c, nextc;
+	int flag;
+{
+	/*
+	 * Output may be up to 4 times the size of input plus
+	 * 1 for the NUL.
+	 */
+	wchar_t res[5];
+
+	_vis(res, (wint_t) c, flag, (wint_t) nextc);
+	wcstombs(dst, res, wcslen(res)+sizeof(wchar_t));
+	return (dst + strlen(dst));
+}
+
+/*
  * strvis, strvisx - visually encode characters from src into dst
  *
  *	Dst must be 4 times the size of src to account for possible
  *	expansion.  The length of dst, not including the trailing NUL,
  *	is returned.
  *
- *	Strvisx encodes exactly len bytes from src into dst.
+ *	Strvisx encodes exactly len characters from src into dst.
  *	This is useful for encoding a block of data.
  */
 int
-strvis(dst, src, flag)
-	char *dst;
-	const char *src;
+strvis(mbdst, mbsrc, flag)
+	char *mbdst;
+	const char *mbsrc;
 	int flag;
 {
-	char c;
-	char *start;
+	wchar_t *dst, *src;
+	wchar_t *pdst, *psrc;
+	wchar_t c;
+	wchar_t *start;
+
+	if ((psrc = (wchar_t *) calloc((strlen(mbsrc) + 1),
+	    sizeof(wchar_t))) == NULL)
+		return -1;
+	if ((pdst = (wchar_t *) calloc(((4 * strlen(mbsrc)) + 1),
+	    sizeof(wchar_t))) == NULL) {
+		free((void *) psrc);
+		return -1;
+	}
+
+	dst = pdst;
+	src = psrc;
+
+	mbstowcs(src, mbsrc, strlen(mbsrc) + 1);
 
 	for (start = dst; (c = *src); )
-		dst = vis(dst, c, flag, *++src);
-	*dst = '\0';
+		dst = _vis(dst, c, flag, *++src);
+
+	wcstombs(mbdst, start, dst - start + sizeof(wchar_t));
+
+	free((void *) pdst);
+	free((void *) psrc);
+
 	return (dst - start);
 }
 
 int
-strvisx(dst, src, len, flag)
-	char *dst;
-	const char *src;
-	size_t len;
+strvisx(mbdst, mbsrc, mblen, flag)
+	char *mbdst;
+	const char *mbsrc;
+	size_t mblen;
 	int flag;
 {
-	int c;
-	char *start;
+	wchar_t *dst, *src;
+	wchar_t *pdst, *psrc;
+	wchar_t c;
+	wchar_t *start;
+	size_t len;
+
+	if ((psrc = (wchar_t *) calloc((strlen(mbsrc) + 1),
+	    sizeof(wchar_t))) == NULL)
+		return -1;
+	if ((pdst = (wchar_t *) calloc(((4 * strlen(mbsrc)) + 1),
+	    sizeof(wchar_t))) == NULL) {
+		free((void *) psrc);
+		return -1;
+	}
+
+	dst = pdst;
+	src = psrc;
 
-	for (start = dst; len > 1; len--) {
+	len = mbstowcs(src, mbsrc, strlen(mbsrc) + 1);
+
+	if (len < mblen)
+		mblen = len;
+
+	for (start = dst; mblen > 1; mblen--) {
 		c = *src;
-		dst = vis(dst, c, flag, *++src);
+		dst = _vis(dst, c, flag, *++src);
 	}
-	if (len)
-		dst = vis(dst, *src, flag, '\0');
-	*dst = '\0';
+	if (mblen)
+		dst = _vis(dst, *src, flag, L'\0');
+
+	wcstombs(mbdst, start, dst - start + sizeof(wchar_t));
+
+	free((void *) pdst);
+	free((void *) psrc);
 
 	return (dst - start);
 }
--- lib/libc/gen/vis.3.orig	2013-01-02 19:26:40.000000000 -0500
+++ lib/libc/gen/vis.3	2013-01-17 14:28:02.000000000 -0500
@@ -300,9 +300,6 @@
 .Sh HISTORY
 These functions first appeared in
 .Bx 4.4 .
-.Sh BUGS
-The
-.Nm
-family of functions do not recognize multibyte characters, and thus
-may consider them to be non-printable when they are in fact printable
-(and vice versa.)
+.Pp
+The functions were augmented to add multibyte character support in
+.Fx 9.1 .
--- usr.bin/vis/vis.c.orig	2013-01-02 19:15:19.000000000 -0500
+++ usr.bin/vis/vis.c	2013-01-16 20:21:54.000000000 -0500
@@ -45,6 +45,7 @@
 #include <locale.h>
 #include <stdio.h>
 #include <stdlib.h>
+#include <wchar.h>
 #include <unistd.h>
 #include <vis.h>
 
@@ -139,12 +140,12 @@
 	static int col = 0;
 	static char dummy[] = "\0";
 	char *cp = dummy+1; /* so *(cp-1) starts out != '\n' */
-	int c, rachar;
+	wint_t c, rachar;
 	char buff[5];
 
-	c = getc(fp);
+	c = getwc(fp);
 	while (c != EOF) {
-		rachar = getc(fp);
+		rachar = getwc(fp);
 		if (none) {
 			cp = buff;
 			*cp++ = c;
@@ -159,7 +160,7 @@
 			*cp++ = '\n';
 			*cp = '\0';
 		} else
-			(void) vis(buff, (char)c, eflags, (char)rachar);
+			(void) vis(buff, c, eflags, rachar);
 
 		cp = buff;
 		if (fold) {
--- usr.bin/vis/vis.1.orig	2013-01-02 19:15:19.000000000 -0500
+++ usr.bin/vis/vis.1	2013-01-17 14:34:16.000000000 -0500
@@ -128,11 +128,11 @@
 .Nm
 command appeared in
 .Bx 4.4 .
-.Sh BUGS
-Due to limitations in the underlying
+.Pp
+The underlying
 .Xr vis 3
-function, the
+function was augmented to add multibyte character support in
+.Fx 9.1
+at which point the
 .Nm
-utility
-does not recognize multibyte characters, and thus may consider them to be
-non-printable when they are in fact printable (and vice versa).
+utility was also updated to be multibyte character aware.
>Release-Note:
>Audit-Trail:

From: Brooks Davis <brooks@FreeBSD.org>
To: "J.R. Oldroyd" <fbsd@opal.com>
Cc: FreeBSD-gnats-submit@FreeBSD.org
Subject: Re: bin/175418: update vis(3) and vis(1) to support multibyte
 characters
Date: Fri, 18 Jan 2013 16:40:16 -0600

 On Fri, Jan 18, 2013 at 03:12:30PM -0500, J.R. Oldroyd wrote:
 > The vis(3) library calls and the vis(1) program do not support multibyte
 > character sets.  As a result many printable characters are not displayed
 > properly and vice-versa.  This patch enhances vis(3) to support multibyte
 > characters according to the setting of LC_CTYPE and also adjusts vis(1)
 > so that it reads input in multibyte aware manner.
 
 Thank you for your submission.  In a case of lousy timing, I merged the
 replacement of our vis(3) implementation with NetBSD's to stable/9 four
 days ago.  Any changes now need to go through NetBSD.  The good news is
 that we share a common heritage, so much of your patch may still apply (I
 haven't tried).
 
 -- Brooks
 
Responsible-Changed-From-To: freebsd-bugs->brooks 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Sun Jan 20 01:41:37 UTC 2013 
Responsible-Changed-Why:  
Over to maintainer. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=175418 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: bin/175418: commit references a PR
Date: Thu, 14 Mar 2013 23:52:04 +0000 (UTC)

 Author: brooks
 Date: Thu Mar 14 23:51:47 2013
 New Revision: 248302
 URL: http://svnweb.freebsd.org/changeset/base/248302
 
 Log:
   Update to the latest (un)vis(3) sources from NetBSD.  This adds
   multibyte support[0] and the new functions strenvisx and strsenvisx.
   
   Add MLINKS for vis(3) functions added by this and the initial import from
   NetBSD[1].
   
   PR:		bin/166364, bin/175418
   Submitted by:	"J.R. Oldroyd" <fbsd@opal.com>[0]
   		stefanf[1]
   Obtained from:	NetBSD
   MFC after:	2 weeks
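
The vis.3 text added by this commit describes the cerr_ptr argument of
strenvisx() as a way to pass a multibyte conversion error flag in and out.
The sketch below illustrates that idea only, under stated assumptions; it is
not the NetBSD implementation and does not call strenvisx().  count_chars()
is a hypothetical helper that treats the flag as sticky: once it is set,
by the caller or by a failed mbrtowc(3) call, the remaining input is
processed one byte at a time.

```c
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper, for illustration only.  Counts the characters in
 * the first len bytes of src using the current LC_CTYPE.  *cerr is a
 * sticky flag: while it is zero, mbrtowc() decides how many bytes each
 * character occupies; once it is nonzero, every remaining byte counts
 * as one character.
 */
size_t
count_chars(const char *src, size_t len, int *cerr)
{
	mbstate_t st;
	size_t count = 0, n;

	memset(&st, 0, sizeof(st));
	while (len > 0) {
		if (*cerr == 0) {
			n = mbrtowc(NULL, src, len, &st);
			if (n == (size_t)-1 || n == (size_t)-2) {
				/* Invalid or truncated: fall back to bytes. */
				*cerr = 1;
				n = 1;
				memset(&st, 0, sizeof(st));
			} else if (n == 0) {
				/* An embedded NUL occupies one byte. */
				n = 1;
			}
		} else {
			n = 1;
		}
		src += n;
		len -= n;
		count++;
	}
	return (count);
}
```

A caller that processes input in pieces would keep one int, initialised to
zero, and pass its address on every call so that a conversion failure in
one piece carries over to the next.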
 
 Modified:
   head/contrib/libc-vis/unvis.3
   head/contrib/libc-vis/unvis.c
   head/contrib/libc-vis/vis.3
   head/contrib/libc-vis/vis.c
   head/contrib/libc-vis/vis.h
   head/lib/libc/gen/Makefile.inc
   head/lib/libc/gen/Symbol.map
 Directory Properties:
   head/contrib/libc-vis/   (props changed)
 
 Modified: head/contrib/libc-vis/unvis.3
 ==============================================================================
 --- head/contrib/libc-vis/unvis.3	Thu Mar 14 23:35:52 2013	(r248301)
 +++ head/contrib/libc-vis/unvis.3	Thu Mar 14 23:51:47 2013	(r248302)
 @@ -1,4 +1,4 @@
 -.\"	$NetBSD: unvis.3,v 1.23 2011/03/17 14:06:29 wiz Exp $
 +.\"	$NetBSD: unvis.3,v 1.27 2012/12/15 07:34:36 wiz Exp $
  .\"	$FreeBSD$
  .\"
  .\" Copyright (c) 1989, 1991, 1993
 @@ -126,15 +126,17 @@ The
  function has several return codes that must be handled properly.
  They are:
  .Bl -tag -width UNVIS_VALIDPUSH
 -.It Li \&0 (zero)
 +.It Li \&0 No (zero)
  Another character is necessary; nothing has been recognized yet.
  .It Dv UNVIS_VALID
  A valid character has been recognized and is available at the location
 -pointed to by cp.
 +pointed to by
 +.Fa cp .
  .It Dv UNVIS_VALIDPUSH
  A valid character has been recognized and is available at the location
 -pointed to by cp; however, the character currently passed in should
 -be passed in again.
 +pointed to by
 +.Fa cp ;
 +however, the character currently passed in should be passed in again.
  .It Dv UNVIS_NOCHAR
  A valid sequence was detected, but no character was produced.
  This return code is necessary to indicate a logical break between characters.
 @@ -150,7 +152,7 @@ one more time with flag set to
  to extract any remaining character (the character passed in is ignored).
  .Pp
  The
 -.Ar flag
 +.Fa flag
  argument is also used to specify the encoding style of the source.
  If set to
  .Dv VIS_HTTPSTYLE
 @@ -161,7 +163,8 @@ will decode URI strings as specified in 
  If set to
  .Dv VIS_HTTP1866 ,
  .Fn unvis
 -will decode URI strings as specified in RFC 1866.
 +will decode entity references and numeric character references
 +as specified in RFC 1866.
  If set to
  .Dv VIS_MIMESTYLE ,
  .Fn unvis
 @@ -169,7 +172,9 @@ will decode MIME Quoted-Printable string
  If set to
  .Dv VIS_NOESCAPE ,
  .Fn unvis
 -will not decode \e quoted characters.
 +will not decode
 +.Ql \e
 +quoted characters.
  .Pp
  The following code fragment illustrates a proper use of
  .Fn unvis .
 @@ -204,7 +209,7 @@ The functions
  and
  .Fn strnunvisx
  will return \-1 on error and set
 -.Va errno 
 +.Va errno
  to:
  .Bl -tag -width Er
  .It Bq Er EINVAL
 @@ -212,7 +217,7 @@ An invalid escape sequence was detected,
  .El
  .Pp
  In addition the functions
 -.Fn strnunvis 
 +.Fn strnunvis
  and
  .Fn strnunvisx
  will can also set
 @@ -244,4 +249,14 @@ and
  functions appeared in
  .Nx 6.0
  and
 -.Fx 10.0 .
 +.Fx 9.2 .
 +.Sh BUGS
 +The names
 +.Dv VIS_HTTP1808
 +and
 +.Dv VIS_HTTP1866
 +are wrong.
 +Percent-encoding was defined in RFC 1738, the original RFC for URL.
 +RFC 1866 defines HTML 2.0, an application of SGML, from which it
 +inherits concepts of numeric character references and entity
 +references.
 
 Modified: head/contrib/libc-vis/unvis.c
 ==============================================================================
 --- head/contrib/libc-vis/unvis.c	Thu Mar 14 23:35:52 2013	(r248301)
 +++ head/contrib/libc-vis/unvis.c	Thu Mar 14 23:51:47 2013	(r248302)
 @@ -1,4 +1,4 @@
 -/*	$NetBSD: unvis.c,v 1.40 2012/12/14 21:31:01 christos Exp $	*/
 +/*	$NetBSD: unvis.c,v 1.41 2012/12/15 04:29:53 matt Exp $	*/
  
  /*-
   * Copyright (c) 1989, 1993
 @@ -34,7 +34,7 @@
  #if 0
  static char sccsid[] = "@(#)unvis.c	8.1 (Berkeley) 6/4/93";
  #else
 -__RCSID("$NetBSD: unvis.c,v 1.40 2012/12/14 21:31:01 christos Exp $");
 +__RCSID("$NetBSD: unvis.c,v 1.41 2012/12/15 04:29:53 matt Exp $");
  #endif
  #endif /* LIBC_SCCS and not lint */
  __FBSDID("$FreeBSD$");
 @@ -90,7 +90,7 @@ __weak_alias(strnunvisx,_strnunvisx)
   * RFC 1866
   */
  static const struct nv {
 -	const char name[7];
 +	char name[7];
  	uint8_t value;
  } nv[] = {
  	{ "AElig",	198 }, /* capital AE diphthong (ligature)  */
 
 Modified: head/contrib/libc-vis/vis.3
 ==============================================================================
 --- head/contrib/libc-vis/vis.3	Thu Mar 14 23:35:52 2013	(r248301)
 +++ head/contrib/libc-vis/vis.3	Thu Mar 14 23:51:47 2013	(r248302)
 @@ -1,4 +1,4 @@
 -.\"	$NetBSD: vis.3,v 1.29 2012/12/14 22:55:59 christos Exp $
 +.\"	$NetBSD: vis.3,v 1.39 2013/02/20 20:05:26 christos Exp $
  .\"	$FreeBSD$
  .\"
  .\" Copyright (c) 1989, 1991, 1993
 @@ -30,7 +30,7 @@
  .\"
  .\"     @(#)vis.3	8.1 (Berkeley) 6/9/93
  .\"
 -.Dd December 14, 2012
 +.Dd February 19, 2013
  .Dt VIS 3
  .Os
  .Sh NAME
 @@ -40,12 +40,14 @@
  .Nm strnvis ,
  .Nm strvisx ,
  .Nm strnvisx ,
 +.Nm strenvisx ,
  .Nm svis ,
  .Nm snvis ,
  .Nm strsvis ,
  .Nm strsnvis ,
 -.Nm strsvisx
 -.Nm strsnvisx
 +.Nm strsvisx ,
 +.Nm strsnvisx ,
 +.Nm strsenvisx
  .Nd visually encode characters
  .Sh LIBRARY
  .Lb libc
 @@ -63,6 +65,8 @@
  .Fn strvisx "char *dst" "const char *src" "size_t len" "int flag"
  .Ft int
  .Fn strnvisx "char *dst" "size_t dlen" "const char *src" "size_t len" "int flag"
 +.Ft int
 +.Fn strenvisx "char *dst" "size_t dlen" "const char *src" "size_t len" "int flag" "int *cerr_ptr"
  .Ft char *
  .Fn svis "char *dst" "int c" "int flag" "int nextc" "const char *extra"
  .Ft char *
 @@ -75,6 +79,8 @@
  .Fn strsvisx "char *dst" "const char *src" "size_t len" "int flag" "const char *extra"
  .Ft int
  .Fn strsnvisx "char *dst" "size_t dlen" "const char *src" "size_t len" "int flag" "const char *extra"
 +.Ft int
 +.Fn strsenvisx "char *dst" "size_t dlen" "const char *src" "size_t len" "int flag" "const char *extra" "int *cerr_ptr"
  .Sh DESCRIPTION
  The
  .Fn vis
 @@ -89,11 +95,11 @@ needs no encoding, it is copied in unalt
  The string is null terminated, and a pointer to the end of the string is
  returned.
  The maximum length of any encoding is four
 -characters (not including the trailing
 +bytes (not including the trailing
  .Dv NUL ) ;
  thus, when
  encoding a set of characters into a buffer, the size of the buffer should
 -be four times the number of characters encoded, plus one for the trailing
 +be four times the number of bytes encoded, plus one for the trailing
  .Dv NUL .
  The flag parameter is used for altering the default range of
  characters considered for encoding and for altering the visual
 @@ -142,16 +148,17 @@ terminate
  The size of
  .Fa dst
  must be four times the number
 -of characters encoded from
 +of bytes encoded from
  .Fa src
  (plus one for the
  .Dv NUL ) .
  Both
 -forms return the number of characters in dst (not including
 -the trailing
 +forms return the number of characters in
 +.Fa dst
 +(not including the trailing
  .Dv NUL ) .
  The
 -.Dq n
 +.Dq Nm n
  versions of the functions also take an additional argument
  .Fa dlen
  that indicates the length of the
 @@ -159,7 +166,7 @@ that indicates the length of the
  buffer.
  If
  .Fa dlen
 -is not large enough to fix the converted string then the
 +is not large enough to fit the converted string then the
  .Fn strnvis
  and
  .Fn strnvisx
 @@ -167,6 +174,14 @@ functions return \-1 and set
  .Va errno
  to
  .Dv ENOSPC .
 +The
 +.Fn strenvisx
 +function takes an additional argument,
 +.Fa cerr_ptr ,
 +that is used to pass in and out a multibyte conversion error flag.
 +This is useful when processing single characters at a time when
 +it is possible that the locale may be set to something other
 +than the locale of the characters in the input data.
  .Pp
  The functions
  .Fn svis ,
 @@ -174,16 +189,18 @@ The functions
  .Fn strsvis ,
  .Fn strsnvis ,
  .Fn strsvisx ,
 +.Fn strsnvisx ,
  and
 -.Fn strsnvisx
 +.Fn strsenvisx
  correspond to
  .Fn vis ,
  .Fn nvis ,
  .Fn strvis ,
  .Fn strnvis ,
  .Fn strvisx ,
 +.Fn strnvisx ,
  and
 -.Fn strnvisx
 +.Fn strenvisx
  but have an additional argument
  .Fa extra ,
  pointing to a
 @@ -214,14 +231,13 @@ and
  .Fn strnvisx ) ,
  and the type of representation used.
  By default, all non-graphic characters,
 -except space, tab, and newline are encoded.
 -(See
 -.Xr isgraph 3 . )
 +except space, tab, and newline are encoded (see
 +.Xr isgraph 3 ) .
  The following flags
  alter this:
  .Bl -tag -width VIS_WHITEX
  .It Dv VIS_GLOB
 -Also encode magic characters
 +Also encode the magic characters
  .Ql ( * ,
  .Ql \&? ,
  .Ql \&[
 @@ -243,11 +259,13 @@ Synonym for
  \&|
  .Dv VIS_NL .
  .It Dv VIS_SAFE
 -Only encode "unsafe" characters.
 +Only encode
 +.Dq unsafe
 +characters.
  Unsafe means control characters which may cause common terminals to perform
  unexpected functions.
  Currently this form allows space, tab, newline, backspace, bell, and
 -return - in addition to all graphic characters - unencoded.
 +return \(em in addition to all graphic characters \(em unencoded.
  .El
  .Pp
  (The above flags have no effect for
 @@ -287,8 +305,8 @@ Use an
  to represent meta characters (characters with the 8th
  bit set), and use caret
  .Ql ^
 -to represent control characters see
 -.Pf ( Xr iscntrl 3 ) .
 +to represent control characters (see
 +.Xr iscntrl 3 ) .
  The following formats are used:
  .Bl -tag -width xxxxx
  .It Dv \e^C
 @@ -335,19 +353,20 @@ Use C-style backslash sequences to repre
  characters.
  The following sequences are used to represent the indicated characters:
  .Bd -unfilled -offset indent
 -.Li \ea Tn  - BEL No (007)
 -.Li \eb Tn  - BS No (010)
 -.Li \ef Tn  - NP No (014)
 -.Li \en Tn  - NL No (012)
 -.Li \er Tn  - CR No (015)
 -.Li \es Tn  - SP No (040)
 -.Li \et Tn  - HT No (011)
 -.Li \ev Tn  - VT No (013)
 -.Li \e0 Tn  - NUL No (000)
 +.Li \ea Tn  \(em BEL No (007)
 +.Li \eb Tn  \(em BS No (010)
 +.Li \ef Tn  \(em NP No (014)
 +.Li \en Tn  \(em NL No (012)
 +.Li \er Tn  \(em CR No (015)
 +.Li \es Tn  \(em SP No (040)
 +.Li \et Tn  \(em HT No (011)
 +.Li \ev Tn  \(em VT No (013)
 +.Li \e0 Tn  \(em NUL No (000)
  .Ed
  .Pp
 -When using this format, the nextc parameter is looked at to determine
 -if a
 +When using this format, the
 +.Fa nextc
 +parameter is looked at to determine if a
  .Dv NUL
  character can be encoded as
  .Ql \e0
 @@ -374,8 +393,8 @@ represents a lower case hexadecimal digi
  .It Dv VIS_MIMESTYLE
  Use MIME Quoted-Printable encoding as described in RFC 2045, only don't
  break lines and don't handle CRLF.
 -The form is:
 -.Ql %XX
 +The form is
 +.Ql =XX
  where
  .Em X
  represents an upper case hexadecimal digit.
 @@ -392,6 +411,41 @@ meta characters as
  .Ql M-C ) .
  With this flag set, the encoding is
  ambiguous and non-invertible.
 +.Sh MULTIBYTE CHARACTER SUPPORT
 +These functions support multibyte character input.
 +The encoding conversion is influenced by the setting of the
 +.Ev LC_CTYPE
 +environment variable which defines the set of characters
 +that can be copied without encoding.
 +.Pp
 +When 8-bit data is present in the input,
 +.Ev LC_CTYPE
 +must be set to the correct locale or to the C locale.
 +If the locales of the data and the conversion are mismatched,
 +multibyte character recognition may fail and encoding will be performed
 +byte-by-byte instead.
 +.Pp
 +As noted above,
 +.Fa dst
 +must be four times the number of bytes processed from
 +.Fa src .
 +But note that each multibyte character can be up to
 +.Dv MB_LEN_MAX
 +bytes
 +.\" (see
 +.\" .Xr multibyte 3 )
 +so in terms of multibyte characters,
 +.Fa dst
 +must be four times
 +.Dv MB_LEN_MAX
 +times the number of characters processed from
 +.Fa src .
 +.Sh ENVIRONMENT
 +.Bl -tag -width ".Ev LC_CTYPE"
 +.It Ev LC_CTYPE
 +Specify the locale of the input data.
 +Set to C if the input data locale is unknown.
 +.El
  .Sh ERRORS
  The functions
  .Fn nvis
 @@ -407,11 +461,11 @@ and
  .Fn strsnvisx ,
  will return \-1 when the
  .Fa dlen
 -destination buffer length size is not enough to perform the conversion while
 +destination buffer size is not enough to perform the conversion while
  setting
  .Va errno
  to:
 -.Bl -tag -width Er
 +.Bl -tag -width ".Bq Er ENOSPC"
  .It Bq Er ENOSPC
  The destination buffer size is not large enough to perform the conversion.
  .El
 @@ -419,18 +473,23 @@ The destination buffer size is not large
  .Xr unvis 1 ,
  .Xr vis 1 ,
  .Xr glob 3 ,
 +.\" .Xr multibyte 3 ,
  .Xr unvis 3
  .Rs
  .%A T. Berners-Lee
  .%T Uniform Resource Locators (URL)
 -.%O RFC1738
 +.%O "RFC 1738"
 +.Re
 +.Rs
 +.%T "Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies"
 +.%O "RFC 2045"
  .Re
  .Sh HISTORY
  The
  .Fn vis ,
  .Fn strvis ,
  and
 -.Fa strvisx
 +.Fn strvisx
  functions first appeared in
  .Bx 4.4 .
  The
 @@ -441,7 +500,7 @@ and
  functions appeared in
  .Nx 1.5
  and
 -.Fx 10.0 .
 +.Fx 9.2 .
  The buffer size limited versions of the functions
  .Po Fn nvis ,
  .Fn strnvis ,
 @@ -451,6 +510,9 @@ The buffer size limited versions of the 
  and
  .Fn strsnvisx Pc
  appeared in
 -.Nx 6.0
  and
 -.Fx 10.0 .
 +.Fx 9.2 .
 +Multibyte character support was added in
 +.Nx 7.0
 +and
 +.Fx 9.2 .
 
 Modified: head/contrib/libc-vis/vis.c
 ==============================================================================
 --- head/contrib/libc-vis/vis.c	Thu Mar 14 23:35:52 2013	(r248301)
 +++ head/contrib/libc-vis/vis.c	Thu Mar 14 23:51:47 2013	(r248302)
 @@ -1,4 +1,4 @@
 -/*	$NetBSD: vis.c,v 1.45 2012/12/14 21:38:18 christos Exp $	*/
 +/*	$NetBSD: vis.c,v 1.60 2013/02/21 16:21:20 joerg Exp $	*/
  
  /*-
   * Copyright (c) 1989, 1993
 @@ -57,19 +57,23 @@
  
  #include <sys/cdefs.h>
  #if defined(LIBC_SCCS) && !defined(lint)
 -__RCSID("$NetBSD: vis.c,v 1.45 2012/12/14 21:38:18 christos Exp $");
 +__RCSID("$NetBSD: vis.c,v 1.60 2013/02/21 16:21:20 joerg Exp $");
  #endif /* LIBC_SCCS and not lint */
 +#ifdef __FBSDID
  __FBSDID("$FreeBSD$");
 +#define	_DIAGASSERT(x)	assert(x)
 +#endif
  
  #include "namespace.h"
  #include <sys/types.h>
 +#include <sys/param.h>
  
  #include <assert.h>
  #include <vis.h>
  #include <errno.h>
  #include <stdlib.h>
 -
 -#define	_DIAGASSERT(x)	assert(x)
 +#include <wchar.h>
 +#include <wctype.h>
  
  #ifdef __weak_alias
  __weak_alias(strvisx,_strvisx)
 @@ -81,65 +85,66 @@ __weak_alias(strvisx,_strvisx)
  #include <stdio.h>
  #include <string.h>
  
 -static char *do_svis(char *, size_t *, int, int, int, const char *);
 +/*
 + * The reason for going through the trouble to deal with character encodings
 + * in vis(3), is that we use this to safe encode output of commands. This
 + * safe encoding varies depending on the character set. For example if we
 + * display ps output in French, we don't want to display French characters
 + * as M-foo.
 + */
 +
 +static wchar_t *do_svis(wchar_t *, wint_t, int, wint_t, const wchar_t *);
  
  #undef BELL
 -#define BELL '\a'
 +#define BELL L'\a'
 +
 +#define iswoctal(c)	(((u_char)(c)) >= L'0' && ((u_char)(c)) <= L'7')
 +#define iswwhite(c)	(c == L' ' || c == L'\t' || c == L'\n')
 +#define iswsafe(c)	(c == L'\b' || c == BELL || c == L'\r')
 +#define xtoa(c)		L"0123456789abcdef"[c]
 +#define XTOA(c)		L"0123456789ABCDEF"[c]
  
 -#define isoctal(c)	(((u_char)(c)) >= '0' && ((u_char)(c)) <= '7')
 -#define iswhite(c)	(c == ' ' || c == '\t' || c == '\n')
 -#define issafe(c)	(c == '\b' || c == BELL || c == '\r')
 -#define xtoa(c)		"0123456789abcdef"[c]
 -#define XTOA(c)		"0123456789ABCDEF"[c]
 -
 -#define MAXEXTRAS	9
 -
 -#define MAKEEXTRALIST(flag, extra, orig_str)				      \
 -do {									      \
 -	const char *orig = orig_str;					      \
 -	const char *o = orig;						      \
 -	char *e;							      \
 -	while (*o++)							      \
 -		continue;						      \
 -	extra = malloc((size_t)((o - orig) + MAXEXTRAS));		      \
 -	if (!extra) break;						      \
 -	for (o = orig, e = extra; (*e++ = *o++) != '\0';)		      \
 -		continue;						      \
 -	e--;								      \
 -	if (flag & VIS_GLOB) {						      \
 -		*e++ = '*';						      \
 -		*e++ = '?';						      \
 -		*e++ = '[';						      \
 -		*e++ = '#';						      \
 -	}								      \
 -	if (flag & VIS_SP) *e++ = ' ';					      \
 -	if (flag & VIS_TAB) *e++ = '\t';				      \
 -	if (flag & VIS_NL) *e++ = '\n';					      \
 -	if ((flag & VIS_NOSLASH) == 0) *e++ = '\\';			      \
 -	*e = '\0';							      \
 -} while (/*CONSTCOND*/0)
 +#define MAXEXTRAS	10
 +
 +#if !HAVE_NBTOOL_CONFIG_H
 +#ifndef __NetBSD__
 +/*
 + * On NetBSD MB_LEN_MAX is currently 32 which does not fit on any integer
 + * integral type and it is probably wrong, since currently the maximum
 + * number of bytes and character needs is 6. Until this is fixed, the
 + * loops below are using sizeof(uint64_t) - 1 instead of MB_LEN_MAX, and
 + * the assertion is commented out.
 + */
 +#ifdef __FreeBSD__
 +/*
 + * On FreeBSD including <sys/systm.h> for CTASSERT only works in kernel
 + * mode.
 + */
 +#ifndef CTASSERT
 +#define CTASSERT(x)             _CTASSERT(x, __LINE__)
 +#define _CTASSERT(x, y)         __CTASSERT(x, y)
 +#define __CTASSERT(x, y)        typedef char __assert ## y[(x) ? 1 : -1]
 +#endif
 +#endif /* __FreeBSD__ */
 +CTASSERT(MB_LEN_MAX <= sizeof(uint64_t));
 +#endif /* !__NetBSD__ */
 +#endif
  
  /*
   * This is do_hvis, for HTTP style (RFC 1808)
   */
 -static char *
 -do_hvis(char *dst, size_t *dlen, int c, int flag, int nextc, const char *extra)
 +static wchar_t *
 +do_hvis(wchar_t *dst, wint_t c, int flags, wint_t nextc, const wchar_t *extra)
  {
 -
 -	if ((isascii(c) && isalnum(c))
 +	if (iswalnum(c)
  	    /* safe */
 -	    || c == '$' || c == '-' || c == '_' || c == '.' || c == '+'
 +	    || c == L'$' || c == L'-' || c == L'_' || c == L'.' || c == L'+'
  	    /* extra */
 -	    || c == '!' || c == '*' || c == '\'' || c == '(' || c == ')'
 -	    || c == ',') {
 -		dst = do_svis(dst, dlen, c, flag, nextc, extra);
 -	} else {
 -		if (dlen) {
 -			if (*dlen < 3)
 -				return NULL;
 -			*dlen -= 3;
 -		}
 -		*dst++ = '%';
 +	    || c == L'!' || c == L'*' || c == L'\'' || c == L'(' || c == L')'
 +	    || c == L',')
 +		dst = do_svis(dst, c, flags, nextc, extra);
 +	else {
 +		*dst++ = L'%';
  		*dst++ = xtoa(((unsigned int)c >> 4) & 0xf);
  		*dst++ = xtoa((unsigned int)c & 0xf);
  	}
 @@ -151,312 +156,448 @@ do_hvis(char *dst, size_t *dlen, int c, 
   * This is do_mvis, for Quoted-Printable MIME (RFC 2045)
   * NB: No handling of long lines or CRLF.
   */
 -static char *
 -do_mvis(char *dst, size_t *dlen, int c, int flag, int nextc, const char *extra)
 +static wchar_t *
 +do_mvis(wchar_t *dst, wint_t c, int flags, wint_t nextc, const wchar_t *extra)
  {
 -	if ((c != '\n') &&
 +	if ((c != L'\n') &&
  	    /* Space at the end of the line */
 -	    ((isspace(c) && (nextc == '\r' || nextc == '\n')) ||
 +	    ((iswspace(c) && (nextc == L'\r' || nextc == L'\n')) ||
  	    /* Out of range */
 -	    (!isspace(c) && (c < 33 || (c > 60 && c < 62) || c > 126)) ||
 -	    /* Specific char to be escaped */ 
 -	    strchr("#$@[\\]^`{|}~", c) != NULL)) {
 -		if (dlen) {
 -			if (*dlen < 3)
 -				return NULL;
 -			*dlen -= 3;
 -		}
 -		*dst++ = '=';
 +	    (!iswspace(c) && (c < 33 || (c > 60 && c < 62) || c > 126)) ||
 +	    /* Specific char to be escaped */
 +	    wcschr(L"#$@[\\]^`{|}~", c) != NULL)) {
 +		*dst++ = L'=';
  		*dst++ = XTOA(((unsigned int)c >> 4) & 0xf);
  		*dst++ = XTOA((unsigned int)c & 0xf);
 -	} else {
 -		dst = do_svis(dst, dlen, c, flag, nextc, extra);
 -	}
 +	} else
 +		dst = do_svis(dst, c, flags, nextc, extra);
  	return dst;
  }
  
  /*
 - * This is do_vis, the central code of vis.
 - * dst:	      Pointer to the destination buffer
 - * c:	      Character to encode
 - * flag:      Flag word
 - * nextc:     The character following 'c'
 - * extra:     Pointer to the list of extra characters to be
 - *	      backslash-protected.
 + * Output single byte of multibyte character.
   */
 -static char *
 -do_svis(char *dst, size_t *dlen, int c, int flag, int nextc, const char *extra)
 +static wchar_t *
 +do_mbyte(wchar_t *dst, wint_t c, int flags, wint_t nextc, int iswextra)
  {
 -	int isextra;
 -	size_t odlen = dlen ? *dlen : 0;
 -
 -	isextra = strchr(extra, c) != NULL;
 -#define HAVE(x) \
 -	do { \
 -		if (dlen) { \
 -			if (*dlen < (x)) \
 -				goto out; \
 -			*dlen -= (x); \
 -		} \
 -	} while (/*CONSTCOND*/0)
 -	if (!isextra && isascii(c) && (isgraph(c) || iswhite(c) ||
 -	    ((flag & VIS_SAFE) && issafe(c)))) {
 -		HAVE(1);
 -		*dst++ = c;
 -		return dst;
 -	}
 -	if (flag & VIS_CSTYLE) {
 -		HAVE(2);
 +	if (flags & VIS_CSTYLE) {
  		switch (c) {
 -		case '\n':
 -			*dst++ = '\\'; *dst++ = 'n';
 +		case L'\n':
 +			*dst++ = L'\\'; *dst++ = L'n';
  			return dst;
 -		case '\r':
 -			*dst++ = '\\'; *dst++ = 'r';
 +		case L'\r':
 +			*dst++ = L'\\'; *dst++ = L'r';
  			return dst;
 -		case '\b':
 -			*dst++ = '\\'; *dst++ = 'b';
 +		case L'\b':
 +			*dst++ = L'\\'; *dst++ = L'b';
  			return dst;
  		case BELL:
 -			*dst++ = '\\'; *dst++ = 'a';
 +			*dst++ = L'\\'; *dst++ = L'a';
  			return dst;
 -		case '\v':
 -			*dst++ = '\\'; *dst++ = 'v';
 +		case L'\v':
 +			*dst++ = L'\\'; *dst++ = L'v';
  			return dst;
 -		case '\t':
 -			*dst++ = '\\'; *dst++ = 't';
 +		case L'\t':
 +			*dst++ = L'\\'; *dst++ = L't';
  			return dst;
 -		case '\f':
 -			*dst++ = '\\'; *dst++ = 'f';
 +		case L'\f':
 +			*dst++ = L'\\'; *dst++ = L'f';
  			return dst;
 -		case ' ':
 -			*dst++ = '\\'; *dst++ = 's';
 +		case L' ':
 +			*dst++ = L'\\'; *dst++ = L's';
  			return dst;
 -		case '\0':
 -			*dst++ = '\\'; *dst++ = '0';
 -			if (isoctal(nextc)) {
 -				HAVE(2);
 -				*dst++ = '0';
 -				*dst++ = '0';
 +		case L'\0':
 +			*dst++ = L'\\'; *dst++ = L'0';
 +			if (iswoctal(nextc)) {
 +				*dst++ = L'0';
 +				*dst++ = L'0';
  			}
  			return dst;
  		default:
 -			if (isgraph(c)) {
 -				*dst++ = '\\'; *dst++ = c;
 +			if (iswgraph(c)) {
 +				*dst++ = L'\\';
 +				*dst++ = c;
  				return dst;
  			}
 -			if (dlen)
 -				*dlen = odlen;
  		}
  	}
 -	if (isextra || ((c & 0177) == ' ') || (flag & VIS_OCTAL)) {
 -		HAVE(4);
 -		*dst++ = '\\';
 -		*dst++ = (u_char)(((u_int32_t)(u_char)c >> 6) & 03) + '0';
 -		*dst++ = (u_char)(((u_int32_t)(u_char)c >> 3) & 07) + '0';
 -		*dst++ =			     (c	      & 07) + '0';
 +	if (iswextra || ((c & 0177) == L' ') || (flags & VIS_OCTAL)) {
 +		*dst++ = L'\\';
 +		*dst++ = (u_char)(((u_int32_t)(u_char)c >> 6) & 03) + L'0';
 +		*dst++ = (u_char)(((u_int32_t)(u_char)c >> 3) & 07) + L'0';
 +		*dst++ =			     (c	      & 07) + L'0';
  	} else {
 -		if ((flag & VIS_NOSLASH) == 0) {
 -			HAVE(1);
 -			*dst++ = '\\';
 -		}
 +		if ((flags & VIS_NOSLASH) == 0)
 +			*dst++ = L'\\';
  
  		if (c & 0200) {
 -			HAVE(1);
 -			c &= 0177; *dst++ = 'M';
 +			c &= 0177;
 +			*dst++ = L'M';
  		}
  
 -		if (iscntrl(c)) {
 -			HAVE(2);
 -			*dst++ = '^';
 +		if (iswcntrl(c)) {
 +			*dst++ = L'^';
  			if (c == 0177)
 -				*dst++ = '?';
 +				*dst++ = L'?';
  			else
 -				*dst++ = c + '@';
 +				*dst++ = c + L'@';
  		} else {
 -			HAVE(2);
 -			*dst++ = '-'; *dst++ = c;
 +			*dst++ = L'-';
 +			*dst++ = c;
  		}
  	}
 +
 +	return dst;
 +}
 +
 +/*
 + * This is do_vis, the central code of vis.
 + * dst:	      Pointer to the destination buffer
 + * c:	      Character to encode
 + * flags:     Flags word
 + * nextc:     The character following 'c'
 + * extra:     Pointer to the list of extra characters to be
 + *	      backslash-protected.
 + */
 +static wchar_t *
 +do_svis(wchar_t *dst, wint_t c, int flags, wint_t nextc, const wchar_t *extra)
 +{
 +	int iswextra, i, shft;
 +	uint64_t bmsk, wmsk;
 +
 +	iswextra = wcschr(extra, c) != NULL;
 +	if (!iswextra && (iswgraph(c) || iswwhite(c) ||
 +	    ((flags & VIS_SAFE) && iswsafe(c)))) {
 +		*dst++ = c;
 +		return dst;
 +	}
 +
 +	/* See comment in istrsenvisx() output loop, below. */
 +	wmsk = 0;
 +	for (i = sizeof(wmsk) - 1; i >= 0; i--) {
 +		shft = i * NBBY;
 +		bmsk = (uint64_t)0xffLL << shft;
 +		wmsk |= bmsk;
 +		if ((c & wmsk) || i == 0)
 +			dst = do_mbyte(dst, (wint_t)(
 +			    (uint64_t)(c & bmsk) >> shft),
 +			    flags, nextc, iswextra);
 +	}
 +
  	return dst;
 -out:
 -	*dlen = odlen;
 -	return NULL;
  }
  
 -typedef char *(*visfun_t)(char *, size_t *, int, int, int, const char *);
 +typedef wchar_t *(*visfun_t)(wchar_t *, wint_t, int, wint_t, const wchar_t *);
  
  /*
   * Return the appropriate encoding function depending on the flags given.
   */
  static visfun_t
 -getvisfun(int flag)
 +getvisfun(int flags)
  {
 -	if (flag & VIS_HTTPSTYLE)
 +	if (flags & VIS_HTTPSTYLE)
  		return do_hvis;
 -	if (flag & VIS_MIMESTYLE)
 +	if (flags & VIS_MIMESTYLE)
  		return do_mvis;
  	return do_svis;
  }
  
  /*
 - * isnvis - visually encode characters, also encoding the characters
 - *	  pointed to by `extra'
 + * Expand list of extra characters to not visually encode.
   */
 -static char *
 -isnvis(char *dst, size_t *dlen, int c, int flag, int nextc, const char *extra)
 +static wchar_t *
 +makeextralist(int flags, const char *src)
  {
 -	char *nextra = NULL;
 -	visfun_t f;
 +	wchar_t *dst, *d;
 +	size_t len;
  
 -	_DIAGASSERT(dst != NULL);
 -	_DIAGASSERT(extra != NULL);
 -	MAKEEXTRALIST(flag, nextra, extra);
 -	if (!nextra) {
 -		if (dlen && *dlen == 0) {
 -			errno = ENOSPC;
 -			return NULL;
 -		}
 -		*dst = '\0';		/* can't create nextra, return "" */
 -		return dst;
 -	}
 -	f = getvisfun(flag);
 -	dst = (*f)(dst, dlen, c, flag, nextc, nextra);
 -	free(nextra);
 -	if (dst == NULL || (dlen && *dlen == 0)) {
 -		errno = ENOSPC;
 +	len = strlen(src);
 +	if ((dst = calloc(len + MAXEXTRAS, sizeof(*dst))) == NULL)
  		return NULL;
 -	}
 -	*dst = '\0';
 -	return dst;
 -}
  
 -char *
 -svis(char *dst, int c, int flag, int nextc, const char *extra)
 -{
 -	return isnvis(dst, NULL, c, flag, nextc, extra);
 -}
 +	if (mbstowcs(dst, src, len) == (size_t)-1) {
 +		size_t i;
 +		for (i = 0; i < len; i++)
 +			dst[i] = (wint_t)(u_char)src[i];
 +		d = dst + len;
 +	} else
 +		d = dst + wcslen(dst);
 +
 +	if (flags & VIS_GLOB) {
 +		*d++ = L'*';
 +		*d++ = L'?';
 +		*d++ = L'[';
 +		*d++ = L'#';
 +	}
 +
 +	if (flags & VIS_SP) *d++ = L' ';
 +	if (flags & VIS_TAB) *d++ = L'\t';
 +	if (flags & VIS_NL) *d++ = L'\n';
 +	if ((flags & VIS_NOSLASH) == 0) *d++ = L'\\';
 +	*d = L'\0';
  
 -char *
 -snvis(char *dst, size_t dlen, int c, int flag, int nextc, const char *extra)
 -{
 -	return isnvis(dst, &dlen, c, flag, nextc, extra);
 +	return dst;
  }
  
 -
  /*
 - * strsvis, strsvisx - visually encode characters from src into dst
 - *
 - *	Extra is a pointer to a \0-terminated list of characters to
 - *	be encoded, too. These functions are useful e. g. to
 - *	encode strings in such a way so that they are not interpreted
 - *	by a shell.
 - *
 - *	Dst must be 4 times the size of src to account for possible
 - *	expansion.  The length of dst, not including the trailing NULL,
 - *	is returned.
 - *
 - *	Strsvisx encodes exactly len bytes from src into dst.
 - *	This is useful for encoding a block of data.
 + * istrsenvisx()
 + * 	The main internal function.
 + *	All user-visible functions call this one.
   */
  static int
 -istrsnvis(char *dst, size_t *dlen, const char *csrc, int flag, const char *extra)
 +istrsenvisx(char *mbdst, size_t *dlen, const char *mbsrc, size_t mblength,
 +    int flags, const char *mbextra, int *cerr_ptr)
  {
 -	int c;
 -	char *start;
 -	char *nextra = NULL;
 -	const unsigned char *src = (const unsigned char *)csrc;
 +	wchar_t *dst, *src, *pdst, *psrc, *start, *extra;
 +	size_t len, olen;
 +	uint64_t bmsk, wmsk;
 +	wint_t c;
  	visfun_t f;
 +	int clen = 0, cerr = 0, error = -1, i, shft;
 +	ssize_t mbslength, maxolen;
  
 -	_DIAGASSERT(dst != NULL);
 -	_DIAGASSERT(src != NULL);
 -	_DIAGASSERT(extra != NULL);
 -	MAKEEXTRALIST(flag, nextra, extra);
 -	if (!nextra) {
 -		*dst = '\0';		/* can't create nextra, return "" */
 -		return 0;
 +	_DIAGASSERT(mbdst != NULL);
 +	_DIAGASSERT(mbsrc != NULL);
 +	_DIAGASSERT(mbextra != NULL);
 +
 +	/*
 +	 * Input (mbsrc) is a char string considered to be multibyte
 +	 * characters.  The input loop will read this string pulling
 +	 * one character, possibly multiple bytes, from mbsrc and
 +	 * converting each to wchar_t in src.
 +	 *
 +	 * The vis conversion will be done using the wide char
 +	 * wchar_t string.
 +	 *
 +	 * This will then be converted back to a multibyte string to
 +	 * return to the caller.
 +	 */
 +
 +	/* Allocate space for the wide char strings */
 +	psrc = pdst = extra = NULL;
 +	if (!mblength)
 +		mblength = strlen(mbsrc);
 +	if ((psrc = calloc(mblength + 1, sizeof(*psrc))) == NULL)
 +		return -1;
 +	if ((pdst = calloc((4 * mblength) + 1, sizeof(*pdst))) == NULL)
 +		goto out;
 +	dst = pdst;
 +	src = psrc;
 +
 +	/* Use caller's multibyte conversion error flag. */
 +	if (cerr_ptr)
 +		cerr = *cerr_ptr;
 +
 +	/*
 +	 * Input loop.
 +	 * Handle up to mblength characters (not bytes).  We do not
 +	 * stop at NULs because we may be processing a block of data
 +	 * that includes NULs.
 +	 */
 +	mbslength = (ssize_t)mblength;
 +	/*
 +	 * When inputing a single character, must also read in the
 +	 * next character for nextc, the look-ahead character.
 +	 */
 +	if (mbslength == 1)
 
 *** DIFF OUTPUT TRUNCATED AT 1000 LINES ***
 _______________________________________________
 svn-src-all@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/svn-src-all
 To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
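
The do_svis() hunk above walks a wide character from its most significant byte to its least, skips leading zero bytes, and always emits the lowest byte. A minimal standalone sketch of that loop follows. The name split_bytes() and the output array are assumptions made for illustration and do not appear in vis.c, and the literal 8 stands in for NBBY.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative only: split_bytes() is a hypothetical name, not a function
 * in vis.c.  Writes the significant bytes of c to out[], most significant
 * first, skipping leading zero bytes but always emitting the lowest byte.
 * out must have room for sizeof(uint64_t) bytes.
 * Returns the number of bytes written.
 */
static size_t
split_bytes(uint64_t c, unsigned char *out)
{
	uint64_t bmsk, wmsk = 0;
	size_t n = 0;
	int i, shft;

	assert(out != NULL);
	for (i = (int)sizeof(wmsk) - 1; i >= 0; i--) {
		shft = i * 8;			/* 8 stands in for NBBY */
		bmsk = (uint64_t)0xff << shft;	/* mask for byte i only */
		wmsk |= bmsk;			/* mask for byte i and above */
		if ((c & wmsk) != 0 || i == 0)
			out[n++] = (unsigned char)((c & bmsk) >> shft);
	}
	return n;
}
```

In the patch each extracted byte is handed to do_mbyte() rather than stored, so the array here only serves to make the order of the bytes visible.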
 
State-Changed-From-To: open->patched 
State-Changed-By: brooks 
State-Changed-When: Fri Mar 15 00:00:10 UTC 2013 
State-Changed-Why:  
Update of NetBSD's vis with the submitter's multibyte support has been imported. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=175418 
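
The imported vis.3 text states the dst sizing rule in bytes: four output bytes per input byte plus one for the trailing NUL, or four times MB_LEN_MAX per multibyte character. The arithmetic can be sketched as below. Both helper names are hypothetical and are not part of vis(3), and the overflow checks are an addition the manual page does not mention.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of vis(3): size of dst needed to encode
 * nbytes bytes of src, i.e. four output bytes per input byte plus one
 * for the trailing NUL.  Returns 0 if the result would overflow size_t.
 */
static size_t
vis_dst_size_bytes(size_t nbytes)
{
	if (nbytes > (SIZE_MAX - 1) / 4)
		return 0;
	return 4 * nbytes + 1;
}

/*
 * Same bound expressed in multibyte characters: each character may
 * occupy up to MB_LEN_MAX bytes of src.  Returns 0 on overflow.
 */
static size_t
vis_dst_size_chars(size_t nchars)
{
	assert(MB_LEN_MAX >= 1);
	if (nchars > SIZE_MAX / MB_LEN_MAX)
		return 0;
	return vis_dst_size_bytes(nchars * MB_LEN_MAX);
}
```

A caller that already knows the byte length of src, for example from strlen(3), only needs the first helper. The second is the worst case when only a character count is known.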

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: bin/175418: commit references a PR
Date: Fri, 15 Mar 2013 00:05:58 +0000 (UTC)

 Author: brooks
 Date: Fri Mar 15 00:05:50 2013
 New Revision: 248303
 URL: http://svnweb.freebsd.org/changeset/base/248303
 
 Log:
   Replace our (un)vis(1) commands with implementations from NetBSD to
   match our import of the (un)vis(3) APIs.
   
   This adds support for multibyte encoding and the -h and -m flags which
   support HTTP and MIME encoding respectively.
   
   PR:		bin/175418
   Obtained from:	NetBSD
 
 Added:
   head/contrib/unvis/
      - copied from r247132, vendor/NetBSD/unvis/dist/
   head/contrib/vis/
      - copied from r247132, vendor/NetBSD/vis/dist/
 Deleted:
   head/usr.bin/unvis/unvis.1
   head/usr.bin/unvis/unvis.c
   head/usr.bin/vis/extern.h
   head/usr.bin/vis/foldit.c
   head/usr.bin/vis/vis.1
   head/usr.bin/vis/vis.c
 Modified:
   head/usr.bin/unvis/Makefile
   head/usr.bin/vis/Makefile
 
 Modified: head/usr.bin/unvis/Makefile
 ==============================================================================
 --- head/usr.bin/unvis/Makefile	Thu Mar 14 23:51:47 2013	(r248302)
 +++ head/usr.bin/unvis/Makefile	Fri Mar 15 00:05:50 2013	(r248303)
 @@ -3,4 +3,6 @@
  
  PROG=	unvis
  
 +.PATH: ${.CURDIR}/../../contrib/unvis
 +
  .include <bsd.prog.mk>
 
 Modified: head/usr.bin/vis/Makefile
 ==============================================================================
 --- head/usr.bin/vis/Makefile	Thu Mar 14 23:51:47 2013	(r248302)
 +++ head/usr.bin/vis/Makefile	Fri Mar 15 00:05:50 2013	(r248303)
 @@ -1,6 +1,10 @@
  #	@(#)Makefile	8.1 (Berkeley) 6/6/93
 +# $FreeBSD$
  
  PROG=	vis
  SRCS=	vis.c foldit.c
  
 +.PATH: ${.CURDIR}/../../contrib/vis
 +CFLAGS+=	-I${.CURDIR}/../../contrib/vis
 +
  .include <bsd.prog.mk>
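
For reference, the HTTP-style encoding that the -h flag selects follows the rule visible in do_hvis() in r248302: alphanumerics and the characters $-_.+!*'(), are left alone, and anything else becomes a percent sign followed by two lower case hexadecimal digits. The sketch below is a simplified, byte-oriented illustration of that rule and not the library code. http_encode() is a hypothetical name, and the function neither consults the locale nor hands unescaped characters on to do_svis() the way the real code does.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Simplified sketch of the HTTP-style rule; http_encode() is a
 * hypothetical name.  dst must hold at least 3 * strlen(src) + 1 bytes.
 * Returns the number of bytes written, excluding the trailing NUL.
 */
static size_t
http_encode(char *dst, const char *src)
{
	static const char hex[] = "0123456789abcdef";
	char *d = dst;
	unsigned char c;

	assert(dst != NULL && src != NULL);
	for (; (c = (unsigned char)*src) != '\0'; src++) {
		if ((c < 0x80 && isalnum(c)) ||
		    strchr("$-_.+!*'(),", c) != NULL) {
			*d++ = (char)c;		/* copied unchanged */
		} else {
			*d++ = '%';		/* escaped as %xx */
			*d++ = hex[c >> 4];
			*d++ = hex[c & 0xf];
		}
	}
	*d = '\0';
	return (size_t)(d - dst);
}
```

With this sketch the input "a b" comes out as "a%20b". The MIME style selected by -m has the same shape in do_mvis() but uses an equals sign and upper case digits.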
 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: bin/175418: commit references a PR
Date: Tue, 16 Apr 2013 19:28:13 +0000 (UTC)

 Author: brooks
 Date: Tue Apr 16 19:28:00 2013
 New Revision: 249561
 URL: http://svnweb.freebsd.org/changeset/base/249561
 
 Log:
   MFC r248303:
   
   Replace our (un)vis(1) commands with implementations from NetBSD to
   match our import of the (un)vis(3) APIs.
   
   This adds support for multibyte encoding and the -h and -m flags which
   support HTTP and MIME encoding respectively.
   
   PR:		bin/175418
   Obtained from:	NetBSD
 
 Added:
      - copied from r249557, head/contrib/unvis/
      - copied from r249557, head/contrib/vis/
 Directory Properties:
   stable/9/contrib/unvis/   (props changed)
   stable/9/contrib/vis/   (props changed)
 Deleted:
   stable/9/usr.bin/unvis/unvis.1
   stable/9/usr.bin/unvis/unvis.c
   stable/9/usr.bin/vis/extern.h
   stable/9/usr.bin/vis/foldit.c
   stable/9/usr.bin/vis/vis.1
   stable/9/usr.bin/vis/vis.c
 Modified:
   stable/9/usr.bin/unvis/Makefile
   stable/9/usr.bin/vis/Makefile
 Directory Properties:
   stable/9/usr.bin/unvis/   (props changed)
   stable/9/usr.bin/vis/   (props changed)
 
 Modified: stable/9/usr.bin/unvis/Makefile
 ==============================================================================
 --- stable/9/usr.bin/unvis/Makefile	Tue Apr 16 19:27:09 2013	(r249560)
 +++ stable/9/usr.bin/unvis/Makefile	Tue Apr 16 19:28:00 2013	(r249561)
 @@ -3,4 +3,6 @@
  
  PROG=	unvis
  
 +.PATH: ${.CURDIR}/../../contrib/unvis
 +
  .include <bsd.prog.mk>
 
 Modified: stable/9/usr.bin/vis/Makefile
 ==============================================================================
 --- stable/9/usr.bin/vis/Makefile	Tue Apr 16 19:27:09 2013	(r249560)
 +++ stable/9/usr.bin/vis/Makefile	Tue Apr 16 19:28:00 2013	(r249561)
 @@ -1,6 +1,10 @@
  #	@(#)Makefile	8.1 (Berkeley) 6/6/93
 +# $FreeBSD$
  
  PROG=	vis
  SRCS=	vis.c foldit.c
  
 +.PATH: ${.CURDIR}/../../contrib/vis
 +CFLAGS+=	-I${.CURDIR}/../../contrib/vis
 +
  .include <bsd.prog.mk>
 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: bin/175418: commit references a PR
Date: Tue, 16 Apr 2013 19:27:26 +0000 (UTC)

 Author: brooks
 Date: Tue Apr 16 19:27:09 2013
 New Revision: 249560
 URL: http://svnweb.freebsd.org/changeset/base/249560
 
 Log:
   MFC r248302:
   
   Update to the latest (un)vis(3) sources from NetBSD.  This adds
   multibyte support[0] and the new functions strenvisx and strsenvisx.
   
   Add MLINKS for vis(3) functions added by this and the initial import from
   NetBSD[1].
   
   PR:		bin/166364, bin/175418
   Submitted by:	"J.R. Oldroyd" <fbsd@opal.com>[0]
   		stefanf[1]
   Obtained from:	NetBSD
 
 Modified:
   stable/9/contrib/libc-vis/unvis.3
   stable/9/contrib/libc-vis/unvis.c
   stable/9/contrib/libc-vis/vis.3
   stable/9/contrib/libc-vis/vis.c
   stable/9/contrib/libc-vis/vis.h
   stable/9/lib/libc/gen/Makefile.inc
   stable/9/lib/libc/gen/Symbol.map
 Directory Properties:
   stable/9/contrib/libc-vis/   (props changed)
   stable/9/lib/libc/   (props changed)
 
 Modified: stable/9/contrib/libc-vis/unvis.3
 ==============================================================================
 --- stable/9/contrib/libc-vis/unvis.3	Tue Apr 16 19:25:41 2013	(r249559)
 +++ stable/9/contrib/libc-vis/unvis.3	Tue Apr 16 19:27:09 2013	(r249560)
 @@ -1,4 +1,4 @@
 -.\"	$NetBSD: unvis.3,v 1.23 2011/03/17 14:06:29 wiz Exp $
 +.\"	$NetBSD: unvis.3,v 1.27 2012/12/15 07:34:36 wiz Exp $
  .\"	$FreeBSD$
  .\"
  .\" Copyright (c) 1989, 1991, 1993
 @@ -126,15 +126,17 @@ The
  function has several return codes that must be handled properly.
  They are:
  .Bl -tag -width UNVIS_VALIDPUSH
 -.It Li \&0 (zero)
 +.It Li \&0 No (zero)
  Another character is necessary; nothing has been recognized yet.
  .It Dv UNVIS_VALID
  A valid character has been recognized and is available at the location
 -pointed to by cp.
 +pointed to by
 +.Fa cp .
  .It Dv UNVIS_VALIDPUSH
  A valid character has been recognized and is available at the location
 -pointed to by cp; however, the character currently passed in should
 -be passed in again.
 +pointed to by
 +.Fa cp ;
 +however, the character currently passed in should be passed in again.
  .It Dv UNVIS_NOCHAR
  A valid sequence was detected, but no character was produced.
  This return code is necessary to indicate a logical break between characters.
 @@ -150,7 +152,7 @@ one more time with flag set to
  to extract any remaining character (the character passed in is ignored).
  .Pp
  The
 -.Ar flag
 +.Fa flag
  argument is also used to specify the encoding style of the source.
  If set to
  .Dv VIS_HTTPSTYLE
 @@ -161,7 +163,8 @@ will decode URI strings as specified in 
  If set to
  .Dv VIS_HTTP1866 ,
  .Fn unvis
 -will decode URI strings as specified in RFC 1866.
 +will decode entity references and numeric character references
 +as specified in RFC 1866.
  If set to
  .Dv VIS_MIMESTYLE ,
  .Fn unvis
 @@ -169,7 +172,9 @@ will decode MIME Quoted-Printable string
  If set to
  .Dv VIS_NOESCAPE ,
  .Fn unvis
 -will not decode \e quoted characters.
 +will not decode
 +.Ql \e
 +quoted characters.
  .Pp
  The following code fragment illustrates a proper use of
  .Fn unvis .
 @@ -204,7 +209,7 @@ The functions
  and
  .Fn strnunvisx
  will return \-1 on error and set
 -.Va errno 
 +.Va errno
  to:
  .Bl -tag -width Er
  .It Bq Er EINVAL
 @@ -212,7 +217,7 @@ An invalid escape sequence was detected,
  .El
  .Pp
  In addition the functions
 -.Fn strnunvis 
 +.Fn strnunvis
  and
  .Fn strnunvisx
  will can also set
 @@ -244,4 +249,14 @@ and
  functions appeared in
  .Nx 6.0
  and
 -.Fx 10.0 .
 +.Fx 9.2 .
 +.Sh BUGS
 +The names
 +.Dv VIS_HTTP1808
 +and
 +.Dv VIS_HTTP1866
 +are wrong.
 +Percent-encoding was defined in RFC 1738, the original RFC for URL.
 +RFC 1866 defines HTML 2.0, an application of SGML, from which it
 +inherits concepts of numeric character references and entity
 +references.
 
 Modified: stable/9/contrib/libc-vis/unvis.c
 ==============================================================================
 --- stable/9/contrib/libc-vis/unvis.c	Tue Apr 16 19:25:41 2013	(r249559)
 +++ stable/9/contrib/libc-vis/unvis.c	Tue Apr 16 19:27:09 2013	(r249560)
 @@ -1,4 +1,4 @@
 -/*	$NetBSD: unvis.c,v 1.40 2012/12/14 21:31:01 christos Exp $	*/
 +/*	$NetBSD: unvis.c,v 1.41 2012/12/15 04:29:53 matt Exp $	*/
  
  /*-
   * Copyright (c) 1989, 1993
 @@ -34,7 +34,7 @@
  #if 0
  static char sccsid[] = "@(#)unvis.c	8.1 (Berkeley) 6/4/93";
  #else
 -__RCSID("$NetBSD: unvis.c,v 1.40 2012/12/14 21:31:01 christos Exp $");
 +__RCSID("$NetBSD: unvis.c,v 1.41 2012/12/15 04:29:53 matt Exp $");
  #endif
  #endif /* LIBC_SCCS and not lint */
  __FBSDID("$FreeBSD$");
 @@ -90,7 +90,7 @@ __weak_alias(strnunvisx,_strnunvisx)
   * RFC 1866
   */
  static const struct nv {
 -	const char name[7];
 +	char name[7];
  	uint8_t value;
  } nv[] = {
  	{ "AElig",	198 }, /* capital AE diphthong (ligature)  */
 
 Modified: stable/9/contrib/libc-vis/vis.3
 ==============================================================================
 --- stable/9/contrib/libc-vis/vis.3	Tue Apr 16 19:25:41 2013	(r249559)
 +++ stable/9/contrib/libc-vis/vis.3	Tue Apr 16 19:27:09 2013	(r249560)
 @@ -1,4 +1,4 @@
 -.\"	$NetBSD: vis.3,v 1.29 2012/12/14 22:55:59 christos Exp $
 +.\"	$NetBSD: vis.3,v 1.39 2013/02/20 20:05:26 christos Exp $
  .\"	$FreeBSD$
  .\"
  .\" Copyright (c) 1989, 1991, 1993
 @@ -30,7 +30,7 @@
  .\"
  .\"     @(#)vis.3	8.1 (Berkeley) 6/9/93
  .\"
 -.Dd December 14, 2012
 +.Dd February 19, 2013
  .Dt VIS 3
  .Os
  .Sh NAME
 @@ -40,12 +40,14 @@
  .Nm strnvis ,
  .Nm strvisx ,
  .Nm strnvisx ,
 +.Nm strenvisx ,
  .Nm svis ,
  .Nm snvis ,
  .Nm strsvis ,
  .Nm strsnvis ,
 -.Nm strsvisx
 -.Nm strsnvisx
 +.Nm strsvisx ,
 +.Nm strsnvisx ,
 +.Nm strsenvisx
  .Nd visually encode characters
  .Sh LIBRARY
  .Lb libc
 @@ -63,6 +65,8 @@
  .Fn strvisx "char *dst" "const char *src" "size_t len" "int flag"
  .Ft int
  .Fn strnvisx "char *dst" "size_t dlen" "const char *src" "size_t len" "int flag"
 +.Ft int
 +.Fn strenvisx "char *dst" "size_t dlen" "const char *src" "size_t len" "int flag" "int *cerr_ptr"
  .Ft char *
  .Fn svis "char *dst" "int c" "int flag" "int nextc" "const char *extra"
  .Ft char *
 @@ -75,6 +79,8 @@
  .Fn strsvisx "char *dst" "const char *src" "size_t len" "int flag" "const char *extra"
  .Ft int
  .Fn strsnvisx "char *dst" "size_t dlen" "const char *src" "size_t len" "int flag" "const char *extra"
 +.Ft int
 +.Fn strsenvisx "char *dst" "size_t dlen" "const char *src" "size_t len" "int flag" "const char *extra" "int *cerr_ptr"
  .Sh DESCRIPTION
  The
  .Fn vis
 @@ -89,11 +95,11 @@ needs no encoding, it is copied in unalt
  The string is null terminated, and a pointer to the end of the string is
  returned.
  The maximum length of any encoding is four
 -characters (not including the trailing
 +bytes (not including the trailing
  .Dv NUL ) ;
  thus, when
  encoding a set of characters into a buffer, the size of the buffer should
 -be four times the number of characters encoded, plus one for the trailing
 +be four times the number of bytes encoded, plus one for the trailing
  .Dv NUL .
  The flag parameter is used for altering the default range of
  characters considered for encoding and for altering the visual
 @@ -142,16 +148,17 @@ terminate
  The size of
  .Fa dst
  must be four times the number
 -of characters encoded from
 +of bytes encoded from
  .Fa src
  (plus one for the
  .Dv NUL ) .
  Both
 -forms return the number of characters in dst (not including
 -the trailing
 +forms return the number of characters in
 +.Fa dst
 +(not including the trailing
  .Dv NUL ) .
  The
 -.Dq n
 +.Dq Nm n
  versions of the functions also take an additional argument
  .Fa dlen
  that indicates the length of the
 @@ -159,7 +166,7 @@ that indicates the length of the
  buffer.
  If
  .Fa dlen
 -is not large enough to fix the converted string then the
 +is not large enough to fit the converted string then the
  .Fn strnvis
  and
  .Fn strnvisx
 @@ -167,6 +174,14 @@ functions return \-1 and set
  .Va errno
  to
  .Dv ENOSPC .
 +The
 +.Fn strenvisx
 +function takes an additional argument,
 +.Fa cerr_ptr ,
 +that is used to pass in and out a multibyte conversion error flag.
 +This is useful when processing single characters at a time when
 +it is possible that the locale may be set to something other
 +than the locale of the characters in the input data.
  .Pp
  The functions
  .Fn svis ,
 @@ -174,16 +189,18 @@ The functions
  .Fn strsvis ,
  .Fn strsnvis ,
  .Fn strsvisx ,
 +.Fn strsnvisx ,
  and
 -.Fn strsnvisx
 +.Fn strsenvisx
  correspond to
  .Fn vis ,
  .Fn nvis ,
  .Fn strvis ,
  .Fn strnvis ,
  .Fn strvisx ,
 +.Fn strnvisx ,
  and
 -.Fn strnvisx
 +.Fn strenvisx
  but have an additional argument
  .Fa extra ,
  pointing to a
 @@ -214,14 +231,13 @@ and
  .Fn strnvisx ) ,
  and the type of representation used.
  By default, all non-graphic characters,
 -except space, tab, and newline are encoded.
 -(See
 -.Xr isgraph 3 . )
 +except space, tab, and newline are encoded (see
 +.Xr isgraph 3 ) .
  The following flags
  alter this:
  .Bl -tag -width VIS_WHITEX
  .It Dv VIS_GLOB
 -Also encode magic characters
 +Also encode the magic characters
  .Ql ( * ,
  .Ql \&? ,
  .Ql \&[
 @@ -243,11 +259,13 @@ Synonym for
  \&|
  .Dv VIS_NL .
  .It Dv VIS_SAFE
 -Only encode "unsafe" characters.
 +Only encode
 +.Dq unsafe
 +characters.
  Unsafe means control characters which may cause common terminals to perform
  unexpected functions.
  Currently this form allows space, tab, newline, backspace, bell, and
 -return - in addition to all graphic characters - unencoded.
 +return \(em in addition to all graphic characters \(em unencoded.
  .El
  .Pp
  (The above flags have no effect for
 @@ -287,8 +305,8 @@ Use an
  to represent meta characters (characters with the 8th
  bit set), and use caret
  .Ql ^
 -to represent control characters see
 -.Pf ( Xr iscntrl 3 ) .
 +to represent control characters (see
 +.Xr iscntrl 3 ) .
  The following formats are used:
  .Bl -tag -width xxxxx
  .It Dv \e^C
 @@ -335,19 +353,20 @@ Use C-style backslash sequences to repre
  characters.
  The following sequences are used to represent the indicated characters:
  .Bd -unfilled -offset indent
 -.Li \ea Tn  - BEL No (007)
 -.Li \eb Tn  - BS No (010)
 -.Li \ef Tn  - NP No (014)
 -.Li \en Tn  - NL No (012)
 -.Li \er Tn  - CR No (015)
 -.Li \es Tn  - SP No (040)
 -.Li \et Tn  - HT No (011)
 -.Li \ev Tn  - VT No (013)
 -.Li \e0 Tn  - NUL No (000)
 +.Li \ea Tn  \(em BEL No (007)
 +.Li \eb Tn  \(em BS No (010)
 +.Li \ef Tn  \(em NP No (014)
 +.Li \en Tn  \(em NL No (012)
 +.Li \er Tn  \(em CR No (015)
 +.Li \es Tn  \(em SP No (040)
 +.Li \et Tn  \(em HT No (011)
 +.Li \ev Tn  \(em VT No (013)
 +.Li \e0 Tn  \(em NUL No (000)
  .Ed
  .Pp
 -When using this format, the nextc parameter is looked at to determine
 -if a
 +When using this format, the
 +.Fa nextc
 +parameter is looked at to determine if a
  .Dv NUL
  character can be encoded as
  .Ql \e0
 @@ -374,8 +393,8 @@ represents a lower case hexadecimal digi
  .It Dv VIS_MIMESTYLE
  Use MIME Quoted-Printable encoding as described in RFC 2045, only don't
  break lines and don't handle CRLF.
 -The form is:
 -.Ql %XX
 +The form is
 +.Ql =XX
  where
  .Em X
  represents an upper case hexadecimal digit.
 @@ -392,6 +411,41 @@ meta characters as
  .Ql M-C ) .
  With this flag set, the encoding is
  ambiguous and non-invertible.
 +.Sh MULTIBYTE CHARACTER SUPPORT
 +These functions support multibyte character input.
 +The encoding conversion is influenced by the setting of the
 +.Ev LC_CTYPE
 +environment variable which defines the set of characters
 +that can be copied without encoding.
 +.Pp
 +When 8-bit data is present in the input,
 +.Ev LC_CTYPE
 +must be set to the correct locale or to the C locale.
 +If the locales of the data and the conversion are mismatched,
 +multibyte character recognition may fail and encoding will be performed
 +byte-by-byte instead.
 +.Pp
 +As noted above,
 +.Fa dst
 +must be four times the number of bytes processed from
 +.Fa src .
 +But note that each multibyte character can be up to
 +.Dv MB_LEN_MAX
 +bytes
 +.\" (see
 +.\" .Xr multibyte 3 )
 +so in terms of multibyte characters,
 +.Fa dst
 +must be four times
 +.Dv MB_LEN_MAX
 +times the number of characters processed from
 +.Fa src .
 +.Sh ENVIRONMENT
 +.Bl -tag -width ".Ev LC_CTYPE"
 +.It Ev LC_CTYPE
 +Specify the locale of the input data.
 +Set to C if the input data locale is unknown.
 +.El
  .Sh ERRORS
  The functions
  .Fn nvis
 @@ -407,11 +461,11 @@ and
  .Fn strsnvisx ,
  will return \-1 when the
  .Fa dlen
 -destination buffer length size is not enough to perform the conversion while
 +destination buffer size is not enough to perform the conversion while
  setting
  .Va errno
  to:
 -.Bl -tag -width Er
 +.Bl -tag -width ".Bq Er ENOSPC"
  .It Bq Er ENOSPC
  The destination buffer size is not large enough to perform the conversion.
  .El
 @@ -419,18 +473,23 @@ The destination buffer size is not large
  .Xr unvis 1 ,
  .Xr vis 1 ,
  .Xr glob 3 ,
 +.\" .Xr multibyte 3 ,
  .Xr unvis 3
  .Rs
  .%A T. Berners-Lee
  .%T Uniform Resource Locators (URL)
 -.%O RFC1738
 +.%O "RFC 1738"
 +.Re
 +.Rs
 +.%T "Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies"
 +.%O "RFC 2045"
  .Re
  .Sh HISTORY
  The
  .Fn vis ,
  .Fn strvis ,
  and
 -.Fa strvisx
 +.Fn strvisx
  functions first appeared in
  .Bx 4.4 .
  The
 @@ -441,7 +500,7 @@ and
  functions appeared in
  .Nx 1.5
  and
 -.Fx 10.0 .
 +.Fx 9.2 .
  The buffer size limited versions of the functions
  .Po Fn nvis ,
  .Fn strnvis ,
 @@ -451,6 +510,9 @@ The buffer size limited versions of the 
  and
  .Fn strsnvisx Pc
  appeared in
 -.Nx 6.0
  and
 -.Fx 10.0 .
 +.Fx 9.2 .
 +Multibyte character support was added in
 +.Nx 7.0
 +and
 +.Fx 9.2 .
 
 Modified: stable/9/contrib/libc-vis/vis.c
 ==============================================================================
 --- stable/9/contrib/libc-vis/vis.c	Tue Apr 16 19:25:41 2013	(r249559)
 +++ stable/9/contrib/libc-vis/vis.c	Tue Apr 16 19:27:09 2013	(r249560)
 @@ -1,4 +1,4 @@
 -/*	$NetBSD: vis.c,v 1.45 2012/12/14 21:38:18 christos Exp $	*/
 +/*	$NetBSD: vis.c,v 1.60 2013/02/21 16:21:20 joerg Exp $	*/
  
  /*-
   * Copyright (c) 1989, 1993
 @@ -57,19 +57,23 @@
  
  #include <sys/cdefs.h>
  #if defined(LIBC_SCCS) && !defined(lint)
 -__RCSID("$NetBSD: vis.c,v 1.45 2012/12/14 21:38:18 christos Exp $");
 +__RCSID("$NetBSD: vis.c,v 1.60 2013/02/21 16:21:20 joerg Exp $");
  #endif /* LIBC_SCCS and not lint */
 +#ifdef __FBSDID
  __FBSDID("$FreeBSD$");
 +#define	_DIAGASSERT(x)	assert(x)
 +#endif
  
  #include "namespace.h"
  #include <sys/types.h>
 +#include <sys/param.h>
  
  #include <assert.h>
  #include <vis.h>
  #include <errno.h>
  #include <stdlib.h>
 -
 -#define	_DIAGASSERT(x)	assert(x)
 +#include <wchar.h>
 +#include <wctype.h>
  
  #ifdef __weak_alias
  __weak_alias(strvisx,_strvisx)
 @@ -81,65 +85,66 @@ __weak_alias(strvisx,_strvisx)
  #include <stdio.h>
  #include <string.h>
  
 -static char *do_svis(char *, size_t *, int, int, int, const char *);
 +/*
 + * The reason for going through the trouble to deal with character encodings
 + * in vis(3), is that we use this to safe encode output of commands. This
 + * safe encoding varies depending on the character set. For example if we
 + * display ps output in French, we don't want to display French characters
 + * as M-foo.
 + */
 +
 +static wchar_t *do_svis(wchar_t *, wint_t, int, wint_t, const wchar_t *);
  
  #undef BELL
 -#define BELL '\a'
 +#define BELL L'\a'
 +
 +#define iswoctal(c)	(((u_char)(c)) >= L'0' && ((u_char)(c)) <= L'7')
 +#define iswwhite(c)	(c == L' ' || c == L'\t' || c == L'\n')
 +#define iswsafe(c)	(c == L'\b' || c == BELL || c == L'\r')
 +#define xtoa(c)		L"0123456789abcdef"[c]
 +#define XTOA(c)		L"0123456789ABCDEF"[c]
  
 -#define isoctal(c)	(((u_char)(c)) >= '0' && ((u_char)(c)) <= '7')
 -#define iswhite(c)	(c == ' ' || c == '\t' || c == '\n')
 -#define issafe(c)	(c == '\b' || c == BELL || c == '\r')
 -#define xtoa(c)		"0123456789abcdef"[c]
 -#define XTOA(c)		"0123456789ABCDEF"[c]
 -
 -#define MAXEXTRAS	9
 -
 -#define MAKEEXTRALIST(flag, extra, orig_str)				      \
 -do {									      \
 -	const char *orig = orig_str;					      \
 -	const char *o = orig;						      \
 -	char *e;							      \
 -	while (*o++)							      \
 -		continue;						      \
 -	extra = malloc((size_t)((o - orig) + MAXEXTRAS));		      \
 -	if (!extra) break;						      \
 -	for (o = orig, e = extra; (*e++ = *o++) != '\0';)		      \
 -		continue;						      \
 -	e--;								      \
 -	if (flag & VIS_GLOB) {						      \
 -		*e++ = '*';						      \
 -		*e++ = '?';						      \
 -		*e++ = '[';						      \
 -		*e++ = '#';						      \
 -	}								      \
 -	if (flag & VIS_SP) *e++ = ' ';					      \
 -	if (flag & VIS_TAB) *e++ = '\t';				      \
 -	if (flag & VIS_NL) *e++ = '\n';					      \
 -	if ((flag & VIS_NOSLASH) == 0) *e++ = '\\';			      \
 -	*e = '\0';							      \
 -} while (/*CONSTCOND*/0)
 +#define MAXEXTRAS	10
 +
 +#if !HAVE_NBTOOL_CONFIG_H
 +#ifndef __NetBSD__
 +/*
 + * On NetBSD MB_LEN_MAX is currently 32 which does not fit on any integer
 + * integral type and it is probably wrong, since currently the maximum
 + * number of bytes and character needs is 6. Until this is fixed, the
 + * loops below are using sizeof(uint64_t) - 1 instead of MB_LEN_MAX, and
 + * the assertion is commented out.
 + */
 +#ifdef __FreeBSD__
 +/*
 + * On FreeBSD including <sys/systm.h> for CTASSERT only works in kernel
 + * mode.
 + */
 +#ifndef CTASSERT
 +#define CTASSERT(x)             _CTASSERT(x, __LINE__)
 +#define _CTASSERT(x, y)         __CTASSERT(x, y)
 +#define __CTASSERT(x, y)        typedef char __assert ## y[(x) ? 1 : -1]
 +#endif
 +#endif /* __FreeBSD__ */
 +CTASSERT(MB_LEN_MAX <= sizeof(uint64_t));
 +#endif /* !__NetBSD__ */
 +#endif
  
  /*
   * This is do_hvis, for HTTP style (RFC 1808)
   */
 -static char *
 -do_hvis(char *dst, size_t *dlen, int c, int flag, int nextc, const char *extra)
 +static wchar_t *
 +do_hvis(wchar_t *dst, wint_t c, int flags, wint_t nextc, const wchar_t *extra)
  {
 -
 -	if ((isascii(c) && isalnum(c))
 +	if (iswalnum(c)
  	    /* safe */
 -	    || c == '$' || c == '-' || c == '_' || c == '.' || c == '+'
 +	    || c == L'$' || c == L'-' || c == L'_' || c == L'.' || c == L'+'
  	    /* extra */
 -	    || c == '!' || c == '*' || c == '\'' || c == '(' || c == ')'
 -	    || c == ',') {
 -		dst = do_svis(dst, dlen, c, flag, nextc, extra);
 -	} else {
 -		if (dlen) {
 -			if (*dlen < 3)
 -				return NULL;
 -			*dlen -= 3;
 -		}
 -		*dst++ = '%';
 +	    || c == L'!' || c == L'*' || c == L'\'' || c == L'(' || c == L')'
 +	    || c == L',')
 +		dst = do_svis(dst, c, flags, nextc, extra);
 +	else {
 +		*dst++ = L'%';
  		*dst++ = xtoa(((unsigned int)c >> 4) & 0xf);
  		*dst++ = xtoa((unsigned int)c & 0xf);
  	}
 @@ -151,312 +156,448 @@ do_hvis(char *dst, size_t *dlen, int c, 
   * This is do_mvis, for Quoted-Printable MIME (RFC 2045)
   * NB: No handling of long lines or CRLF.
   */
 -static char *
 -do_mvis(char *dst, size_t *dlen, int c, int flag, int nextc, const char *extra)
 +static wchar_t *
 +do_mvis(wchar_t *dst, wint_t c, int flags, wint_t nextc, const wchar_t *extra)
  {
 -	if ((c != '\n') &&
 +	if ((c != L'\n') &&
  	    /* Space at the end of the line */
 -	    ((isspace(c) && (nextc == '\r' || nextc == '\n')) ||
 +	    ((iswspace(c) && (nextc == L'\r' || nextc == L'\n')) ||
  	    /* Out of range */
 -	    (!isspace(c) && (c < 33 || (c > 60 && c < 62) || c > 126)) ||
 -	    /* Specific char to be escaped */ 
 -	    strchr("#$@[\\]^`{|}~", c) != NULL)) {
 -		if (dlen) {
 -			if (*dlen < 3)
 -				return NULL;
 -			*dlen -= 3;
 -		}
 -		*dst++ = '=';
 +	    (!iswspace(c) && (c < 33 || (c > 60 && c < 62) || c > 126)) ||
 +	    /* Specific char to be escaped */
 +	    wcschr(L"#$@[\\]^`{|}~", c) != NULL)) {
 +		*dst++ = L'=';
  		*dst++ = XTOA(((unsigned int)c >> 4) & 0xf);
  		*dst++ = XTOA((unsigned int)c & 0xf);
 -	} else {
 -		dst = do_svis(dst, dlen, c, flag, nextc, extra);
 -	}
 +	} else
 +		dst = do_svis(dst, c, flags, nextc, extra);
  	return dst;
  }
  
  /*
 - * This is do_vis, the central code of vis.
 - * dst:	      Pointer to the destination buffer
 - * c:	      Character to encode
 - * flag:      Flag word
 - * nextc:     The character following 'c'
 - * extra:     Pointer to the list of extra characters to be
 - *	      backslash-protected.
 + * Output single byte of multibyte character.
   */
 -static char *
 -do_svis(char *dst, size_t *dlen, int c, int flag, int nextc, const char *extra)
 +static wchar_t *
 +do_mbyte(wchar_t *dst, wint_t c, int flags, wint_t nextc, int iswextra)
  {
 -	int isextra;
 -	size_t odlen = dlen ? *dlen : 0;
 -
 -	isextra = strchr(extra, c) != NULL;
 -#define HAVE(x) \
 -	do { \
 -		if (dlen) { \
 -			if (*dlen < (x)) \
 -				goto out; \
 -			*dlen -= (x); \
 -		} \
 -	} while (/*CONSTCOND*/0)
 -	if (!isextra && isascii(c) && (isgraph(c) || iswhite(c) ||
 -	    ((flag & VIS_SAFE) && issafe(c)))) {
 -		HAVE(1);
 -		*dst++ = c;
 -		return dst;
 -	}
 -	if (flag & VIS_CSTYLE) {
 -		HAVE(2);
 +	if (flags & VIS_CSTYLE) {
  		switch (c) {
 -		case '\n':
 -			*dst++ = '\\'; *dst++ = 'n';
 +		case L'\n':
 +			*dst++ = L'\\'; *dst++ = L'n';
  			return dst;
 -		case '\r':
 -			*dst++ = '\\'; *dst++ = 'r';
 +		case L'\r':
 +			*dst++ = L'\\'; *dst++ = L'r';
  			return dst;
 -		case '\b':
 -			*dst++ = '\\'; *dst++ = 'b';
 +		case L'\b':
 +			*dst++ = L'\\'; *dst++ = L'b';
  			return dst;
  		case BELL:
 -			*dst++ = '\\'; *dst++ = 'a';
 +			*dst++ = L'\\'; *dst++ = L'a';
  			return dst;
 -		case '\v':
 -			*dst++ = '\\'; *dst++ = 'v';
 +		case L'\v':
 +			*dst++ = L'\\'; *dst++ = L'v';
  			return dst;
 -		case '\t':
 -			*dst++ = '\\'; *dst++ = 't';
 +		case L'\t':
 +			*dst++ = L'\\'; *dst++ = L't';
  			return dst;
 -		case '\f':
 -			*dst++ = '\\'; *dst++ = 'f';
 +		case L'\f':
 +			*dst++ = L'\\'; *dst++ = L'f';
  			return dst;
 -		case ' ':
 -			*dst++ = '\\'; *dst++ = 's';
 +		case L' ':
 +			*dst++ = L'\\'; *dst++ = L's';
  			return dst;
 -		case '\0':
 -			*dst++ = '\\'; *dst++ = '0';
 -			if (isoctal(nextc)) {
 -				HAVE(2);
 -				*dst++ = '0';
 -				*dst++ = '0';
 +		case L'\0':
 +			*dst++ = L'\\'; *dst++ = L'0';
 +			if (iswoctal(nextc)) {
 +				*dst++ = L'0';
 +				*dst++ = L'0';
  			}
  			return dst;
  		default:
 -			if (isgraph(c)) {
 -				*dst++ = '\\'; *dst++ = c;
 +			if (iswgraph(c)) {
 +				*dst++ = L'\\';
 +				*dst++ = c;
  				return dst;
  			}
 -			if (dlen)
 -				*dlen = odlen;
  		}
  	}
 -	if (isextra || ((c & 0177) == ' ') || (flag & VIS_OCTAL)) {
 -		HAVE(4);
 -		*dst++ = '\\';
 -		*dst++ = (u_char)(((u_int32_t)(u_char)c >> 6) & 03) + '0';
 -		*dst++ = (u_char)(((u_int32_t)(u_char)c >> 3) & 07) + '0';
 -		*dst++ =			     (c	      & 07) + '0';
 +	if (iswextra || ((c & 0177) == L' ') || (flags & VIS_OCTAL)) {
 +		*dst++ = L'\\';
 +		*dst++ = (u_char)(((u_int32_t)(u_char)c >> 6) & 03) + L'0';
 +		*dst++ = (u_char)(((u_int32_t)(u_char)c >> 3) & 07) + L'0';
 +		*dst++ =			     (c	      & 07) + L'0';
  	} else {
 -		if ((flag & VIS_NOSLASH) == 0) {
 -			HAVE(1);
 -			*dst++ = '\\';
 -		}
 +		if ((flags & VIS_NOSLASH) == 0)
 +			*dst++ = L'\\';
  
  		if (c & 0200) {
 -			HAVE(1);
 -			c &= 0177; *dst++ = 'M';
 +			c &= 0177;
 +			*dst++ = L'M';
  		}
  
 -		if (iscntrl(c)) {
 -			HAVE(2);
 -			*dst++ = '^';
 +		if (iswcntrl(c)) {
 +			*dst++ = L'^';
  			if (c == 0177)
 -				*dst++ = '?';
 +				*dst++ = L'?';
  			else
 -				*dst++ = c + '@';
 +				*dst++ = c + L'@';
  		} else {
 -			HAVE(2);
 -			*dst++ = '-'; *dst++ = c;
 +			*dst++ = L'-';
 +			*dst++ = c;
  		}
  	}
 +
 +	return dst;
 +}
 +
 +/*
 + * This is do_vis, the central code of vis.
 + * dst:	      Pointer to the destination buffer
 + * c:	      Character to encode
 + * flags:     Flags word
 + * nextc:     The character following 'c'
 + * extra:     Pointer to the list of extra characters to be
 + *	      backslash-protected.
 + */
 +static wchar_t *
 +do_svis(wchar_t *dst, wint_t c, int flags, wint_t nextc, const wchar_t *extra)
 +{
 +	int iswextra, i, shft;
 +	uint64_t bmsk, wmsk;
 +
 +	iswextra = wcschr(extra, c) != NULL;
 +	if (!iswextra && (iswgraph(c) || iswwhite(c) ||
 +	    ((flags & VIS_SAFE) && iswsafe(c)))) {
 +		*dst++ = c;
 +		return dst;
 +	}
 +
 +	/* See comment in istrsenvisx() output loop, below. */
 +	wmsk = 0;
 +	for (i = sizeof(wmsk) - 1; i >= 0; i--) {
 +		shft = i * NBBY;
 +		bmsk = (uint64_t)0xffLL << shft;
 +		wmsk |= bmsk;
 +		if ((c & wmsk) || i == 0)
 +			dst = do_mbyte(dst, (wint_t)(
 +			    (uint64_t)(c & bmsk) >> shft),
 +			    flags, nextc, iswextra);
 +	}
 +
  	return dst;
 -out:
 -	*dlen = odlen;
 -	return NULL;
  }
  
 -typedef char *(*visfun_t)(char *, size_t *, int, int, int, const char *);
 +typedef wchar_t *(*visfun_t)(wchar_t *, wint_t, int, wint_t, const wchar_t *);
  
  /*
   * Return the appropriate encoding function depending on the flags given.
   */
  static visfun_t
 -getvisfun(int flag)
 +getvisfun(int flags)
  {
 -	if (flag & VIS_HTTPSTYLE)
 +	if (flags & VIS_HTTPSTYLE)
  		return do_hvis;
 -	if (flag & VIS_MIMESTYLE)
 +	if (flags & VIS_MIMESTYLE)
  		return do_mvis;
  	return do_svis;
  }
  
  /*
 - * isnvis - visually encode characters, also encoding the characters
 - *	  pointed to by `extra'
 + * Expand list of extra characters to not visually encode.
   */
 -static char *
 -isnvis(char *dst, size_t *dlen, int c, int flag, int nextc, const char *extra)
 +static wchar_t *
 +makeextralist(int flags, const char *src)
  {
 -	char *nextra = NULL;
 -	visfun_t f;
 +	wchar_t *dst, *d;
 +	size_t len;
  
 -	_DIAGASSERT(dst != NULL);
 -	_DIAGASSERT(extra != NULL);
 -	MAKEEXTRALIST(flag, nextra, extra);
 -	if (!nextra) {
 -		if (dlen && *dlen == 0) {
 -			errno = ENOSPC;
 -			return NULL;
 -		}
 -		*dst = '\0';		/* can't create nextra, return "" */
 -		return dst;
 -	}
 -	f = getvisfun(flag);
 -	dst = (*f)(dst, dlen, c, flag, nextc, nextra);
 -	free(nextra);
 -	if (dst == NULL || (dlen && *dlen == 0)) {
 -		errno = ENOSPC;
 +	len = strlen(src);
 +	if ((dst = calloc(len + MAXEXTRAS, sizeof(*dst))) == NULL)
  		return NULL;
 -	}
 -	*dst = '\0';
 -	return dst;
 -}
  
 -char *
 -svis(char *dst, int c, int flag, int nextc, const char *extra)
 -{
 -	return isnvis(dst, NULL, c, flag, nextc, extra);
 -}
 +	if (mbstowcs(dst, src, len) == (size_t)-1) {
 +		size_t i;
 +		for (i = 0; i < len; i++)
 +			dst[i] = (wint_t)(u_char)src[i];
 +		d = dst + len;
 +	} else
 +		d = dst + wcslen(dst);
 +
 +	if (flags & VIS_GLOB) {
 +		*d++ = L'*';
 +		*d++ = L'?';
 +		*d++ = L'[';
 +		*d++ = L'#';
 +	}
 +
 +	if (flags & VIS_SP) *d++ = L' ';
 +	if (flags & VIS_TAB) *d++ = L'\t';
 +	if (flags & VIS_NL) *d++ = L'\n';
 +	if ((flags & VIS_NOSLASH) == 0) *d++ = L'\\';
 +	*d = L'\0';
  
 -char *
 -snvis(char *dst, size_t dlen, int c, int flag, int nextc, const char *extra)
 -{
 -	return isnvis(dst, &dlen, c, flag, nextc, extra);
 +	return dst;
  }
  
 -
  /*
 - * strsvis, strsvisx - visually encode characters from src into dst
 - *
 - *	Extra is a pointer to a \0-terminated list of characters to
 - *	be encoded, too. These functions are useful e. g. to
 - *	encode strings in such a way so that they are not interpreted
 - *	by a shell.
 - *
 - *	Dst must be 4 times the size of src to account for possible
 - *	expansion.  The length of dst, not including the trailing NULL,
 - *	is returned.
 - *
 - *	Strsvisx encodes exactly len bytes from src into dst.
 - *	This is useful for encoding a block of data.
 + * istrsenvisx()
 + * 	The main internal function.
 + *	All user-visible functions call this one.
   */
  static int
 -istrsnvis(char *dst, size_t *dlen, const char *csrc, int flag, const char *extra)
 +istrsenvisx(char *mbdst, size_t *dlen, const char *mbsrc, size_t mblength,
 +    int flags, const char *mbextra, int *cerr_ptr)
  {
 -	int c;
 -	char *start;
 -	char *nextra = NULL;
 -	const unsigned char *src = (const unsigned char *)csrc;
 +	wchar_t *dst, *src, *pdst, *psrc, *start, *extra;
 +	size_t len, olen;
 +	uint64_t bmsk, wmsk;
 +	wint_t c;
  	visfun_t f;
 +	int clen = 0, cerr = 0, error = -1, i, shft;
 +	ssize_t mbslength, maxolen;
  
 -	_DIAGASSERT(dst != NULL);
 -	_DIAGASSERT(src != NULL);
 -	_DIAGASSERT(extra != NULL);
 -	MAKEEXTRALIST(flag, nextra, extra);
 -	if (!nextra) {
 -		*dst = '\0';		/* can't create nextra, return "" */
 -		return 0;
 +	_DIAGASSERT(mbdst != NULL);
 +	_DIAGASSERT(mbsrc != NULL);
 +	_DIAGASSERT(mbextra != NULL);
 +
 +	/*
 +	 * Input (mbsrc) is a char string considered to be multibyte
 +	 * characters.  The input loop will read this string pulling
 +	 * one character, possibly multiple bytes, from mbsrc and
 +	 * converting each to wchar_t in src.
 +	 *
 +	 * The vis conversion will be done using the wide char
 +	 * wchar_t string.
 +	 *
 +	 * This will then be converted back to a multibyte string to
 +	 * return to the caller.
 +	 */
 +
 +	/* Allocate space for the wide char strings */
 +	psrc = pdst = extra = NULL;
 +	if (!mblength)
 +		mblength = strlen(mbsrc);
 +	if ((psrc = calloc(mblength + 1, sizeof(*psrc))) == NULL)
 +		return -1;
 +	if ((pdst = calloc((4 * mblength) + 1, sizeof(*pdst))) == NULL)
 +		goto out;
 +	dst = pdst;
 +	src = psrc;
 +
 +	/* Use caller's multibyte conversion error flag. */
 +	if (cerr_ptr)
 +		cerr = *cerr_ptr;
 +
 +	/*
 +	 * Input loop.
 +	 * Handle up to mblength characters (not bytes).  We do not
 +	 * stop at NULs because we may be processing a block of data
 +	 * that includes NULs.
 +	 */
 +	mbslength = (ssize_t)mblength;
 +	/*
 +	 * When inputing a single character, must also read in the
 +	 * next character for nextc, the look-ahead character.
 +	 */
 +	if (mbslength == 1)
 
 *** DIFF OUTPUT TRUNCATED AT 1000 LINES ***
 
State-Changed-From-To: patched->closed 
State-Changed-By: brooks 
State-Changed-When: Tue Apr 16 19:31:59 UTC 2013 
State-Changed-Why:  
I've merged NetBSD vis with multibyte support to stable/9. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=175418 
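
For anyone reading the merged do_svis(): the loop that hands a wide
character to do_mbyte() one byte at a time can be tried in isolation.
The following is only a sketch of that loop, not committed code. The
helper name split_wide_bytes() and the out[] buffer are hypothetical,
and NBBY is assumed to be 8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of vis.c): store the bytes of c in
 * out[], starting at the most significant non-zero byte, and return
 * how many bytes were stored.  A value of 0 still yields one byte,
 * mirroring the "|| i == 0" test in do_svis().
 */
static size_t
split_wide_bytes(uint64_t c, unsigned char out[8])
{
	uint64_t bmsk, wmsk = 0;
	size_t n = 0;
	int i, shft;

	assert(out != NULL);
	for (i = (int)sizeof(wmsk) - 1; i >= 0; i--) {
		shft = i * 8;			/* assumes NBBY == 8 */
		bmsk = (uint64_t)0xff << shft;	/* mask for byte i only */
		wmsk |= bmsk;			/* mask for bytes i..7 */
		if ((c & wmsk) || i == 0)
			out[n++] = (unsigned char)((c & bmsk) >> shft);
	}
	return n;
}
```

Because wmsk accumulates from the top byte downward, leading zero bytes
are skipped, but once a non-zero byte has been seen every lower byte is
emitted, including zero ones.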
>Unformatted:
