From kazuaki@aliceblue.jp  Mon Jun  4 18:24:03 2007
Return-Path: <kazuaki@aliceblue.jp>
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id BC74316A469
	for <FreeBSD-gnats-submit@freebsd.org>; Mon,  4 Jun 2007 18:24:03 +0000 (UTC)
	(envelope-from kazuaki@aliceblue.jp)
Received: from pd5f7be.tokyff01.ap.so-net.ne.jp (pd5f7be.tokyff01.ap.so-net.ne.jp [202.213.247.190])
	by mx1.freebsd.org (Postfix) with ESMTP id 7FD6313C44B
	for <FreeBSD-gnats-submit@freebsd.org>; Mon,  4 Jun 2007 18:24:03 +0000 (UTC)
	(envelope-from kazuaki@aliceblue.jp)
Received: from eyes.aliceblue.jp (dhcp21.aliceblue.jp [192.168.11.21])
	by pd5f7be.tokyff01.ap.so-net.ne.jp (Postfix) with SMTP id 775D2597C28
	for <FreeBSD-gnats-submit@freebsd.org>; Tue,  5 Jun 2007 03:11:58 +0900 (JST)
Received: from eyes.aliceblue.jp (localhost [127.0.0.1])
	by eyes.aliceblue.jp (8.13.8/8.13.8) with ESMTP id l54IAdHB048114
	for <FreeBSD-gnats-submit@freebsd.org>; Tue, 5 Jun 2007 03:10:39 +0900 (JST)
	(envelope-from kazuaki@aliceblue.jp)
Received: (from kazuaki@localhost)
	by eyes.aliceblue.jp (8.13.8/8.13.8/Submit) id l54IAd1D048113;
	Tue, 5 Jun 2007 03:10:39 +0900 (JST)
	(envelope-from kazuaki@aliceblue.jp)
Message-Id: <200706041810.l54IAd1D048113@eyes.aliceblue.jp>
Date: Tue, 5 Jun 2007 03:10:39 +0900 (JST)
From: Kazuaki ODA <kazuaki@aliceblue.jp>
Reply-To: Kazuaki ODA <kazuaki@aliceblue.jp>
To: FreeBSD-gnats-submit@freebsd.org
Cc:
Subject: [PATCH] grep(1) outputs NOT-matched lines (with multi-bytes characters)
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         113343
>Category:       gnu
>Synopsis:       [patch] grep(1) outputs NOT-matched lines (with multi-bytes characters)
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Jun 04 18:30:05 GMT 2007
>Closed-Date:    
>Last-Modified:  Thu Jun 07 05:12:44 GMT 2007
>Originator:     Kazuaki ODA
>Release:        FreeBSD 6.2-RELEASE-p5 i386
>Organization:
>Environment:
System: FreeBSD eyes.aliceblue.jp 6.2-RELEASE-p5 FreeBSD 6.2-RELEASE-p5 #3: Sat May 26 12:45:48 JST 2007 kazuaki@eyes.aliceblue.jp:/usr/obj/usr/src/sys/EYES i386


	
>Description:
	Our grep(1) mishandles multibyte characters in some cases.
	grep(1) outputs a line whenever a byte sequence in it matches the
	search pattern.  That is fine for single-byte characters, but it can
	be wrong for multibyte characters.  If the matched sequence consists
	of the second byte of one character and the first byte of the next
	character, it is not a real match and grep(1) should not output the
	line.
	Our grep(1) does have support for multibyte characters (and locales),
	so it does not always behave as described above, but sometimes it
	does.
>How-To-Repeat:
	
>Fix:

	Apply the attached patch.
	I think the mbstate_t should be reinitialized whenever mbrlen()
	returns (size_t) -2.
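
	To illustrate the point about mbstate_t, here is a minimal sketch,
	separate from the patch.  The helper names use_utf8_locale() and
	scan_after_incomplete() are made up for this example, and the locale
	names it tries are assumptions that may not exist on every system.
	It feeds mbrlen() a truncated character, so that it returns
	(size_t) -2, and then scans an unrelated string with the same state,
	with or without the memset() in between.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>
#include <wchar.h>

/* Try a few common UTF-8 locale names (assumed; availability varies).
   Returns 1 if one of them could be selected, 0 otherwise.  */
static int
use_utf8_locale (void)
{
  static const char *const names[] =
    { "C.UTF-8", "en_US.UTF-8", "en_US.utf8" };
  size_t i;

  for (i = 0; i < sizeof names / sizeof names[0]; i++)
    if (setlocale (LC_CTYPE, names[i]) != NULL)
      return 1;
  return 0;
}

/* Give mbrlen() the first two bytes of a three-byte UTF-8 character,
   then scan "a" with the same state object.  If RESET is nonzero the
   state is cleared in between, as the patch does.  Both results are
   stored through FIRST and SECOND.  Returns 0 if no UTF-8 locale was
   available (nothing is stored then), 1 otherwise.  */
static int
scan_after_incomplete (int reset, size_t *first, size_t *second)
{
  static const char partial[] = "\xe3\x81";	/* truncated U+3042 */
  mbstate_t mbs;

  assert (first != NULL && second != NULL);
  if (!use_utf8_locale ())
    return 0;

  memset (&mbs, '\0', sizeof (mbstate_t));
  *first = mbrlen (partial, 2, &mbs);	/* incomplete: (size_t) -2 */

  if (reset)
    memset (&mbs, '\0', sizeof (mbstate_t));

  *second = mbrlen ("a", 1, &mbs);
  return 1;
}
```

	With RESET nonzero, the second call should report 1 for "a".
	Without the reset the state still describes a half-read character,
	so the next call continues that character instead of starting
	afresh.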

--- search.c.diff begins here ---
--- gnu/usr.bin/grep/search.c.orig	Wed Mar 22 05:51:35 2006
+++ gnu/usr.bin/grep/search.c	Tue Jun  5 01:09:24 2007
@@ -400,9 +400,12 @@
 			}
 
 		      if (mlen == (size_t) -2)
-			/* Offset points inside multibyte character:
-			 * no good. */
-			break;
+			{
+			  /* Offset points inside multibyte character:
+			   * no good. */
+			  memset (&mbs, '\0', sizeof (mbstate_t));
+			  break;
+			}
 
 		      beg += mlen;
 		      bytes_left -= mlen;
@@ -462,9 +465,12 @@
 			}
 
 		      if (mlen == (size_t) -2)
-			/* Offset points inside multibyte character:
-			 * no good. */
-			break;
+			{
+			  /* Offset points inside multibyte character:
+			   * no good. */
+			  memset (&mbs, '\0', sizeof (mbstate_t));
+			  break;
+			}
 
 		      beg += mlen;
 		      bytes_left -= mlen;
@@ -925,15 +931,21 @@
 		}
 
 	      if (mlen == (size_t) -2)
-		/* Offset points inside multibyte character: no good. */
-		break;
+		{
+		  /* Offset points inside multibyte character: no good. */
+		  memset (&mbs, '\0', sizeof (mbstate_t));
+		  break;
+		}
 
 	      beg += mlen;
 	      bytes_left -= mlen;
 	    }
 
 	  if (bytes_left)
-	    continue;
+	    {
+	      beg += bytes_left;
+	      continue;
+	    }
 	}
       else
 #endif /* MBS_SUPPORT */
@@ -1051,6 +1063,7 @@
 			    {
 			      /* Offset points inside multibyte character:
 			       * no good. */
+			      memset (&mbs, '\0', sizeof (mbstate_t));
 			      break;
 			    }
 
--- search.c.diff ends here ---


>Release-Note:
>Audit-Trail:
>Unformatted:
