From nobody@FreeBSD.org  Tue Dec 28 17:57:53 2010
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 23C3D106566C
	for <freebsd-gnats-submit@FreeBSD.org>; Tue, 28 Dec 2010 17:57:53 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from red.freebsd.org (unknown [IPv6:2001:4f8:fff6::22])
	by mx1.freebsd.org (Postfix) with ESMTP id 13AB08FC08
	for <freebsd-gnats-submit@FreeBSD.org>; Tue, 28 Dec 2010 17:57:53 +0000 (UTC)
Received: from red.freebsd.org (localhost [127.0.0.1])
	by red.freebsd.org (8.14.4/8.14.4) with ESMTP id oBSHvqx2022003
	for <freebsd-gnats-submit@FreeBSD.org>; Tue, 28 Dec 2010 17:57:52 GMT
	(envelope-from nobody@red.freebsd.org)
Received: (from nobody@localhost)
	by red.freebsd.org (8.14.4/8.14.4/Submit) id oBSHvqcr022002;
	Tue, 28 Dec 2010 17:57:52 GMT
	(envelope-from nobody)
Message-Id: <201012281757.oBSHvqcr022002@red.freebsd.org>
Date: Tue, 28 Dec 2010 17:57:52 GMT
From: Mathieu <sigsys@gmail.com>
To: freebsd-gnats-submit@FreeBSD.org
Subject: regex(3) bug with UTF-8 locale
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         153502
>Category:       bin
>Synopsis:       [libc] regex(3) bug with UTF-8 locale
>Confidential:   no
>Severity:       serious
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue Dec 28 18:00:32 UTC 2010
>Closed-Date:    
>Last-Modified:  Mon Jan 03 20:54:22 UTC 2011
>Originator:     Mathieu
>Release:        8.1-STABLE, 7.3-RELEASE-p3
>Organization:
>Environment:
8.1-STABLE/amd64 r212312M
7.3-RELEASE-p3/i386 r215233M

>Description:
I'm seeing odd behavior from programs that use regex(3), such as less(1), vi(1) and sed(1), when running with LANG=en_US.UTF-8 on UTF-8 input.

Sometimes it works as expected:

$ echo '' | sed -ne '/^.$/p'

$ echo '' | sed -ne '/^..$/p'

$ echo 'aa' | sed -ne '/a.a/p'
aa
$ echo 'aa' | sed -ne '/a.*a/p'
aa
$ echo 'aaaa' | sed -ne '/aa.aa/p'
aaaa
$ echo 'aaa' | sed -ne '/a.a.a/p'
aaa

But not always; each of these prints nothing:

$ echo 'a' | sed -ne '/.a/p'
$ echo 'aaa' | sed -ne '/a.aa/p'
$ echo 'a' | sed -ne '/.a./p'


It seems that ".*", ".+", ".{0,}" and ".{1,}" work correctly, but ".{0,1}", ".{1,1}" and a lone "." do not always.

>How-To-Repeat:

>Fix:


>Release-Note:
>Audit-Trail:
>Unformatted:
