From nobody@FreeBSD.org  Fri Dec 17 21:25:24 2010
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 310A1106564A
	for <freebsd-gnats-submit@FreeBSD.org>; Fri, 17 Dec 2010 21:25:24 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from red.freebsd.org (unknown [IPv6:2001:4f8:fff6::22])
	by mx1.freebsd.org (Postfix) with ESMTP id 1F3F28FC13
	for <freebsd-gnats-submit@FreeBSD.org>; Fri, 17 Dec 2010 21:25:24 +0000 (UTC)
Received: from red.freebsd.org (localhost [127.0.0.1])
	by red.freebsd.org (8.14.4/8.14.4) with ESMTP id oBHLPN7I068103
	for <freebsd-gnats-submit@FreeBSD.org>; Fri, 17 Dec 2010 21:25:23 GMT
	(envelope-from nobody@red.freebsd.org)
Received: (from nobody@localhost)
	by red.freebsd.org (8.14.4/8.14.4/Submit) id oBHLPNCV068102;
	Fri, 17 Dec 2010 21:25:23 GMT
	(envelope-from nobody)
Message-Id: <201012172125.oBHLPNCV068102@red.freebsd.org>
Date: Fri, 17 Dec 2010 21:25:23 GMT
From: "Pedro F. Giffuni" <giffunip@tutopia.com>
To: freebsd-gnats-submit@FreeBSD.org
Subject: libc/regex: Add support for \< and \> word delimiters
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         153257
>Category:       bin
>Synopsis:       [libc] [patch] regex(3): Add support for \< and \> word delimiters
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    pfg
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          change-request
>Submitter-Id:   current-users
>Arrival-Date:   Fri Dec 17 21:30:10 UTC 2010
>Closed-Date:    Sun May 04 02:15:55 UTC 2014
>Last-Modified:  Sun May  4 02:20:00 UTC 2014
>Originator:     Pedro F. Giffuni
>Release:        8.1-RELEASE
>Organization:
>Environment:
FreeBSD mogwai.giffuni.net 8.1-RELEASE FreeBSD 8.1-RELEASE #0: Tue Nov  9 10:31:43 UTC 2010     pedro@mogwai.giffuni.net:/usr/src/sys/i386/compile/GENERIC  i386

>Description:
As part of a sed port for Illumos, Garrett D'Amore has updated
the regex code taken from FreeBSD to include support for \< and \>,
as these are historically in wide use on Solaris.

This matches what the GNU and Solaris regex implementations do.
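
For readers unfamiliar with the two delimiters, the following is a
minimal sketch of the conventional meaning of \< (start of word) and
\> (end of word). It is not taken from the patch: the helper names are
hypothetical, and the definition of a word character as an alphanumeric
or underscore is an assumption based on the usual convention.

```c
#include <ctype.h>
#include <stddef.h>

/* Assumption: a "word" character is an alphanumeric or '_'. */
int
is_word_char(unsigned char c)
{
	return (isalnum(c) || c == '_');
}

/*
 * \< : offset i in s (of length len) starts a word when the character
 * before it is not a word character and the character at i is one.
 */
int
at_word_start(const char *s, size_t len, size_t i)
{
	int prev = (i > 0) && is_word_char((unsigned char)s[i - 1]);
	int cur = (i < len) && is_word_char((unsigned char)s[i]);

	return (!prev && cur);
}

/*
 * \> : offset i ends a word when the character before it is a word
 * character and the character at i (if any) is not.
 */
int
at_word_end(const char *s, size_t len, size_t i)
{
	int prev = (i > 0) && is_word_char((unsigned char)s[i - 1]);
	int cur = (i < len) && is_word_char((unsigned char)s[i]);

	return (prev && !cur);
}
```

Both checks are zero-width: they test a position between characters and
consume no input, which is why the patch emits a single operator for
each delimiter rather than an ordinary character.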

The diff was taken from here:
https://www.illumos.org/issues/516

According to Garrett's blog:
http://gdamore.blogspot.com/2010/12/i-sed1-so.html

(FreeBSD friends, please feel free to include these changes back -- I've not changed the original BSD license.)
>How-To-Repeat:
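One possible way to check how a given libc treats the delimiters is a
small probe built on the standard regcomp(3)/regexec(3) interface. This
is only a sketch: the function name re_matches and its return
convention are hypothetical and are not part of the submitted patch.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical probe: returns 1 if pattern matches somewhere in text,
 * 0 if it does not, and -1 if regcomp() rejects the pattern.
 * cflags is passed to regcomp(), e.g. 0 for basic REs or REG_EXTENDED.
 */
int
re_matches(const char *pattern, const char *text, int cflags)
{
	regex_t re;
	int rc;

	/* REG_NOSUB: only a yes/no answer is needed, no submatches. */
	if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
		return (-1);
	rc = regexec(&re, text, 0, NULL, 0);
	regfree(&re);
	return (rc == 0 ? 1 : 0);
}
```

With the delimiters supported, re_matches("\\<foo\\>", "a foo b", 0)
would be expected to return 1 and re_matches("\\<foo\\>", "afoob", 0)
to return 0. Judging from the code the patch replaces, an unpatched
libc appears to hand the escaped character to ordinary() and so would
match a literal '<' or '>' instead.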

>Fix:
Patch attached.

Patch attached with submission follows:

diff -ru regex.orig/regcomp.c regex/regcomp.c
--- regex.orig/regcomp.c	2010-12-17 16:01:01.000000000 +0000
+++ regex/regcomp.c	2010-12-17 16:08:13.000000000 +0000
@@ -407,7 +407,17 @@
 	case '\\':
 		(void)REQUIRE(MORE(), REG_EESCAPE);
 		wc = WGETNEXT();
-		ordinary(p, wc);
+		switch (wc) {
+		case '<':
+			EMIT(OBOW, 0);
+			break;
+		case '>':
+			EMIT(OEOW, 0);
+			break;
+		default:
+			ordinary(p, wc);
+			break;
+		}
 		break;
 	case '{':		/* okay as ordinary except if digit follows */
 		(void)REQUIRE(!MORE() || !isdigit((uch)PEEK()), REG_BADRPT);
@@ -564,6 +574,12 @@
 	case '[':
 		p_bracket(p);
 		break;
+	case BACKSL|'<':
+		EMIT(OBOW, 0);
+		break;
+	case BACKSL|'>':
+		EMIT(OEOW, 0);
+		break;
 	case BACKSL|'{':
 		SETERROR(REG_BADRPT);
 		break;


>Release-Note:
>Audit-Trail:

From: "Pedro F. Giffuni" <giffunip@tutopia.com>
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: bin/153257: [libc] [patch] regex(3): Add support for \< and \> word delimiters
Date: Sat, 26 Mar 2011 10:27:01 -0700 (PDT)

 This patch is debatable: while it better supports the Solaris
 and GNU behaviour, it is not required by POSIX.
 
 The behaviour is also used by some scripts in the ports tree
 to distinguish between GNU sed and BSD sed.
 
 For now I guess it may be better if we don't apply this
 patch and let a future regex replacement (TRE?) decide
 if it supports the extensions or not.
 
 
       
State-Changed-From-To: open->closed 
State-Changed-By: pfg 
State-Changed-When: Fri Dec 16 16:12:46 UTC 2011 
State-Changed-Why:  
Let's not change the default behaviour and let the 
new regex implementation decide what to do. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=153257 
State-Changed-From-To: closed->open 
State-Changed-By: pfg 
State-Changed-When: Thu Apr 17 02:35:51 UTC 2014 
State-Changed-Why:  
re-open: regex is likely to stay for a while longer, and it may be good 
to gain more compatibility with Illumos/Solaris.  


Class-Changed-From-To: sw-bug->change-request 
Class-Changed-By: pfg 
Class-Changed-When: Thu Apr 17 02:35:51 UTC 2014 
Class-Changed-Why:  
re-open: regex is likely to stay for a while longer, and it may be good 
to gain more compatibility with Illumos/Solaris.  

http://www.freebsd.org/cgi/query-pr.cgi?pr=153257 

Responsible-Changed-From-To: freebsd-bugs->pfg 
Responsible-Changed-By: pfg 
Responsible-Changed-When: Sat May 3 19:31:47 UTC 2014 
Responsible-Changed-Why:  
Grab it. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=153257 
State-Changed-From-To: open->closed 
State-Changed-By: pfg 
State-Changed-When: Sun May 4 02:14:26 UTC 2014 
State-Changed-Why:  
Close again: it is unclear that the extra portability is a 
sufficient advantage. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=153257 

From: Pedro Giffuni <pfg@freebsd.org>
To: "bug-followup@FreeBSD.org" <bug-followup@FreeBSD.org>
Cc:  
Subject: Re: bin/153257: [libc] [patch] regex(3): Add support for \< and \>
 word delimiters
Date: Sat, 03 May 2014 21:13:27 -0500

 This is a multi-part message in MIME format.
 --------------090003080902030601010509
 Content-Type: text/plain; charset=ISO-8859-1; format=flowed
 Content-Transfer-Encoding: 7bit
 
 Updated patch to include documentation changes; however, I am having
 second thoughts. The behavior is non-standard and many other regex
 implementations don't have it.
 For reference:
 
 http://mail-index.netbsd.org/tech-userlevel/2012/12/02/msg006954.html
 
 Adding the feature may make it more difficult to change the regex 
 implementation in the future and we may actually want to adopt some 
 library that has better international support.
 
 
 
 
 --------------090003080902030601010509
 Content-Type: text/plain; charset=us-ascii;
  name="patch-regex-svr4.txt"
 Content-Transfer-Encoding: 7bit
 Content-Disposition: attachment;
  filename="patch-regex-svr4.txt"
 
 Index: lib/libc/regex/re_format.7
 ===================================================================
 --- lib/libc/regex/re_format.7	(revision 265253)
 +++ lib/libc/regex/re_format.7	(working copy)
 @@ -314,6 +314,13 @@
  .St -p1003.2 ,
  and should be used with
  caution in software intended to be portable to other systems.
 +The additional word delimiters
 +.Ql \e<
 +and
 +.Ql \e>
 +are provided to ease compatibility with traditional
 +.Xr svr4 4
 +systems but are not portable and should be avoided.
  .Pp
  In the event that an RE could match more than one substring of a given
  string,
 Index: lib/libc/regex/regcomp.c
 ===================================================================
 --- lib/libc/regex/regcomp.c	(revision 265253)
 +++ lib/libc/regex/regcomp.c	(working copy)
 @@ -412,7 +412,17 @@
  	case '\\':
  		(void)REQUIRE(MORE(), REG_EESCAPE);
  		wc = WGETNEXT();
 -		ordinary(p, wc);
 +		switch (wc) {
 +		case '<':
 +			EMIT(OBOW, 0);
 +			break;
 +		case '>':
 +			EMIT(OEOW, 0);
 +			break;
 +		default:
 +			ordinary(p, wc);
 +			break;
 +		}
  		break;
  	case '{':		/* okay as ordinary except if digit follows */
  		(void)REQUIRE(!MORE() || !isdigit((uch)PEEK()), REG_BADRPT);
 @@ -569,6 +579,12 @@
  	case '[':
  		p_bracket(p);
  		break;
 +	case BACKSL|'<':
 +		EMIT(OBOW, 0);
 +		break;
 +	case BACKSL|'>':
 +		EMIT(OEOW, 0);
 +		break;
  	case BACKSL|'{':
  		SETERROR(REG_BADRPT);
  		break;
 
 --------------090003080902030601010509--
>Unformatted:
