From nobody@FreeBSD.org  Mon Sep 17 23:59:11 2012
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 42B41106566C
	for <freebsd-gnats-submit@FreeBSD.org>; Mon, 17 Sep 2012 23:59:11 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from red.freebsd.org (red.freebsd.org [IPv6:2001:4f8:fff6::22])
	by mx1.freebsd.org (Postfix) with ESMTP id 2E4AE8FC08
	for <freebsd-gnats-submit@FreeBSD.org>; Mon, 17 Sep 2012 23:59:11 +0000 (UTC)
Received: from red.freebsd.org (localhost [127.0.0.1])
	by red.freebsd.org (8.14.5/8.14.5) with ESMTP id q8HNxBej073705
	for <freebsd-gnats-submit@FreeBSD.org>; Mon, 17 Sep 2012 23:59:11 GMT
	(envelope-from nobody@red.freebsd.org)
Received: (from nobody@localhost)
	by red.freebsd.org (8.14.5/8.14.5/Submit) id q8HNxAdj073704;
	Mon, 17 Sep 2012 23:59:10 GMT
	(envelope-from nobody)
Message-Id: <201209172359.q8HNxAdj073704@red.freebsd.org>
Date: Mon, 17 Sep 2012 23:59:10 GMT
From: Devin Teske <dteske@FreeBSD.org>
To: freebsd-gnats-submit@FreeBSD.org
Subject: awk(1) does not support word-boundary metacharacters
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         171725
>Category:       bin
>Synopsis:       awk(1) does not support word-boundary metacharacters
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue Sep 18 00:00:21 UTC 2012
>Closed-Date:    
>Last-Modified:  Fri Oct 12 16:30:01 UTC 2012
>Originator:     Devin Teske
>Release:        FreeBSD 9.0-RELEASE i386
>Organization:
FIS Global, Inc.
>Environment:
FreeBSD push900.vicor.com 9.0-RELEASE FreeBSD 9.0-RELEASE #0: Tue Jan  3 07:15:25 UTC 2012     root@obrian.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  i386
>Description:
The awk(1) manual states: "regular expressions are as in egrep; see grep(1)."

This leads one to believe that awk(1) supports both basic REs and extended REs.

However, there is a gap in this stated coverage.

one-true-awk (our awk(1)) does not support the word-boundary metacharacters \< and \>, which match the beginning and the end of a word, respectively.
>How-To-Repeat:
$ echo xxxa | awk '/xxx\>/{print}'
### this produces no output, as expected, since
### the word "xxxa" does not end in "xxx"

$ echo xxx | awk '/xxx\>/{print}'
### this also produces no output, indicating that
### \> is not recognized as the "end-of-word" metacharacter
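
For comparison, the same pattern can be run through grep -E. This is only a sketch and assumes a grep that honors \> (GNU grep does); the helper name ends_word is hypothetical and not part of any tool.

```shell
# Assumption: grep -E on this system accepts the \> end-of-word
# metacharacter (true for GNU grep; other greps may differ).
# ends_word is a hypothetical helper used only for this comparison.
ends_word() {
    # Succeeds when the argument contains "xxx" at the end of a word.
    printf '%s\n' "$1" | grep -E -q 'xxx\>'
}

ends_word xxx  && echo "xxx: matched"
ends_word xxxa || echo "xxxa: not matched"
```

With such a grep, "xxx" matches and "xxxa" does not, which is the behavior the awk(1) manual's reference to egrep would lead one to expect.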

>Fix:
There are two proposed solutions.

1. Add a note to the awk(1) manual stating that it does not support all RE metacharacters (note that '\<' and '\>' are accepted by grep(1) in both BRE and ERE mode).

or

2. Enhance awk(1) to support these BRE/ERE metacharacters so that the awk(1) manual is accurate without a patch.
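
Independent of which solution is chosen, the end-of-word test can be approximated today with plain ERE syntax. The sketch below is a workaround only, not a proposed patch; the helper name match_end_of_word is hypothetical, and "word character" is assumed to mean ASCII letters, digits, and underscore.

```shell
# Hypothetical helper: print input lines in which "xxx" ends a word,
# using only ERE constructs that do not depend on \> support.
# Assumption: a word character is an ASCII letter, digit, or underscore.
match_end_of_word() {
    # "xxx" must be followed by a non-word character or by end of line.
    awk '/xxx([^A-Za-z0-9_]|$)/ { print }'
}

echo xxx       | match_end_of_word
echo xxxa      | match_end_of_word
echo "xxx yyy" | match_end_of_word
```

Unlike \>, the bracket expression consumes the character that follows the word, which matters for sub() and gsub() but not for a simple match.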

>Release-Note:
>Audit-Trail:

From: Devin Teske <devin.teske@fisglobal.com>
To: <bug-followup@FreeBSD.org>, <dteske@FreeBSD.org>
Cc:  
Subject: Re: bin/171725: awk(1) does not support word-boundary metacharacters
Date: Fri, 12 Oct 2012 09:18:34 -0700

 I tried swapping \< and \> (GNU syntax) for [[:<:]] and [[:>:]] (POSIX
 syntax), respectively.
 
 This makes no difference, as one-true-awk uses its own regular expression
 code (read: it does not use libc, which already supports the [[:<:]] and
 [[:>:]] word boundaries).
 -- 
 Devin
 
 _____________
 The information contained in this message is proprietary and/or
 confidential. If you are not the intended recipient, please: (i) delete the
 message and all copies; (ii) do not disclose, distribute or use the message
 in any manner; and (iii) notify the sender immediately. In addition, please
 be aware that any message addressed to our domain is subject to archiving
 and review by persons other than the intended recipient. Thank you.

From: Devin Teske <devin.teske@fisglobal.com>
To: <bug-followup@FreeBSD.org>, <dteske@FreeBSD.org>
Cc:  
Subject: Re: bin/171725: awk(1) does not support word-boundary metacharacters
Date: Fri, 12 Oct 2012 09:24:00 -0700

 When enhancing awk(1), we should add support for "\<" and "\>" (not
 "[[:<:]]" and "[[:>:]]"). The reason to support the former syntax is that
 the awk(1) manual says (rather sparsely) "regular expressions are as in
 egrep; see grep(1)".
 
 If we are to close the gap between the awk(1) manual and awk(1)
 functionality, we should either fix the awk(1) manual (for example, to say
 "regular expressions are as in egrep(1) except for \< and \>") or
 (preferred) add support for \< and \> so that the manual becomes accurate
 without modification.
 
 That is to say, awk(1) should _not_ be enhanced to support [[:<:]] and
 [[:>:]], as this would only widen the gap between what is documented and
 what is expected.
 -- 
 Devin
 
>Unformatted:
