From lists@eitanadler.com  Sat Apr 28 20:32:14 2012
Return-Path: <lists@eitanadler.com>
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 1A9DD106564A
	for <bug-followup@freebsd.org>; Sat, 28 Apr 2012 20:32:14 +0000 (UTC)
	(envelope-from lists@eitanadler.com)
Received: from mail-wi0-f178.google.com (mail-wi0-f178.google.com [209.85.212.178])
	by mx1.freebsd.org (Postfix) with ESMTP id 8B62B8FC12
	for <bug-followup@freebsd.org>; Sat, 28 Apr 2012 20:32:13 +0000 (UTC)
Received: by wibhq7 with SMTP id hq7so1349151wib.13
        for <bug-followup@freebsd.org>; Sat, 28 Apr 2012 13:32:12 -0700 (PDT)
Received: by 10.180.88.169 with SMTP id bh9mr7771291wib.5.1335645132218; Sat,
 28 Apr 2012 13:32:12 -0700 (PDT)
Received: by 10.223.73.195 with HTTP; Sat, 28 Apr 2012 13:31:42 -0700 (PDT)
Message-Id: <CAF6rxgm2BiVchKCD0tmzNSp47AAsJ6L8Fded8K5g4eODaPH82A@mail.gmail.com>
Date: Sat, 28 Apr 2012 16:31:42 -0400
From: Eitan Adler <eadler@freebsd.org>
Sender: lists@eitanadler.com
To: bug-followup@freebsd.org
In-Reply-To: <20120314192007.S2163@besplex.bde.org>
Subject: Fwd: kern/165970: [libc] [patch] strtonum() optimization
References: <201203140318.q2E3IBJ6072350@freefall.freebsd.org> <20120314192007.S2163@besplex.bde.org>

>Number:         167399
>Category:       kern
>Synopsis:       Fwd: kern/165970: [libc] [patch] strtonum() optimization
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    gavin
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sat Apr 28 20:40:07 UTC 2012
>Closed-Date:    Sat Apr 28 22:54:03 UTC 2012
>Last-Modified:  Sat Apr 28 22:54:03 UTC 2012
>Originator:     
>Release:        
>Organization:
>Environment:
>Description:
 ---------- Forwarded message ----------
 From: Bruce Evans <brde@optusnet.com.au>
 Date: 14 March 2012 05:55
 Subject: Re: kern/165970: [libc] [patch] strtonum() optimization
 To: eadler@freebsd.org
 
 
 > Responsible-Changed-From-To: freebsd-bugs->eadler
 > Responsible-Changed-By: eadler
 > Responsible-Changed-When: Wed Mar 14 03:18:10 UTC 2012
 > Responsible-Changed-Why:
 > This PR looking interesting. I have to do some benchmarking to ensure
 > this code is faster && some testing to ensure it is still correct. =C2=A0=
 I
 > may not get to this soon, but I'll try to work on it
 
 
 I have a low opinion of strtonum() and might have pointed out its bugs
 before. =C2=A0There is another PR about the brokenness of atoi() and how
 replacements for it can be even worse, by providing non-replacements
 that are even harder to use. =C2=A0strtonum() is basically a step sideways
 for this. =C2=A0It tries to be easier to use than strtol(), but error handl=
 ing
 for it is still painful, and it has a bad API that was fixed for strtol()
 by expanding to to strto[u]imax(). =C2=A0strtonum() doesn't even have an
 unsigned version, and steals the generic name 'num' for its very
 non-generic API. =C2=A0The generic version should also expand on strtod(),
 but now keeping it easy to use is even harder.
 
 The above was only about its design errors. =C2=A0Its implementation errors
 are a little smaller.
 
 From the PR:
 
 % >Description:
 % % The strtonum() function can be optimized:
 % % 1. Array with messages and errno values should not be created on
 % =C2=A0 =C2=A0each strtonum() call.
 
 Anyone who programs this pessimization should be rewarded with lots
 of programming in K&R where it wasn't possible. =C2=A0Compilers should
 optimize it better, but I haven't seen one doing it perfectly. =C2=A0Most
 like to make a static copy of the data like the programmer should
 have written originally and then copy this to the stack on every call.
 Perhaps declaring it as const prevents the copying. =C2=A0But IIRC,
 strtonum() wants to modify the data as part of its dubious error
 handling, so the data can't be const.
 
 % % 2. If the errno value is not changed, then it can be cached, instead
 % =C2=A0 =C2=A0of calling corresponding function each time to get its value=
 .
 % % 3. Some code paths can be made a bit faster.
 
 Possibly.
 
 % % Also if implementation guaranties that strtoll() does not change errno
 % if successful, then comparisons for LLONG_MAX and LLONG_MIN can be remove=
 d.
 
 Such comparisons don't even work, since LLONG_MAX and LLONG_MIN are in-band=
 .
 The errno method must be used. =C2=A0It us not just an optimizations.
 
 % --- strtonum.c.orig =C2=A0 2006-03-14 21:53:03.000000000 +0200
 % +++ strtonum.c =C2=A0 =C2=A0 =C2=A0 =C2=A02012-03-12 13:05:40.000000000 +=
 0200
 % @@ -24,45 +24,51 @@ __FBSDID("$FreeBSD: src/lib/libc/stdlib/
 % =C2=A0#include <limits.h>
 % =C2=A0#include <stdlib.h>
 % % -#define INVALID =C2=A0 =C2=A01
 % -#define TOOSMALL =C2=A0 =C2=A0 2
 % -#define TOOLARGE =C2=A0 =C2=A0 3
 % +#define INVALID =C2=A0 =C2=A0 =C2=A00
 % +#define TOOSMALL =C2=A0 =C2=A0 1
 % +#define TOOLARGE =C2=A0 =C2=A0 2
 
 Of course code of this quality misformats the tab after @define.
 
 % % =C2=A0long long
 % =C2=A0strtonum(const char *numstr, long long minval, long long maxval,
 % =C2=A0 =C2=A0 =C2=A0const char **errstrp)
 % =C2=A0{
 % - =C2=A0 =C2=A0 long long ll =3D 0;
 % - =C2=A0 =C2=A0 char *ep;
 % - =C2=A0 =C2=A0 int error =3D 0;
 % - =C2=A0 =C2=A0 struct errval {
 % - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 const char *errstr;
 % - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 int err;
 % - =C2=A0 =C2=A0 } ev[4] =3D {
 % - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 { NULL, =C2=A0 =C2=A0 =C2=A0 =
 =C2=A0 0 },
 % - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 { "invalid", =C2=A0 =C2=A0EIN=
 VAL },
 % - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 { "too small", =C2=A0ERANGE }=
 ,
 % - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 { "too large", =C2=A0ERANGE }=
 ,
 % + =C2=A0 =C2=A0 static const struct {
 % + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 const char =C2=A0 =C2=A0 =C2=
 =A0*errstr;
 % + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 int =C2=A0 =C2=A0 =C2=A0 =C2=
 =A0 =C2=A0 =C2=A0 errnum;
 % + =C2=A0 =C2=A0 } ev[] =3D {
 % + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 [INVALID] =C2=A0 =C2=A0 =C2=
 =A0 =3D { "invalid", =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0EINVAL },
 % + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 [TOOSMALL] =C2=A0 =C2=A0 =C2=
 =A0=3D { "too small", =C2=A0 =C2=A0 =C2=A0 =C2=A0ERANGE },
 % + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 [TOOLARGE] =C2=A0 =C2=A0 =C2=
 =A0=3D { "too large", =C2=A0 =C2=A0 =C2=A0 =C2=A0ERANGE }
 % =C2=A0 =C2=A0 =C2=A0 };
 
 Good not try to get the pessimization of copying on evey call.
 
 This also improves the formatting of the structs, but I think the whole
 implementation of the error handling is worse than its design. =C2=A0In
 K&R, you couldn't use the [] initialization, and it is overkill for
 a small array. =C2=A0It is most useful for unstructured structs.
 
 % % - =C2=A0 ev[0].err =3D errno;
 % - =C2=A0 =C2=A0 errno =3D 0;
 % + =C2=A0 =C2=A0 int error;
 % +
 % =C2=A0 =C2=A0 =C2=A0 if (minval > maxval)
 % =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 error =3D INVALID;
 % =C2=A0 =C2=A0 =C2=A0 else {
 % + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 long long ll;
 % + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 char *ep;
 % + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 int errno_save;
 % +
 
 This adds some style bugs (nested declarations).
 
 % + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 errno_save =3D errno;
 % + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 errno =3D 0;
 % =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 ll =3D strtoll(numstr, &=
 ep, 10);
 
 The largest design bug is not the limitation to long longs, but the
 limitation to base 10. =C2=A0This limitation saves passing just the base
 parameter of strtol(). =C2=A0It's good to simplify like that, but a
 generic function should accept any base.
 
 % - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 if (errno =3D=3D EINVAL || nu=
 mstr =3D=3D ep || *ep !=3D '\0')
 % + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 error =3D errno;
 % + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 if (error =3D=3D EINVAL || nu=
 mstr =3D=3D ep || *ep !=3D '\0')
 % =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
 =A0 error =3D INVALID;
 
 I don't like using the POSIX (?) extension of strto*() returning EINVAL,
 but if we do then we shouldn't check for EINVAL. =C2=A0I think numstr =3D=
 =3D ep
 is unreachable since conditions that cause it also cause error =3D=3D EINVA=
 L.
 I prefer to omit the error =3D=3D EINVAL check instead.
 
 % - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 else if ((ll =3D=3D LLONG_MIN=
  && errno =3D=3D ERANGE) || ll < minval)
 % + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 else if ((ll =3D=3D LLONG_MIN=
  && error =3D=3D ERANGE) || ll < minval)
 % =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
 =A0 error =3D TOOSMALL;
 % - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 else if ((ll =3D=3D LLONG_MAX=
  && errno =3D=3D ERANGE) || ll > maxval)
 % + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 else if ((ll =3D=3D LLONG_MAX=
  && error =3D=3D ERANGE) || ll > maxval)
 % =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
 =A0 error =3D TOOLARGE;
 
 Unfortunately there is a problem here. =C2=A0We want to distinguish TOOSMAL=
 L from
 TOOLARGE, but ERANGE only tells us that one of these occurred. =C2=A0So che=
 cking
 the limits seems to be necessary.
 
 If anyone cared about optimizing this, then they would check error =3D=3D E=
 RANGE
 before checking ll, since error is smaller and usually !=3D ERANGE.
 
 I keep mistyping `error' as errno... =C2=A0Copying errno is good, but why
 doesn't the compiler cache it for you? =C2=A0It is per-thread, so it is
 even less volatile than a general global. =C2=A0Maybe its declaration is
 missing a __pure2. =C2=A0I looked at it, and it indeed doesn't have a
 __pure2. =C2=A0I was surprised to find that it was a function call even in
 old versions of FreeBSD (I thought it was ifdefed to just a global in
 the non-threaded case). =C2=A0After adding a __pure2, the result of the
 function call is cached. =C2=A0But the accesses through the pointer are not
 cached by gcc-3.3. =C2=A0Neither are the stores for 'x =3D errno; x =3D err=
 no;'.
 x is global, and there is no volatile in sight. =C2=A0But gcc-4.2 reduces
 this to just the function call to obtain the pointer, then a single
 load-store.
 
 % + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 else {
 % + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 e=
 rrno =3D errno_save;
 % + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 i=
 f (errstrp !=3D NULL)
 % + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
 =C2=A0 =C2=A0 =C2=A0 =C2=A0 *errstrp =3D NULL;
 % + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 r=
 eturn (ll);
 % + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 }
 % =C2=A0 =C2=A0 =C2=A0 }
 % =C2=A0 =C2=A0 =C2=A0 if (errstrp !=3D NULL)
 % =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 *errstrp =3D ev[error].e=
 rrstr;
 % - =C2=A0 =C2=A0 errno =3D ev[error].err;
 % - =C2=A0 =C2=A0 if (error)
 % - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 ll =3D 0;
 % -
 % - =C2=A0 =C2=A0 return (ll);
 % + =C2=A0 =C2=A0 errno =3D ev[error].errnum;
 % + =C2=A0 =C2=A0 return (0);
 % =C2=A0}
 
 Mostly better. =C2=A0The old version does something confusing with the non-=
 const
 data structure, to do little more than avoid an else clause.
 
 I tried writing this without any data structure or internal error codes,
 but didn't like the results. =C2=A0It gave lots of large else clauses, wher=
 e
 the above maps all error cases to an error code and then looks up this
 code in the table, which is certainly better for a large number of cases.
 So both the original and the above are looking better -- the orginal
 goes too far in mapping the non-error case to the table lookup, and the
 above fixes that.
 
 I would have designed the API so that it doesn't try to be more explicit
 than standard strto*() in its description of errors. =C2=A0It should just
 return errno =3D ERANGE (or maybe EINVAL) and not add the complication
 of an only slightly more descriptive string error message. =C2=A0I just
 remembered that the latter was one of the main points discussed in the
 old PR. =C2=A0Even moldy programs that use atoi() often have context-sensit=
 ive
 error messages that give more useful detail than a generic function can.
 ... Just found the old PR -- it is 142911. =C2=A0One example from it:
 
 @ @@ -226,9 +223,10 @@
 @ =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
 =A0 break;
 @ =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 case 'n':
 @ =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
 =A0 nflag =3D 1;
 @ - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 m=
 axshowdevs =3D (int)strtonum(optarg, 0, 1000, &errstr);
 @ - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 i=
 f (errstr !=3D NULL)
 @ - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
 =C2=A0 =C2=A0 =C2=A0 =C2=A0 errx(1, "-n %s: %s", optarg, errstr);
 @ + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
 =C2=A0 maxshowdevs =3D atoi(optarg);
 @ + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 i=
 f (maxshowdevs < 0)
 @ + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
 =C2=A0 =C2=A0 =C2=A0 =C2=A0 errx(1, "number of devices %d is < 0",
 @ + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0maxshowdevs);
 @ =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
 =A0 break;
 
 Note that the patch is backwards. =C2=A0This change, proposed by the submit=
 ter,
 shows how unusable strtonum() and especially its error string are. =C2=A0We=
  had
 to do extra work to call it. =C2=A0It gives better error checking. =C2=A0Bu=
 t then
 using its error string is hard. =C2=A0Well, not very hard, but the submitte=
 d
 just printed it, which replaces "number of devices" by the cryptic -n.
 The correct translation seems to be
 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
 =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0errx(1, "number of devices %s is %s",
 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
 =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0optarg, errstr);
 (The %s for optarg should also be quoted.).
 
 Bruce
 
 
 
 --=20
 Eitan Adler
 Source & Ports committer
 X11, Bugbusting teams
>How-To-Repeat:
>Fix:
>Release-Note:
>Audit-Trail:
State-Changed-From-To: open->closed 
State-Changed-By: gavin 
State-Changed-When: Sat Apr 28 22:53:26 UTC 2012 
State-Changed-Why:  
Misfiled followup to kern/165970, content migrated 


Responsible-Changed-From-To: gnats-admin->gavin 
Responsible-Changed-By: gavin 
Responsible-Changed-When: Sat Apr 28 22:53:26 UTC 2012 
Responsible-Changed-Why:  


http://www.freebsd.org/cgi/query-pr.cgi?pr=167399 
>Unformatted:
