From nobody@FreeBSD.org  Wed Dec 16 14:19:54 2009
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 54FFA1065676
	for <freebsd-gnats-submit@FreeBSD.org>; Wed, 16 Dec 2009 14:19:54 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [IPv6:2001:4f8:fff6::21])
	by mx1.freebsd.org (Postfix) with ESMTP id 4480D8FC24
	for <freebsd-gnats-submit@FreeBSD.org>; Wed, 16 Dec 2009 14:19:54 +0000 (UTC)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.14.3/8.14.3) with ESMTP id nBGEJrsv029074
	for <freebsd-gnats-submit@FreeBSD.org>; Wed, 16 Dec 2009 14:19:53 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.14.3/8.14.3/Submit) id nBGEJr7T029073;
	Wed, 16 Dec 2009 14:19:53 GMT
	(envelope-from nobody)
Message-Id: <200912161419.nBGEJr7T029073@www.freebsd.org>
Date: Wed, 16 Dec 2009 14:19:53 GMT
From: Maxim Zakharov <maxime@maxime.net.ru>
To: freebsd-gnats-submit@FreeBSD.org
Subject: Faster version of strncpy function
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         141682
>Category:       kern
>Synopsis:       [libc] [patch] Faster version of strncpy(3)
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    eadler
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          change-request
>Submitter-Id:   current-users
>Arrival-Date:   Wed Dec 16 14:20:01 UTC 2009
>Closed-Date:    Thu May 10 18:21:31 UTC 2012
>Last-Modified:  Thu May 10 18:21:52 UTC 2012
>Originator:     Maxim Zakharov
>Release:        7.1-BETA2
>Organization:
>Environment:
FreeBSD max.maxime.net.ru 7.1-BETA2 FreeBSD 7.1-BETA2 #0: Mon Oct 13 04:23:28 UTC 2008     root@logan.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  i386
>Description:
I would like to propose a faster version of the strncpy function
(/usr/src/lib/libc/string/strncpy.c).  It is about 30% faster on aligned
data and about twice as fast on unaligned data on modern pipelined
processors (an Intel Core 2 Duo E8400 at 3 GHz in particular).
>How-To-Repeat:

>Fix:
void * dps_strncpy(char *dst0, char *src0, size_t length) {
  if (length) {
    register size_t n = (length + 7) / 8;
    register size_t r = (length % 8);
    register char *dst = dst0, *src = src0;
    if (r == 0) r = 8;
    if (!(dst[0] = src[0])) return dst0;
    if (r > 1) { if (!(dst[1] = src[1])) return dst0;
    if (r > 2) { if (!(dst[2] = src[2])) return dst0;
    if (r > 3) { if (!(dst[3] = src[3])) return dst0;
    if (r > 4) { if (!(dst[4] = src[4])) return dst0;
    if (r > 5) { if (!(dst[5] = src[5])) return dst0;
    if (r > 6) { if (!(dst[6] = src[6])) return dst0;
    if (r > 7) { if (!(dst[7] = src[7])) return dst0;
    }}}}}}}
    src += r; dst += r;
    while (--n > 0) {
      if (!(dst[0] = src[0])) break;
      if (!(dst[1] = src[1])) break;
      if (!(dst[2] = src[2])) break;
      if (!(dst[3] = src[3])) break;
      if (!(dst[4] = src[4])) break;
      if (!(dst[5] = src[5])) break;
      if (!(dst[6] = src[6])) break;
      if (!(dst[7] = src[7])) break;
      src += 8; dst += 8;
    }
    if (dst < dst0 + length) *dst = '\0';
  }
  return dst0;
}
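
For comparison, the structure of dps_strncpy (a partial head of 1 to 8
bytes followed by unrolled blocks of 8) is close to Duff's device, the
pattern named when this PR was closed.  Below is a minimal sketch of the
classic device for a plain byte copy.  The name duff_copy is hypothetical
and used only for illustration; unlike strncpy(3) it neither stops at NUL
nor pads the destination.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Classic Duff's device (illustrative sketch): copy count bytes with an
 * 8x unrolled loop.  The switch jumps into the middle of the loop body
 * so that the first pass handles the count % 8 leftover bytes.
 */
void duff_copy(char *to, const char *from, size_t count)
{
    size_t n;

    if (count == 0)
        return;
    assert(to != NULL && from != NULL);

    /* Number of passes, rounded up without computing count + 7. */
    n = count / 8 + (count % 8 != 0);

    switch (count % 8) {
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--n > 0);
    }
}
```

Whether this kind of manual unrolling beats a simple loop depends on the
compiler and CPU, so it has to be measured rather than assumed.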


>Release-Note:
>Audit-Trail:

From: Maxim Zakharov <maxime@maxime.net.ru>
To: bug-followup@FreeBSD.org, maxime@maxime.net.ru
Cc:  
Subject: Re: kern/141682: [libc] [patch] Faster version of strncpy(3)
Date: Thu, 17 Dec 2009 23:46:03 +0300

 Sorry, it seems one line was redundant:
 
 void * dps_strncpy(char *dst0, char *src0, size_t length) {
   if (length) {
     register size_t n = (length + 7) / 8;
     register size_t r = (length % 8);
     register char *dst = dst0, *src = src0;
     if (r == 0) r = 8;
     if (!(dst[0] = src[0])) return dst0;
     if (r > 1) if (!(dst[1] = src[1])) return dst0;
     if (r > 2) if (!(dst[2] = src[2])) return dst0;
     if (r > 3) if (!(dst[3] = src[3])) return dst0;
     if (r > 4) if (!(dst[4] = src[4])) return dst0;
     if (r > 5) if (!(dst[5] = src[5])) return dst0;
     if (r > 6) if (!(dst[6] = src[6])) return dst0;
     if (r > 7) if (!(dst[7] = src[7])) return dst0;
     src += r; dst += r;
     while (--n > 0) {
       if (!(dst[0] = src[0])) break;
       if (!(dst[1] = src[1])) break;
       if (!(dst[2] = src[2])) break;
       if (!(dst[3] = src[3])) break;
       if (!(dst[4] = src[4])) break;
       if (!(dst[5] = src[5])) break;
       if (!(dst[6] = src[6])) break;
       if (!(dst[7] = src[7])) break;
       src += 8; dst += 8;
     }
   }
   return dst0;
 }
 
 
 -- 
 http://www.dataparksearch.org/

From: Jaakko Heinonen <jh@FreeBSD.org>
To: Maxim Zakharov <maxime@maxime.net.ru>
Cc: bug-followup@FreeBSD.org
Subject: Re: kern/141682: [libc] [patch] Faster version of strncpy(3)
Date: Fri, 15 Jan 2010 21:12:29 +0200

 Hi,
 
 On 2009-12-17, Maxim Zakharov wrote:
 >  void * dps_strncpy(char *dst0, char *src0, size_t length) {
 >    if (length) {
 >      register size_t n = (length + 7) / 8;
 
 This won't work with length values larger than SIZE_MAX - 7 due to
 integer overflow.
 
 -- 
 Jaakko
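
The overflow described above can be reproduced in isolation.  The sketch
below compares the block count as written in the patch with an
overflow-free ceiling division.  The helper names blocks_naive,
blocks_safe and check_block_counts are hypothetical and not part of the
submitted patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Block count as written in the patch: wraps when length > SIZE_MAX - 7. */
size_t blocks_naive(size_t length)
{
    return (length + 7) / 8;
}

/* Ceiling division by 8 that never adds to length, so it cannot wrap. */
size_t blocks_safe(size_t length)
{
    return length / 8 + (length % 8 != 0);
}

/* Self-check of the behaviour described above; aborts on a mismatch. */
void check_block_counts(void)
{
    /* The two forms agree for ordinary lengths. */
    assert(blocks_naive(16) == 2 && blocks_safe(16) == 2);
    assert(blocks_naive(17) == 3 && blocks_safe(17) == 3);

    /* SIZE_MAX + 7 wraps to 6, and 6 / 8 is 0. */
    assert(blocks_naive(SIZE_MAX) == 0);

    /* The safe form still yields the true block count. */
    assert(blocks_safe(SIZE_MAX) == SIZE_MAX / 8 + 1);
}
```

With n equal to 0, the `while (--n > 0)` loop in the patch would start by
wrapping n around to SIZE_MAX instead of terminating.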

From: Maxim Zakharov <maxime@maxime.net.ru>
To: Jaakko Heinonen <jh@freebsd.org>
Cc: bug-followup@freebsd.org
Subject: Re: kern/141682: [libc] [patch] Faster version of strncpy(3)
Date: Sat, 16 Jan 2010 01:26:03 +0300

 Hi,
 
     if (length) {
       register size_t n = length / 8;
       register size_t r = (length % 8);
       register char *dst = dst0, *src = src0;
       if (r == 0) r = 8; else n++;
 
 This solves the problem.
 
 Thank you.
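
One way to check the revised split is to confirm that the head of r bytes
plus (n - 1) full blocks of 8 always adds up to length, including at
SIZE_MAX.  The sketch below does that.  The struct and function names are
hypothetical and used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Head size r (1..8) and total block count n (head included). */
struct split {
    size_t n;
    size_t r;
};

/* Same arithmetic as the revised patch, with no length + 7 term. */
struct split split_length(size_t length)
{
    struct split s;

    assert(length > 0);      /* the patch only splits non-zero lengths */
    s.n = length / 8;
    s.r = length % 8;
    if (s.r == 0)
        s.r = 8;             /* the head is a full block, already counted in n */
    else
        s.n++;               /* a partial head adds one more block */
    return s;
}

/* Bytes covered: the head plus (n - 1) full blocks of 8. */
size_t covered(struct split s)
{
    return s.r + 8 * (s.n - 1);
}
```

For example, a length of 9 gives r = 1 and n = 2, which covers 1 + 8 = 9
bytes.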
 
 On 1/15/10, Jaakko Heinonen <jh@freebsd.org> wrote:
 >
 > Hi,
 >
 > On 2009-12-17, Maxim Zakharov wrote:
 >>  void * dps_strncpy(char *dst0, char *src0, size_t length) {
 >>    if (length) {
 >>      register size_t n = (length + 7) / 8;
 >
 > This won't work with length values larger than SIZE_MAX - 7 due to
 > integer overflow.
 >
 > --
 > Jaakko
 >
 
 
 -- 
 http://www.dataparksearch.org/

From: Maxim Zakharov <maxime@maxime.net.ru>
To: Jim White <spamchannel@gmail.com>
Cc: bug-followup@freebsd.org
Subject: Re: kern/141682: [libc] [patch] Faster version of strncpy(3)
Date: Tue, 9 Feb 2010 01:28:19 +0300

 Hello Jim,
 
 Thank you for pointing out the problem with zero padding. The
 version below fixes this issue and is still faster than the
 standard version when compiled with optimization on a modern
 processor.
 
 typedef long long word;  /* up to 32 bytes long */
 #define wsize sizeof(word)
 #define wmask (wsize - 1)
 
 static inline void dps_minibzero(char *dst, size_t t) {
 	if (t) { dst[0] = '\0';
 	if (t > 1) { dst[1] = '\0';
 	if (t > 2) { dst[2] = '\0';
 	if (t > 3) { dst[3] = '\0';
 	if (t > 4) { dst[4] = '\0';
 	if (t > 5) { dst[5] = '\0';
 	if (t > 6) { dst[6] = '\0';
 	if (t > 7) { dst[7] = '\0';
 	if (t > 8) { dst[8] = '\0';
 	if (t > 9) { dst[9] = '\0';
 	if (t > 10) { dst[10] = '\0';
 	if (t > 11) { dst[11] = '\0';
 	if (t > 12) { dst[12] = '\0';
 	if (t > 13) { dst[13] = '\0';
 	if (t > 14) { dst[14] = '\0';
 	if (t > 15) { dst[15] = '\0';
 	if (t > 16) { dst[16] = '\0';
 	if (t > 17) { dst[17] = '\0';
 	if (t > 18) { dst[18] = '\0';
 	if (t > 19) { dst[19] = '\0';
 	if (t > 20) { dst[20] = '\0';
 	if (t > 21) { dst[21] = '\0';
 	if (t > 22) { dst[22] = '\0';
 	if (t > 23) { dst[23] = '\0';
 	if (t > 24) { dst[24] = '\0';
 	if (t > 25) { dst[25] = '\0';
 	if (t > 26) { dst[26] = '\0';
 	if (t > 27) { dst[27] = '\0';
 	if (t > 28) { dst[28] = '\0';
 	if (t > 29) { dst[29] = '\0';
 	if (t > 30) { dst[30] = '\0';
 	}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}
 }
 
 
 char *dps_strncpy(char *dst0, const char *src0, size_t length) {
   if (length) {
     register size_t n = length / 8;
     register size_t r = (length % 8);
     register char *dst = dst0;
     register const char *src = src0;
     if (r == 0) r = 8; else n++;
     if (!(dst[0] = src[0])) { dst++; src++; goto dps_strncpy_second_pas; }
     if (r > 1) { if (!(dst[1] = src[1])) { dst += 2; src += 2; goto dps_strncpy_second_pas; }
     if (r > 2) { if (!(dst[2] = src[2])) { dst += 3; src += 3; goto dps_strncpy_second_pas; }
     if (r > 3) { if (!(dst[3] = src[3])) { dst += 4; src += 4; goto dps_strncpy_second_pas; }
     if (r > 4) { if (!(dst[4] = src[4])) { dst += 5; src += 5; goto dps_strncpy_second_pas; }
     if (r > 5) { if (!(dst[5] = src[5])) { dst += 6; src += 6; goto dps_strncpy_second_pas; }
     if (r > 6) { if (!(dst[6] = src[6])) { dst += 7; src += 7; goto dps_strncpy_second_pas; }
     if (r > 7) { if (!(dst[7] = src[7])) { dst += 8; src += 8; goto dps_strncpy_second_pas; }
     }}}}}}}
     src += r; dst += r;
     while (--n > 0) {
       if (!(dst[0] = src[0])) { dst++; src++; goto dps_strncpy_second_pas; }
       if (!(dst[1] = src[1])) { dst += 2; src += 2; goto dps_strncpy_second_pas; }
       if (!(dst[2] = src[2])) { dst += 3; src += 3; goto dps_strncpy_second_pas; }
       if (!(dst[3] = src[3])) { dst += 4; src += 4; goto dps_strncpy_second_pas; }
       if (!(dst[4] = src[4])) { dst += 5; src += 5; goto dps_strncpy_second_pas; }
       if (!(dst[5] = src[5])) { dst += 6; src += 6; goto dps_strncpy_second_pas; }
       if (!(dst[6] = src[6])) { dst += 7; src += 7; goto dps_strncpy_second_pas; }
       if (!(dst[7] = src[7])) { dst += 8; src += 8; goto dps_strncpy_second_pas; }
       src += 8; dst += 8;
     }
 dps_strncpy_second_pas:
     if (dst < dst0 + length) {
       size_t t, restlen = length + dst0 - dst;
       t = (uintptr_t)dst & wmask;  /* uintptr_t from <stdint.h>; unsigned int would truncate a 64-bit pointer */
       if (t) {
     	if (restlen < wsize) {
 		t = restlen;
     	} else {
 		t = wsize - t;
     	}
 	dps_minibzero(dst, t);  /* t < wsize here, so the byte-wise helper is enough */
 	restlen -= t;
 	dst += t;
       }
       t = restlen / wsize;
       if (t) {
 	n = t / 8;
     	r = (t % 8);
 	register word *wdst = (word*)dst;
     	if (r == 0) r = 8; else n++;
     	wdst[0] = (word)0;
     	if (r > 1) { wdst[1] = (word)0;
     	if (r > 2) { wdst[2] = (word)0;
     	if (r > 3) { wdst[3] = (word)0;
     	if (r > 4) { wdst[4] = (word)0;
     	if (r > 5) { wdst[5] = (word)0;
     	if (r > 6) { wdst[6] = (word)0;
     	if (r > 7) { wdst[7] = (word)0;
 	}}}}}}}
     	wdst += r;
     	while (--n > 0) {
     		wdst[0] = (word)0;
     		wdst[1] = (word)0;
     		wdst[2] = (word)0;
     		wdst[3] = (word)0;
     		wdst[4] = (word)0;
     		wdst[5] = (word)0;
     		wdst[6] = (word)0;
     		wdst[7] = (word)0;
     		wdst += 8;
     	}
  	dst = (char*)wdst;
       }
       if ( (t = (restlen & wmask)) ) dps_minibzero(dst, t);
     }
   }
   return dst0;
 }
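
The padding pass above works in three phases: bytes up to the next word
boundary, then whole words, then the leftover bytes.  The sketch below
shows the same idea in a compact form.  It is not the submitted code: the
names bytes_to_align and zero_pad are hypothetical, the word size is
assumed to be that of uint64_t, and memcpy is used for the word stores to
avoid aliasing problems.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bytes from p up to the next multiple of align (a power of two). */
size_t bytes_to_align(const void *p, size_t align)
{
    size_t mis;

    assert(align != 0 && (align & (align - 1)) == 0);
    mis = (size_t)((uintptr_t)p & (align - 1));
    return mis ? align - mis : 0;
}

/* Zero len bytes at dst: unaligned head, aligned words, byte tail. */
void zero_pad(char *dst, size_t len)
{
    const size_t wsize = sizeof(uint64_t);
    const uint64_t zero = 0;
    size_t head = bytes_to_align(dst, wsize);

    if (head > len)
        head = len;               /* short pad: never write past the end */
    memset(dst, 0, head);
    dst += head;
    len -= head;

    /* dst is now word aligned (or len is 0): store whole words. */
    for (; len >= wsize; dst += wsize, len -= wsize)
        memcpy(dst, &zero, wsize);

    memset(dst, 0, len);          /* fewer than wsize bytes remain */
}
```

A compiler will typically turn the fixed-size memcpy into a single store,
so the loop behaves like the word loop in the patch without the pointer
cast.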
 
 
 On 2/8/10, Jim White <spamchannel@gmail.com> wrote:
 > The replacement strncpy you propose in PR 141682 (copied below), is not a
 > valid replacement for strncpy.  The strncpy function always writes exactly
 > length bytes to dst0 (padding with 0 bytes after the first 0 byte of src0 is
 > found).  The below function does not do this.  A quick example program to
 > demonstrate follows:
 >
 >
 > void * dps_strncpy(char *dst0, char *src0, size_t length) {
 > if (length) {
 > register size_t n = (length + 7) / 8;
 > register size_t r = (length % 8);
 > register char *dst = dst0, *src = src0;
 > if (r == 0) r = 8;
 > if (!(dst[0] = src[0])) return dst0;
 > if (r > 1) if (!(dst[1] = src[1])) return dst0;
 > if (r > 2) if (!(dst[2] = src[2])) return dst0;
 > if (r > 3) if (!(dst[3] = src[3])) return dst0;
 > if (r > 4) if (!(dst[4] = src[4])) return dst0;
 > if (r > 5) if (!(dst[5] = src[5])) return dst0;
 > if (r > 6) if (!(dst[6] = src[6])) return dst0;
 > if (r > 7) if (!(dst[7] = src[7])) return dst0;
 > src += r; dst += r;
 > while (--n > 0) {
 > if (!(dst[0] = src[0])) break;
 > if (!(dst[1] = src[1])) break;
 > if (!(dst[2] = src[2])) break;
 > if (!(dst[3] = src[3])) break;
 > if (!(dst[4] = src[4])) break;
 > if (!(dst[5] = src[5])) break;
 > if (!(dst[6] = src[6])) break;
 > if (!(dst[7] = src[7])) break;
 > src += 8; dst += 8;
 > }
 > }
 > return dst0;
 > }
 >
 > int main()
 > {
 > char buf[]="01234567890123456789012345678901234567890";
 > dps_strncpy(buf,"abcdef",sizeof buf);
 > fwrite(buf,1,sizeof buf,stdout);
 > strncpy(buf,"abcdef",sizeof buf);
 > puts("");
 > fwrite(buf,1,sizeof buf,stdout);
 > return 0;
 > }
 >
 >
 > The output of this is:
 >
 > ./a.out
 > abcdef7890123456789012345678901234567890
 > abcdef
 >
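
The requirement described in the quoted message, that strncpy(3) always
writes exactly length bytes, can be stated as a short reference
implementation.  This is a minimal sketch for comparison only, not the
libc source; the name ref_strncpy is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Reference semantics of strncpy(3): copy at most n bytes from src,
 * then pad with NUL bytes so that exactly n bytes of dst are written.
 * If src has n or more non-NUL bytes, dst is not NUL-terminated.
 */
char *ref_strncpy(char *dst, const char *src, size_t n)
{
    size_t i = 0;

    assert(dst != NULL && src != NULL);

    for (; i < n && src[i] != '\0'; i++)   /* copy up to n non-NUL bytes */
        dst[i] = src[i];
    for (; i < n; i++)                     /* pad the remainder with NULs */
        dst[i] = '\0';
    return dst;
}
```

Any replacement should leave the destination buffer byte-for-byte
identical to what this loop produces for the same arguments.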
 
 
 -- 
 http://www.dataparksearch.org/
State-Changed-From-To: open->closed 
State-Changed-By: eadler 
State-Changed-When: Thu May 10 18:21:29 UTC 2012 
State-Changed-Why:  
This needs statistically valid benchmarking. Duff's device is known to
be non-optimal on modern architectures.
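
As a rough illustration of what such benchmarking could involve, the
sketch below times repeated runs of a copy function and summarizes the
samples with a mean and a sample variance.  It is an outline under stated
assumptions, not a complete methodology: time_run, mean and
sample_variance are hypothetical names, clock() measures CPU time at
coarse resolution, and the volatile read only discourages the compiler
from removing the call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>   /* strncpy, the baseline to compare against */
#include <time.h>

/* Signature shared by strncpy(3) and a candidate replacement. */
typedef char *(*copy_fn)(char *, const char *, size_t);

/* CPU seconds spent in iters calls of fn(dst, src, n). */
double time_run(copy_fn fn, char *dst, const char *src, size_t n, long iters)
{
    volatile char sink = 0;
    clock_t start;

    assert(n > 0);               /* dst[0] is read after every call */
    start = clock();
    for (long i = 0; i < iters; i++) {
        fn(dst, src, n);
        sink += dst[0];          /* volatile use of the result */
    }
    (void)sink;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Arithmetic mean of k samples. */
double mean(const double *x, size_t k)
{
    double sum = 0.0;

    assert(k > 0);
    for (size_t i = 0; i < k; i++)
        sum += x[i];
    return sum / (double)k;
}

/* Unbiased sample variance (divides by k - 1). */
double sample_variance(const double *x, size_t k)
{
    double m, ss = 0.0;

    assert(k > 1);
    m = mean(x, k);
    for (size_t i = 0; i < k; i++)
        ss += (x[i] - m) * (x[i] - m);
    return ss / (double)(k - 1);
}
```

A comparison would collect many time_run samples for both strncpy and the
candidate, across several string lengths and alignments, and report a
difference only when it is large relative to the measured variance.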

http://www.freebsd.org/cgi/query-pr.cgi?pr=141682 
Responsible-Changed-From-To: freebsd-bugs->eadler 
Responsible-Changed-By: eadler 
Responsible-Changed-When: Thu May 10 18:21:51 UTC 2012 
Responsible-Changed-Why:  
I closed it. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=141682 
>Unformatted:
