From nobody@FreeBSD.org  Sun Jul 18 23:07:46 2004
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id A1F5B16A4CE
	for <freebsd-gnats-submit@FreeBSD.org>; Sun, 18 Jul 2004 23:07:46 +0000 (GMT)
Received: from www.freebsd.org (www.freebsd.org [216.136.204.117])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 99E5843D5F
	for <freebsd-gnats-submit@FreeBSD.org>; Sun, 18 Jul 2004 23:07:46 +0000 (GMT)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.12.11/8.12.11) with ESMTP id i6IN7kpq063442
	for <freebsd-gnats-submit@FreeBSD.org>; Sun, 18 Jul 2004 23:07:46 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.12.11/8.12.11/Submit) id i6IN7k2V063441;
	Sun, 18 Jul 2004 23:07:46 GMT
	(envelope-from nobody)
Message-Id: <200407182307.i6IN7k2V063441@www.freebsd.org>
Date: Sun, 18 Jul 2004 23:07:46 GMT
From: Qing Li <qing.li@bluecoat.com>
To: freebsd-gnats-submit@FreeBSD.org
Subject: in_cksum_hdr is non-functional without -O compiler flag
X-Send-Pr-Version: www-2.3

>Number:         69257
>Category:       i386
>Synopsis:       [i386] [patch] in_cksum_hdr is non-functional without -O compiler flag
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    bz
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sun Jul 18 23:10:10 GMT 2004
>Closed-Date:    Sun Oct 21 17:38:00 UTC 2007
>Last-Modified:  Fri Oct 26 07:20:03 UTC 2007
>Originator:     Qing Li
>Release:        5.2.1
>Organization:
Blue Coat Systems, Inc.
>Environment:
FreeBSD heavygear.bluecoat.com 5.2.1-RELEASE FreeBSD 5.2.1-RELEASE #0: Tue Jun  8 20:16:31 GMT 2004    root@heavygear.bluecoat.com:/usr/src/sys/i386/compile/QING i386

>Description:
The "in_cksum_hdr" function does not work when compiled without any optimization flags on i386. 

	The original inline __asm code was written as

#define ADD(n)	__asm __volatile ("addl %1, %0" : "+r" (sum) : \
    "g" (((const u_int32_t *)ip)[n / 4]))
#define ADDC(n)	__asm __volatile ("adcl %1, %0" : "+r" (sum) : \
    "g" (((const u_int32_t *)ip)[n / 4]))
#define MOP	__asm __volatile ("adcl $0, %0" : "+r" (sum))

      ADD(0);
      ADDC(4);
      ADDC(8);
      ADDC(12);
      ADDC(16);
      MOP;

The C code assumes that the carry bit is always kept from the
previous operation. However, the pointer indexing requires another
add operation. Here is the gcc-generated code:

#1    addl   (%eax), %edx
      movl   %edx, -4(%ebp)
      movl   8(%ebp), %eax
#2    addl   $4, %eax
      movl   -4(%ebp), %edx
#3    adcl   (%eax), %edx

The bug is that the carry flag set by the "addl" at #1 is
overwritten by the "addl" at #2 (the pointer arithmetic), so the
"adcl" at #3 adds the carry produced by #2, which is normally 0,
instead of the carry from #1. The lost carry makes the checksum
wrong, and received packets fail checksum verification.
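To make the failure mode concrete, here is a small software model of the i386 carry flag. It is a sketch, not kernel code, and the struct and the model_* function names are hypothetical. It replays ADD(0); ADDC(4) ... ADDC(16); MOP over five words and can optionally insert the pointer-arithmetic "addl $4, %eax" that the unoptimized code places before each adcl, as in the listing above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software model of the i386 carry flag (CF); not kernel code. */
struct flags {
    unsigned cf;
};

/* addl: r = a + b; CF = carry out of bit 31. */
static uint32_t model_addl(struct flags *f, uint32_t a, uint32_t b)
{
    uint64_t r = (uint64_t)a + b;

    f->cf = (unsigned)(r >> 32);
    return (uint32_t)r;
}

/* adcl: r = a + b + CF; CF = carry out of bit 31. */
static uint32_t model_adcl(struct flags *f, uint32_t a, uint32_t b)
{
    uint64_t r = (uint64_t)a + b + f->cf;

    f->cf = (unsigned)(r >> 32);
    return (uint32_t)r;
}

/*
 * ADD(0); ADDC(4); ... ADDC(16); MOP over five words.  If pointer_add is
 * nonzero, model the unoptimized code: an "addl $4, %eax" for the pointer
 * arithmetic runs before each adcl and overwrites CF.
 */
static uint32_t model_sum5(const uint32_t w[5], int pointer_add)
{
    struct flags f = { 0 };
    uint32_t eax = 0x1000;                  /* hypothetical pointer value */
    uint32_t sum = model_addl(&f, 0, w[0]); /* #1 */

    for (int i = 1; i < 5; i++) {
        if (pointer_add)
            eax = model_addl(&f, eax, 4);   /* #2: CF is now 0 */
        sum = model_adcl(&f, sum, w[i]);    /* #3 */
    }
    return model_adcl(&f, sum, 0);          /* MOP: adcl $0 */
}
```

With w = {0xffffffff, 1, 0, 0, 0}, the first adcl carries out, the intervening addl discards that carry, and the modeled unoptimized sequence returns 0 where the intended sequence returns 1.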

>How-To-Repeat:
Compile any file that calls the in_cksum_hdr function, such as
ip_input.c, without any optimization flags. 
>Fix:
      
>Release-Note:
>Audit-Trail:

From: Mike Bristow <mike@urgle.com>
To: freebsd-gnats-submit@FreeBSD.org, qing.li@bluecoat.com
Cc:  
Subject: Re: i386/69257: in_cksum_hdr is non-functional without -O compiler
	flag
Date: Wed, 25 Aug 2004 10:33:37 +0100

 The patch below might fix this issue; I haven't yet built it into a
 kernel without -O to check.  However, with a small test program, it
 generates identical asm (ignoring whitespace) with -O, and what
 appears to be different-but-correct assembler without.
 
 I do not believe that there is any other way of preventing the compiler
 from inserting arbitrary instructions between different __asm statements
 (and I think the commit message in revision 1.13 of in_cksum.h is wrong
 on this point).  From
 http://developer.apple.com/documentation/DeveloperTools/gcc-3.3/gcc/Extended-Asm.html 
 ---8<---8<---8<---
 You can't expect a sequence of volatile asm instructions to remain
 perfectly consecutive. If you want consecutive output, use a single
 asm.  Also, GCC will perform some optimizations across a volatile asm
 instruction; GCC does not "forget everything" when it encounters a
 volatile asm instruction the way some other compilers do.
 ---8<---8<---8<---
 
 Perhaps this is no longer true for gcc 3.4, except that it is the
 behaviour we are seeing, so I guess it still is.
 
 
 --- sys/i386/include/in_cksum.h.orig    Wed Apr  7 21:46:05 2004
 +++ sys/i386/include/in_cksum.h Wed Aug 25 09:45:39 2004
 @@ -55,22 +55,20 @@
  {
         register u_int sum = 0;
 
 -/* __volatile is necessary here because the condition codes are used. */
 -#define ADD(n) __asm __volatile ("addl %1, %0" : "+r" (sum) : \
 -    "g" (((const u_int32_t *)ip)[n / 4]))
 -#define ADDC(n)        __asm __volatile ("adcl %1, %0" : "+r" (sum) : \
 -    "g" (((const u_int32_t *)ip)[n / 4]))
 -#define MOP    __asm __volatile ("adcl $0, %0" : "+r" (sum))
 -
 -       ADD(0);
 -       ADDC(4);
 -       ADDC(8);
 -       ADDC(12);
 -       ADDC(16);
 -       MOP;
 -#undef ADD
 -#undef ADDC
 -#undef MOP
 +       __asm __volatile (
 +               "addl %1, %0\n"
 +               "adcl %2, %0\n"
 +               "adcl %3, %0\n"
 +               "adcl %4, %0\n"
 +               "adcl %5, %0\n"
 +               "adcl $0, %0"
 +               : "+r" (sum)
 +               : "g" (((const u_int32_t *)ip)[0]),
 +                 "g" (((const u_int32_t *)ip)[1]),
 +                 "g" (((const u_int32_t *)ip)[2]),
 +                 "g" (((const u_int32_t *)ip)[3]),
 +                 "g" (((const u_int32_t *)ip)[4])
 +       );
         sum = (sum & 0xffff) + (sum >> 16);
         if (sum > 0xffff)
                 sum -= 0xffff;
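The author mentions checking the patch with a small test program. One way such a check might look is sketched here; it is not the program he used. cksum_hdr_asm and cksum_hdr_ref are hypothetical names, and the final "~sum & 0xffff" is assumed from the unchanged tail of in_cksum_hdr, which the diff does not show. The patched single-asm sum is compared against a portable C reference on a sample IPv4 header whose checksum bytes should come out as b8 61. Building it at -O0 and at -O2 exercises the case this PR is about; on non-x86 targets the asm path simply falls back to the reference.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Portable reference: plain 64-bit sum of five 32-bit words, folded to 16 bits. */
static unsigned cksum_hdr_ref(const uint32_t w[5])
{
    uint64_t sum = 0;

    for (int i = 0; i < 5; i++)
        sum += w[i];                       /* cannot overflow 64 bits */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return ~sum & 0xffff;
}

/* The patched approach: one asm statement, so nothing can run between the adds. */
static unsigned cksum_hdr_asm(const uint32_t w[5])
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t sum = 0;

    __asm__ __volatile__(
        "addl %1, %0\n\t"
        "adcl %2, %0\n\t"
        "adcl %3, %0\n\t"
        "adcl %4, %0\n\t"
        "adcl %5, %0\n\t"
        "adcl $0, %0"
        : "+r" (sum)
        : "g" (w[0]), "g" (w[1]), "g" (w[2]), "g" (w[3]), "g" (w[4])
        : "cc");
    sum = (sum & 0xffff) + (sum >> 16);
    if (sum > 0xffff)
        sum -= 0xffff;
    return ~sum & 0xffff;
#else
    return cksum_hdr_ref(w);               /* no i386 asm on this target */
#endif
}
```

Copying the header bytes into a uint32_t array with memcpy avoids alignment and aliasing problems in the test itself; the kernel code reads the words in place.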
 
State-Changed-From-To: open->feedback 
State-Changed-By: remko 
State-Changed-When: Mon Sep 11 13:04:48 UTC 2006 
State-Changed-Why:  
Hello, 

Can you tell me whether this got resolved? Is it still present
in more recent FreeBSD versions such as 6.1?


Responsible-Changed-From-To: freebsd-i386->remko 
Responsible-Changed-By: remko 
Responsible-Changed-When: Mon Sep 11 13:04:48 UTC 2006 
Responsible-Changed-Why:  
grab the PR 


-----Original Message-----
From: Bruce Evans [mailto:bde@zeta.org.au]
Sent: Tuesday, September 12, 2006 2:10 PM
To: Li, Qing
Subject: RE: i386/69257: [i386] [patch] in_cksum_hdr is non-functional
without -O compiler flag

On Mon, 11 Sep 2006, Li, Qing wrote:

> The last time I checked, the fixes are in but the PR is
> not updated, which reminds me I need to close out some
> of those PRs myself.

There are still some bogus asm sequences (using macros in in_cksum.c and
sequential asms in in_cksum.h). They are also probably responsible for
the pessimizations of the amd64 and/or __INTEL_COMPILER cases (which use
no asm at all, since the bugs in it were always fatal).

I believe DflyBSD rewrote all in* files because it didn't like the
macros or my old i486 optimizations (the FreeBSD version still mostly
uses these).

NetBSD has everything in in_cksum.S, for i386 at least. This makes it
much faster for small packets, where the bookkeeping dominates and the
C code for it ends up too large and/or branchy.

However, for large packets, the best code is very MD and there are too
many machines to write it in asm for them all. I once used the
following for i586's:

% Index: in_cksum.c
% ===================================================================
% RCS file: /home/ncvs/src/sys/i386/i386/in_cksum.c,v
% retrieving revision 1.27
% diff -u -2 -r1.27 in_cksum.c
% --- in_cksum.c 7 Apr 2004 20:46:04 -0000 1.27
% +++ in_cksum.c 8 Apr 2004 13:17:28 -0000
% @@ -1,2 +1,5 @@
% +#define I586_OPTIMIZED_CKSUM_NOT
% +#define I586_OPTIMIZED_CKSUM_2_NOT
% +
% /*-
% * Copyright (c) 1990 The Regents of the University of California.

I still have an i586 (P1-133) but haven't turned it on for about 4
years.

% @@ -268,4 +271,208 @@
% ("adcl $0, %0" : "+r" (sum))
%
% +#ifdef I586_OPTIMIZED_CKSUM
% +static int read6(const void *buf, size_t len)
% +{
% + unsigned junk1;
% + unsigned junk2;
% + unsigned sum;
% +
% + sum = 0;
% + __asm __volatile(" \n\
% + movl 0(%0),%2 \n\
% + nop \n\
% + .align 4,0x90 \n\
% + 1: \n\
% + movl 32(%0),%3 \n\
% + addl %2,%4 \n\
% + nop \n\
% + movl 4(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 8(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 12(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 16(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 20(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 24(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 28(%0),%2 \n\
% + \n\
% + adcl %2,%4 \n\
% + movl %3,%2 \n\
% + \n\
% + adcl $0,%4 \n\
% + addl $32,%0 \n\
% + subl $32,%1 \n\
% + ja 1b \n\
% + "
% + : "+r" (buf), "+r" (len), "=&r" (junk1), "=&r" (junk2), "+r" (sum));
% + return sum;
% +}
% +
% +static int readB(const void *buf, size_t len)
% +{
% + unsigned junk1;
% + unsigned junk2;
% + unsigned sum;
% +
% + sum = 0;
% + __asm __volatile(" \n\
% + movl 0(%0),%3 \n\
% + nop \n\
% + .align 4,0x90 \n\
% + 1: \n\
% + movl 32(%0),%2 \n\
% + addl %3,%4 \n\
% + adcl %2,%4 \n\
% + movl 4(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 8(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 12(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 16(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 20(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 24(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 28(%0),%2 \n\
% + \n\
% + adcl %2,%4 \n\
% + movl 64(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 36(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 40(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 44(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 48(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 52(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 56(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 60(%0),%2 \n\
% + \n\
% + adcl %2,%4 \n\
% + movl 96(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 68(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 72(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 76(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 80(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 84(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 88(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 92(%0),%2 \n\
% + \n\
% + adcl %2,%4 \n\
% + movl 128(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 100(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 104(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 108(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 112(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 116(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 120(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 124(%0),%2 \n\
% + \n\
% + adcl %2,%4 \n\
% + movl 160(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 132(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 136(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 140(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 144(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 148(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 152(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 156(%0),%2 \n\
% + \n\
% + adcl %2,%4 \n\
% + movl 192(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 164(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 168(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 172(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 176(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 180(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 184(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 188(%0),%2 \n\
% + \n\
% + adcl %2,%4 \n\
% + movl 224(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 196(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 200(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 204(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 208(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 212(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 216(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 220(%0),%2 \n\
% + \n\
% + movl 256(%0),%3 \n\
% + nop \n\
% + \n\
% + adcl %2,%4 \n\
% + movl 228(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 232(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 236(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 240(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 244(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 248(%0),%2 \n\
% + adcl %2,%4 \n\
% + movl 252(%0),%2 \n\
% + \n\
% + adcl %2,%4 \n\
% + nop \n\
% + \n\
% + adcl $0,%4 \n\
% + addl $256,%0 \n\
% + subl $256,%1 \n\
% + ja 1b \n\
% + "
% + : "+r" (buf), "+r" (len), "=&r" (junk1), "=&r" (junk2), "+r" (sum));
% + return sum;
% +}
% +#endif /* I586_OPTIMIZED_CKSUM */
% +
% u_short
% in_cksum_skip(m, len, skip)

Putting everything in one asm is no harder than using macros. Note that
the instruction order in the above is subtle (it's optimized for the
P1's pipelines), so using one macro per instruction wouldn't be very
useful: it would make the offsets easier to see, but since the best
order is very MD, one sequence of macros would only be good for one
machine.

The above is no better than the i486-optimized version for P2s (and
P3s?), so I stopped using it. It is better on AthlonXPs but worse than
code actually optimized for AthlonXPs. On newer machines, pipelines and
caches work more automatically, so large unrolling and subtle ordering
as in the above are rarely useful. Instruction bandwidth is still enough
of a bottleneck that using 64-bit accesses gives a noticeable speedup in
the fully-cached case on Athlon64s. However, this function never shows
up in kernel profiles, so I only benchmarked this in userland.
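The 64-bit-access idea can be illustrated in portable C. This is a sketch only: the names fold64 and cksum_sum64 are hypothetical, it makes no timing claims, and it assumes an even byte count. 64-bit words are summed into a 64-bit accumulator, carry-outs are counted separately and added back, and the result is folded to 16 bits at the end. Because 2^16 is congruent to 1 modulo 0xffff, the folded value matches a 16-bit one's complement sum on either byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fold a 64-bit one's complement accumulator down to 16 bits. */
static uint16_t fold64(uint64_t s)
{
    s = (s & 0xffffffffu) + (s >> 32);  /* <= 0x1fffffffe */
    s = (s & 0xffffffffu) + (s >> 32);  /* <= 0xffffffff */
    s = (s & 0xffffu) + (s >> 16);      /* <= 0x1fffe */
    s = (s & 0xffffu) + (s >> 16);      /* <= 0xffff */
    return (uint16_t)s;
}

/*
 * One's complement sum of an even number of bytes: 64-bit loads first,
 * then 16-bit loads for the tail.  Carries out of the 64-bit accumulator
 * are counted and added back at the end.
 */
static uint16_t cksum_sum64(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    uint64_t sum = 0, carries = 0;

    for (; len >= 8; len -= 8, p += 8) {
        uint64_t w;

        memcpy(&w, p, sizeof w);        /* alignment-safe load */
        sum += w;
        carries += (sum < w);           /* wrapped => carry out */
    }
    for (; len >= 2; len -= 2, p += 2) {
        uint16_t h;

        memcpy(&h, p, sizeof h);
        sum += h;
        carries += (sum < h);
    }
    sum += carries;
    sum += (sum < carries);             /* at most one more carry */
    return fold64(sum);
}
```

The caller complements the result to get the checksum, just as in_cksum_hdr does with its 32-bit sum.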

% @@ -355,6 +562,51 @@
% * branches &c small.
% */
% +#ifdef I586_OPTIMIZED_CKSUM
% + if (mlen >= 256 + 32 + 4) {
% + unsigned extra;
% + unsigned x;
% +
% + x = (mlen - 32 - 4) & ~0xff;
% + extra = readB(w, x);
% + if (((unsigned long long)sum + extra) & 0x100000000LL)
% + sum = sum + extra + 1;
% + else
% + sum += extra;
% + w += x / 2;
% + mlen -= x;
% + }
% + if (mlen >= 32 + 4) {
% + unsigned extra;
% + unsigned x;
% +
% + x = (mlen - 4) & ~0x1f;
% + extra = read6(w, x);
% + if (((unsigned long long)sum + extra) & 0x100000000LL)
% + sum = sum + extra + 1;
% + else
% + sum += extra;
% + w += x / 2;
% + mlen -= x;
% + }
% +#endif /* I586_OPTIMIZED_CKSUM */

This is not very carefully integrated.
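The "if (((unsigned long long)sum + extra) & 0x100000000LL)" test in the fragment above is an end-around carry: when the 32-bit addition overflows, the lost carry is added back in. A minimal C sketch of that step, assuming 32-bit unsigned operands (both function names are hypothetical), next to a branch-free equivalent that detects the wrap by comparing the result with an operand:

```c
#include <assert.h>
#include <stdint.h>

/* The step from the fragment: if the 32-bit add carries out, add 1. */
static uint32_t add_end_around(uint32_t sum, uint32_t extra)
{
    if (((uint64_t)sum + extra) & 0x100000000ULL)
        return sum + extra + 1;  /* wrapped value is <= 0xfffffffe, so +1 cannot wrap */
    return sum + extra;
}

/* Branch-free equivalent: an unsigned add wrapped iff the result < an operand. */
static uint32_t add_end_around_nb(uint32_t sum, uint32_t extra)
{
    uint32_t r = sum + extra;

    return r + (r < extra);
}
```

The branch-free form avoids the 64-bit widening, which may matter on a 32-bit target; which one compiles better is compiler-dependent.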

% mlen -= 1;
% while ((mlen -= 32) >= 0) {
% +#ifdef I586_OPTIMIZED_CKSUM_2
% +#define ADDR() __asm __volatile("addl %1, %0" : "=r" (sum) : "r" (tmp))
% +#define ADDRC() __asm __volatile("adcl %1, %0" : "=r" (sum) : "r" (tmp))
% +#define LODE(n) __asm __volatile ("movl " #n "(%1), %0" : "=r" (tmp) : "r" (w))
% + u_int tmp;
% +
% + __asm __volatile("nop"); LODE(0);
% + ADDR(); LODE(4);
% + ADDRC(); LODE(8);
% + ADDRC(); LODE(12);
% + ADDRC(); LODE(16);
% + ADDRC(); LODE(20);
% + ADDRC(); LODE(24);
% + ADDRC(); LODE(28);
% + ADDRC();
% + MOP;
% +#else /* !I586_OPTIMIZED_CKSUM_2 */

This optimizes for P1's 32-byte cache lines in the same (buggy) way that
the current code optimizes for i386's 16-byte cache lines.

% /*
% * Add with carry 16 words and fold in the last
% @@ -386,4 +638,5 @@
% ADDC(28);
% MOP;
% +#endif /* I586_OPTIMIZED_CKSUM_2 */
% w += 16;
% }

Bruce

-----Original Message-----
From: Bruce Evans [mailto:bde@zeta.org.au]
Sent: Tuesday, September 12, 2006 2:20 PM
To: Li, Qing
Subject: RE: i386/69257: [i386] [patch] in_cksum_hdr is non-functional
without -O compiler flag

I wrote:
> % +#ifdef I586_OPTIMIZED_CKSUM_2
> % +#define ADDR() __asm __volatile("addl %1, %0" : "=r" (sum) : "r" (tmp))
> % +#define ADDRC() __asm __volatile("adcl %1, %0" : "=r" (sum) : "r" (tmp))
> % +#define LODE(n) __asm __volatile ("movl " #n "(%1), %0" : "=r" (tmp) : "r" (w))
> % + u_int tmp;
> % +
> % + __asm __volatile("nop"); LODE(0);
> % + ADDR(); LODE(4);
> % + ADDRC(); LODE(8);
> % + ADDRC(); LODE(12);
> % + ADDRC(); LODE(16);
> % + ADDRC(); LODE(20);
> % + ADDRC(); LODE(24);
> % + ADDRC(); LODE(28);
> % + ADDRC();
> % + MOP;
> % +#else /* !I586_OPTIMIZED_CKSUM_2 */
>
> This optimizes for P1's 32-byte cache lines in the same (buggy) way
> that the current code optimizes for i386's 16-byte cache lines.

Oops: both versions do 32 bytes at a time, "i386" in the above should
have been "i486", and I may be misremembering the size of an i486 cache
line. The above just avoids adding from memory to registers, since
separate loads and register-register additions are easier to pipeline
and in fact are pipelined better on old CPUs like P1s.
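Bruce's two points, that everything can go into one asm and that separate loads plus register-register additions pipeline better on old CPUs, can be combined in a sketch like the following. It is not his code: sum32_blocks and sum32_ref are hypothetical names, the asm path is x86 only, and other targets use a portable C fallback. Each 32-byte block is a single asm statement, so no compiler-generated instruction can land between the addl and the adcl chain, and the final adcl $0 absorbs the last carry before the loop's own pointer arithmetic runs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Portable reference: end-around-carry sum of len/4 32-bit words. */
static uint32_t sum32_ref(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    uint32_t sum = 0;

    for (; len >= 4; len -= 4, p += 4) {
        uint32_t w;
        uint64_t t;

        memcpy(&w, p, sizeof w);
        t = (uint64_t)sum + w;
        sum = (uint32_t)t + (uint32_t)(t >> 32);  /* add the carry back in */
    }
    return sum;
}

/* Sum the first (len - len % 32) bytes, one 32-byte block per asm statement. */
static uint32_t sum32_blocks(const void *buf, size_t len)
{
#if defined(__i386__) || defined(__x86_64__)
    const unsigned char *p = buf;
    uint32_t sum = 0, tmp;

    for (; len >= 32; len -= 32, p += 32)
        __asm__ __volatile__(
            "movl   (%2), %1\n\t"   /* load, then register-register add */
            "addl   %1, %0\n\t"
            "movl  4(%2), %1\n\t"
            "adcl   %1, %0\n\t"
            "movl  8(%2), %1\n\t"
            "adcl   %1, %0\n\t"
            "movl 12(%2), %1\n\t"
            "adcl   %1, %0\n\t"
            "movl 16(%2), %1\n\t"
            "adcl   %1, %0\n\t"
            "movl 20(%2), %1\n\t"
            "adcl   %1, %0\n\t"
            "movl 24(%2), %1\n\t"
            "adcl   %1, %0\n\t"
            "movl 28(%2), %1\n\t"
            "adcl   %1, %0\n\t"
            "adcl   $0, %0"         /* fold the final carry */
            : "+r" (sum), "=&r" (tmp)
            : "r" (p)
            : "cc", "memory");
    return sum;
#else
    return sum32_ref(buf, len - len % 32);
#endif
}
```

The final adcl $0 cannot itself carry out: after any add or add-with-carry that carries out, the low 32 bits are at most 0xfffffffe, so adding 1 never wraps. No claim is made here about which instruction order is fastest on any particular CPU.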

Bruce

http://www.freebsd.org/cgi/query-pr.cgi?pr=69257 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: i386/69257: commit references a PR
Date: Sat, 20 Oct 2007 22:18:50 +0000 (UTC)

 bz          2007-10-20 22:18:42 UTC
 
   FreeBSD src repository
 
   Modified files:
     sys/i386/i386        in_cksum.c 
     sys/i386/include     in_cksum.h 
   Log:
   Fold multiple asm statements into one so that the compiler at a certain
   optimization level (-march=pentium-mmx for example) does not insert
   intermediate ops which would trash the carry.
   
   Change both sys/i386/i386/in_cksum.c[1] and sys/i386/include/in_cksum.h.
   
   To my best understanding the same problem was addressed in rev. 1.16
   of src/sys/i386/include/in_cksum.h for just a single function 3y ago.
   
   Reviewed by:  jhb
   Submitted by: Zhouyi ZHOU <zhouzhouyi FreeBSD.org> (initial version of [1])
   MFC after:    5 days
   PR:           115678, 69257
   
   Revision  Changes    Path
   1.29      +77 -43    src/sys/i386/i386/in_cksum.c
   1.18      +14 -7     src/sys/i386/include/in_cksum.h
 _______________________________________________
 cvs-all@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/cvs-all
 To unsubscribe, send any mail to "cvs-all-unsubscribe@freebsd.org"
 

From: "Bjoern A. Zeeb" <bz@FreeBSD.org>
To: bug-followup@FreeBSD.org, qing.li@bluecoat.com
Cc:  
Subject: Re: i386/69257: [i386] [patch] in_cksum_hdr is non-functional without
 -O compiler flag
Date: Sat, 20 Oct 2007 22:35:23 +0000 (UTC)

 Hi,
 
 I think this can be closed with the latest commit?
 
 http://docs.freebsd.org/cgi/mid.cgi?200710202218.l9KMIgpL068209
 
 -- 
 Bjoern A. Zeeb                                 bzeeb at Zabbadoz dot NeT
 Software is harder than hardware  so better get it right the first time.
State-Changed-From-To: feedback->closed 
State-Changed-By: bz 
State-Changed-When: Sun Oct 21 17:37:13 UTC 2007 
State-Changed-Why:  
Things should be fixed with my latest commit. 


Responsible-Changed-From-To: remko->bz 
Responsible-Changed-By: bz 
Responsible-Changed-When: Sun Oct 21 17:37:13 UTC 2007 
Responsible-Changed-Why:  
I'll handle follow-ups. 


From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: i386/69257: commit references a PR
Date: Fri, 26 Oct 2007 07:15:12 +0000 (UTC)

 bz          2007-10-26 07:15:04 UTC
 
   FreeBSD src repository
 
   Modified files:        (Branch: RELENG_7)
     sys/i386/i386        in_cksum.c 
     sys/i386/include     in_cksum.h 
   Log:
   MFC: rev. 1.29 sys/i386/i386/in_cksum.c
        rev. 1.18 sys/i386/include/in_cksum.h
   
     Fold multiple asm statements into one so that the compiler at a certain
     optimization level (-march=pentium-mmx for example) does not insert
     intermediate ops which would trash the carry.
   
     Change both sys/i386/i386/in_cksum.c[1] and sys/i386/include/in_cksum.h.
   
     To my best understanding the same problem was addressed in rev. 1.16
     of src/sys/i386/include/in_cksum.h for just a single function 3y ago.
   
     Reviewed by:  jhb
     Submitted by: Zhouyi ZHOU <zhouzhouyi FreeBSD.org> (initial version of [1])
     PR:           115678, 69257
   
   Approved by:    re (kensmith)
   
   Revision   Changes    Path
   1.28.10.1  +77 -43    src/sys/i386/i386/in_cksum.c
   1.17.10.1  +14 -7     src/sys/i386/include/in_cksum.h
 
>Unformatted:
