From sa2c@sa2c.net  Sat Aug 10 13:41:00 2002
Return-Path: <sa2c@sa2c.net>
Received: from mx1.FreeBSD.org (mx1.FreeBSD.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 1422937B400
	for <FreeBSD-gnats-submit@freebsd.org>; Sat, 10 Aug 2002 13:41:00 -0700 (PDT)
Received: from berkeley.sa2c.net (berkeley.sa2c.net [61.194.193.50])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 7C3BA43E3B
	for <FreeBSD-gnats-submit@freebsd.org>; Sat, 10 Aug 2002 13:40:59 -0700 (PDT)
	(envelope-from sa2c@sa2c.net)
Received: by berkeley.sa2c.net (Postfix, from userid 3104)
	id 33B9231D; Sun, 11 Aug 2002 05:40:58 +0900 (JST)
Message-Id: <20020810204058.33B9231D@berkeley.sa2c.net>
Date: Sun, 11 Aug 2002 05:40:58 +0900 (JST)
From: NIIMI Satoshi <sa2c@sa2c.net>
Reply-To: NIIMI Satoshi <sa2c@sa2c.net>
To: FreeBSD-gnats-submit@freebsd.org
Cc:
Subject: better stack alignment patch for lib/csu/i386-elf/
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         41528
>Category:       i386
>Synopsis:       better stack alignment patch for lib/csu/i386-elf/
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          change-request
>Submitter-Id:   current-users
>Arrival-Date:   Sat Aug 10 13:50:01 PDT 2002
>Closed-Date:    Sun Nov 17 21:03:34 PST 2002
>Last-Modified:  Sun Nov 17 21:03:34 PST 2002
>Originator:     NIIMI Satoshi
>Release:        FreeBSD 4.6.1-RELEASE-p10 i386
>Organization:
>Environment:
System: FreeBSD berkeley.sa2c.net 4.6.1-RELEASE-p10 FreeBSD 4.6.1-RELEASE-p10 #5: Fri Aug 9 16:33:26 JST 2002 sa2c@berkeley.sa2c.net:/usr/obj/usr/src/sys/SA2C_NET i386


	
>Description:

Although the system C compiler, GCC, keeps the stack pointer aligned
to a 2**preferred-stack-boundary byte boundary, the C startup routine
does not take care of this.

This causes a big performance penalty for floating point operations
on variables in the stack frame, because IA32 CPUs are optimized to
operate on aligned data.

	
>How-To-Repeat:
With code: foo.c
#include <stdio.h>

int
main(void)
{
	double x;

	printf("%p\n", (void *)&x);
	return (0);
}
% cc foo.c
% ./a.out
0xbfbff730	<- aligned to 8-byte boundary.
% ./a.out a
0xbfbff72c	<- not aligned if command line arguments are passed.
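
The alignment can also be checked arithmetically rather than by eye.
The following is a minimal sketch, not part of the original report; the
helper name misalignment is hypothetical. It reports how many bytes an
address lies above the previous multiple of a power-of-two boundary.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from the PR): number of bytes by which
 * addr exceeds the previous multiple of boundary.  boundary must be
 * a power of two.  A result of 0 means addr is aligned.
 */
static unsigned
misalignment(uintptr_t addr, uintptr_t boundary)
{
	return ((unsigned)(addr & (boundary - 1)));
}
```

Applied to the two addresses printed above, misalignment(0xbfbff730, 8)
is 0 and misalignment(0xbfbff72c, 8) is 4, which matches the
annotations.  In a real program one would pass (uintptr_t)&x.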
	
>Fix:

(gcc 3.1 masks %esp by itself, so this patch might not be required
for -current.)

--- diff begins here ---
Index: stable/lib/csu/i386-elf/crt1.c
===================================================================
RCS file: /home/ncvs/src/lib/csu/i386-elf/crt1.c,v
retrieving revision 1.4.2.1
diff -u -r1.4.2.1 crt1.c
--- stable/lib/csu/i386-elf/crt1.c	30 Oct 2000 20:32:24 -0000	1.4.2.1
+++ stable/lib/csu/i386-elf/crt1.c	10 Aug 2002 19:40:54 -0000
@@ -93,7 +93,33 @@
     monstartup(&eprol, &etext);
 #endif
     _init();
+#if 0
     exit( main(argc, argv, env) );
+#else
+    /*
+     * GCC expects stack frame to be aligned like following figure.
+     *
+     *  +--------------+
+     *  |%ebp (if any) |
+     *  +--------------+
+     *  |return address|
+     *  +--------------+ --- aligned by PREFERRED_STACK_BOUNDARY
+     *  |  arguments   |
+     *  |      :       |
+     *  |      :       |
+     */
+    __asm__ ("
+	subl	$12, %%esp		# create stack frame for arguments
+	andl	$~0xf, %%esp		# align stack to 16-byte boundary
+	movl	%0, 0(%%esp)
+	movl	%1, 4(%%esp)
+	movl	%2, 8(%%esp)
+	call	main
+	movl	%%eax, 0(%%esp)
+	call	exit
+	hlt				# do not return
+    " : : "r"(argc), "r"(argv), "r"(env));
+#endif
 }
 
 #ifdef GCRT
Index: current/lib/csu/i386-elf/crt1.c
===================================================================
RCS file: /home/ncvs/src/lib/csu/i386-elf/crt1.c,v
retrieving revision 1.9
diff -u -r1.9 crt1.c
--- current/lib/csu/i386-elf/crt1.c	16 Jul 2002 12:28:49 -0000	1.9
+++ current/lib/csu/i386-elf/crt1.c	10 Aug 2002 19:46:42 -0000
@@ -100,7 +100,33 @@
 	monstartup(&eprol, &etext);
 #endif
 	_init();
+#if 0
 	exit( main(argc, argv, env) );
+#else
+	/*
+	 * GCC expects stack frame to be aligned like following figure.
+	 *
+	 *  +--------------+
+	 *  |%ebp (if any) |
+	 *  +--------------+
+	 *  |return address|
+	 *  +--------------+ --- aligned by PREFERRED_STACK_BOUNDARY
+	 *  |  arguments   |
+	 *  |      :       |
+	 *  |      :       |
+	 */
+	__asm__ ("
+		subl	$12, %%esp	# create stack frame for arguments
+		andl	$~0xf, %%esp	# align stack to 16-byte boundary
+		movl	%0, 0(%%esp)
+		movl	%1, 4(%%esp)
+		movl	%2, 8(%%esp)
+		call	main
+		movl	%%eax, 0(%%esp)
+		call	exit
+		hlt			# do not return
+	" : : "r"(argc), "r"(argv), "r"(env));
+#endif
 }
 
 #ifdef GCRT
--- diff ends here ---

	
>Release-Note:
>Audit-Trail:

From: Bruce Evans <bde@zeta.org.au>
To: NIIMI Satoshi <sa2c@sa2c.net>
Cc: FreeBSD-gnats-submit@FreeBSD.ORG
Subject: Re: i386/41528: better stack alignment patch for lib/csu/i386-elf/
Date: Tue, 13 Aug 2002 00:12:56 +1000 (EST)

 On Sun, 11 Aug 2002, NIIMI Satoshi wrote:
 
 > >Description:
 >
 > Although the system C compiler, GCC, keeps the stack pointer aligned
 > to a 2**preferred-stack-boundary byte boundary, the C startup routine
 > does not take care of this.
 >
 > This causes a big performance penalty for floating point operations
 > on variables in the stack frame, because IA32 CPUs are optimized to
 > operate on aligned data.
 
 I think stack alignment belongs only in functions that actually store
 floating point or other data that actually needs large alignment, and
 it should be done by the compiler since only the compiler knows the
 alignment requirements.  gcc on i386 requires the stack to be aligned
 4 bytes below a 2^<preferred-stack-boundary> boundary on entry to each
 function, but this is compiler-dependent and pessimal IMO.
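
To make the "4 bytes below a boundary" rule stated in the paragraph
above concrete, here is a minimal sketch under stated assumptions:
32-bit i386, where call pushes a 4-byte return address, and a
hypothetical helper entry_sp_ok that is not part of gcc or FreeBSD.  It
tests whether a stack pointer value, as seen on entry to a function,
satisfies the rule for a given -mpreferred-stack-boundary value.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: on entry to a function, %esp points at the 4-byte
 * return address, so "4 bytes below a 2^psb boundary" means that
 * sp + 4 is a multiple of 2^psb.
 */
static bool
entry_sp_ok(uint32_t sp, unsigned psb)
{
	uint32_t mask = ((uint32_t)1 << psb) - 1;

	return (((sp + 4) & mask) == 0);
}
```

With psb = 4 (a 16-byte boundary), 0xbfbff71c passes because
0xbfbff71c + 4 = 0xbfbff720, while 0xbfbff720 itself fails.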
 
 > >Fix:
 >
 > (gcc 3.1 masks %esp by itself, so this patch might not be required
 > for -current.)
 
 Right; it isn't needed for -current.  gcc-3.1 treats main() specially
 and aligns the stack using a single andl instruction:
 
 ---
 $ cat z.c
 main()
 {
 	foo();
 }
 # -mpreferred-stack-boundary=4 is to ensure the standard default.
 $ cc -O -S z.c -mpreferred-stack-boundary=4
 $ cat z.s
 
 	.file	"z.c"
 	.text
 	.p2align 2,,3
 .globl main
 	.type	main,@function
 main:
 	pushl	%ebp
 	movl	%esp, %ebp
 	subl	$8, %esp		<-- waste time
 	andl	$-16, %esp		<-- align
 	call	foo
 	leave
 	ret
 .Lfe1:
 	.size	main,.Lfe1-main
 	.ident	"GCC: (GNU) 3.1 [FreeBSD] 20020509 (prerelease)"
 ---
 
 gcc only seems to do the andl for main().  It still does the old stack
 alignment for main() and apparently depends on all other functions doing
 it.
 
 I still use my old kernel hack for alignment:
 
 %%%
 Index: kern_exec.c
 ===================================================================
 RCS file: /home/ncvs/src/sys/kern/kern_exec.c,v
 retrieving revision 1.179
 diff -u -2 -r1.179 kern_exec.c
 --- kern_exec.c	1 Aug 2002 14:31:58 -0000	1.179
 +++ kern_exec.c	2 Aug 2002 13:53:14 -0000
 @@ -846,4 +843,13 @@
 
  	/*
 +	 * Align stack to a multiple of 0x20.
 +	 * XXX vectp has the wrong type; we usually want a vm_offset_t;
 +	 * the suword() family takes a void *, but should take a vm_offset_t.
 +	 * XXX should align stack for signals too.
 +	 * XXX should do this more machine/compiler-independently.
 +	 */
 +	vectp = (char **)(((vm_offset_t)vectp & ~(vm_offset_t)0x1F) - 4);
 +
 +	/*
  	 * vectp also becomes our initial stack base
  	 */
 %%%
 
 This is the wrong place to do alignment (see above).  Alignment to a
 nice power of 2 here would be reasonable, but subtracting 4 isn't.  4
 is magic, and the correct number depends on the language and the
 compiler that the application being exec'ed was compiled with, so it
 shouldn't be given by a kernel arch-dependent #define.
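
The arithmetic in the vectp assignment can be tried in isolation.  This
is a minimal userland sketch, not kernel code: uintptr_t stands in for
vm_offset_t, and the function name align_vectp is hypothetical.  It
rounds the address down to a multiple of 0x20 and then subtracts the
magic 4.

```c
#include <stdint.h>

/*
 * Hypothetical userland stand-in for the kernel expression
 * ((vm_offset_t)vectp & ~(vm_offset_t)0x1F) - 4.
 */
static uintptr_t
align_vectp(uintptr_t vectp)
{
	return ((vectp & ~(uintptr_t)0x1F) - 4);
}
```

For example, align_vectp(0xbfbff72c) yields 0xbfbff71c: the mask gives
0xbfbff720, and the subtraction leaves the result 4 bytes below that
32-byte boundary.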
 
 Fortunately, the problem is now fixed for the most important case of
 language == c && compiler == cc-current.  However, there is still a
 problem for signal handlers.  The i386 signal trampoline is:
 
 %%%
 NON_GPROF_ENTRY(sigcode)
 	call	*SIGF_HANDLER(%esp)	/* call signal handler */
 %%%
 
 This pushes one 32-bit word (for the return address) on top of the
 carelessly aligned argument words.  To work properly for cc-current,
 the stack still needs to be 16-byte aligned before the call.  Fortunately,
 this bug is mostly moot because floating point in signal handlers is
 not useful and doesn't work.
 
 > --- diff begins here ---
 > Index: stable/lib/csu/i386-elf/crt1.c
 > ===================================================================
 > RCS file: /home/ncvs/src/lib/csu/i386-elf/crt1.c,v
 > retrieving revision 1.4.2.1
 > diff -u -r1.4.2.1 crt1.c
 > --- stable/lib/csu/i386-elf/crt1.c	30 Oct 2000 20:32:24 -0000	1.4.2.1
 > +++ stable/lib/csu/i386-elf/crt1.c	10 Aug 2002 19:40:54 -0000
 > @@ -93,7 +93,33 @@
 >      monstartup(&eprol, &etext);
 >  #endif
 >      _init();
 > +#if 0
 >      exit( main(argc, argv, env) );
 > +#else
 > +    /*
 > +     * GCC expects stack frame to be aligned like following figure.
 > +     *
 > +     *  +--------------+
 > +     *  |%ebp (if any) |
 > +     *  +--------------+
 > +     *  |return address|
 > +     *  +--------------+ --- aligned by PREFERRED_STACK_BOUNDARY
 > +     *  |  arguments   |
 > +     *  |      :       |
 > +     *  |      :       |
 > +     */
 > +    __asm__ ("
 > +	subl	$12, %%esp		# create stack frame for arguments
 > +	andl	$~0xf, %%esp		# align stack to 16-byte boundary
 > +	movl	%0, 0(%%esp)
 > +	movl	%1, 4(%%esp)
 > +	movl	%2, 8(%%esp)
 > +	call	main
 > +	movl	%%eax, 0(%%esp)
 > +	call	exit
 > +	hlt				# do not return
 > +    " : : "r"(argc), "r"(argv), "r"(env));
 > +#endif
 >  }
 > ...
 
 I would only use this fix or one like it in RELENG_4.  Maybe my kernel
 hack is better since it "fixes" most applications without a recompile.
 It is simpler because it doesn't use any assembly code or have to recover
 from the kernel pushing the args in a misaligned place.
 
 Minor improvements:
 - remove the comment about %ebp since the caller doesn't handle %ebp
 - remove the hlt since it wouldn't really help if it were reached (it is
   privileged so it just generates SIGBUS, and execution continues at the
   instruction after the hlt if there is a SIGBUS handler and it returns...).
 
 Bruce
 

From: NIIMI Satoshi <sa2c@sa2c.net>
To: Bruce Evans <bde@zeta.org.au>
Cc: FreeBSD-gnats-submit@FreeBSD.ORG
Subject: Re: i386/41528: better stack alignment patch for lib/csu/i386-elf/
Date: 13 Aug 2002 14:26:33 +0900

 Thanks for your detailed explanation, which I should have written myself.
 
 Bruce Evans <bde@zeta.org.au> writes:
 
 > I think stack alignment belongs only in functions that actually store
 > floating point or other data that actually needs large alignment, and
 > it should be done by the compiler since only the compiler knows the
 > alignment requirements.  gcc on i386 requires the stack to be aligned
 > 4 bytes below a 2^<preferred-stack-boundary> boundary on entry to each
 > function, but this is compiler-dependent and pessimal IMO.
 
 Yes.  But it's too hard for me to hack gcc.  ;-)
 
 > This is the wrong place to do alignment (see above).  Alignment to a
 > nice power of 2 here would be reasonable, but subtracting 4 isn't.  4
 > is magic, and the correct number depends on the language and the
 > compiler that the application being exec'ed was compiled with, so it
 > shouldn't be given by a kernel arch-dependent #define.
 
 I have a similar hack that aligns the stack in the kernel for
 pre-compiled executables.  But I did not send-pr it because:
 
 1. As you say, this is not the correct place.  It depends on both the
    CPU architecture and the executable image ABI.
 
 2. Because the ELF ABI does not define stack alignment, other compilers
    may want the stack to be aligned differently.  This hack only helps
    executables that are compiled with gcc.
 
 > I would only use this fix or one like it in RELENG_4.  Maybe my kernel
 > hack is better since it "fixes" most applications without a recompile.
 > It is simpler because it doesn't use any assembly code or have to recover
 > from the kernel pushing the args in a misaligned place.
 
 Thanks.  But is it possible?  I attached a patch for -current so that
 it can be committed into -current then MFC'ed to -stable.
 
 > Minor improvements:
 > - remove the comment about %ebp since the caller doesn't handle %ebp
 > - remove the hlt since it wouldn't really help if it were reached (it is
 >   privileged so it just generates SIGBUS, and execution continues at the
 >   instruction after the hlt if there is a SIGBUS handler and it returns...).
 
 I agree.  Please remove them.
 
 -- 
 NIIMI Satoshi
 

From: Bruce Evans <bde@zeta.org.au>
To: NIIMI Satoshi <sa2c@sa2c.net>
Cc: FreeBSD-gnats-submit@FreeBSD.ORG
Subject: Re: i386/41528: better stack alignment patch for lib/csu/i386-elf/
Date: Thu, 26 Sep 2002 00:28:06 +1000 (EST)

 On 13 Aug 2002, NIIMI Satoshi wrote:
 
 > Bruce Evans <bde@zeta.org.au> writes:
 > > ...
 > > I would only use this fix or one like it in RELENG_4.  Maybe my kernel
 > > hack is better since it "fixes" most applications without a recompile.
 > > It is simpler because it doesn't use any assembly code or have to recover
 > > from the kernel pushing the args in a misaligned place.
 >
 > Thanks.  But is it possible?  I attached a patch for -current so that
 > it can be committed into -current then MFC'ed to -stable.
 
 I just got around to preparing this for commit (hopefully just before 4.7),
 and found a small problem.  There seems to be an off-by-8 error.
 
 Original patch:
 
 % Index: stable/lib/csu/i386-elf/crt1.c
 % ===================================================================
 % RCS file: /home/ncvs/src/lib/csu/i386-elf/crt1.c,v
 % retrieving revision 1.4.2.1
 % diff -u -r1.4.2.1 crt1.c
 % --- stable/lib/csu/i386-elf/crt1.c	30 Oct 2000 20:32:24 -0000	1.4.2.1
 % +++ stable/lib/csu/i386-elf/crt1.c	10 Aug 2002 19:40:54 -0000
 % @@ -93,7 +93,33 @@
 %      monstartup(&eprol, &etext);
 %  #endif
 %      _init();
 % +#if 0
 %      exit( main(argc, argv, env) );
 % +#else
 % +    /*
 % +     * GCC expects stack frame to be aligned like following figure.
 % +     *
 % +     *  +--------------+
 % +     *  |%ebp (if any) |
 % +     *  +--------------+
 % +     *  |return address|
 % +     *  +--------------+ --- aligned by PREFERRED_STACK_BOUNDARY
 % +     *  |  arguments   |
 % +     *  |      :       |
 % +     *  |      :       |
 % +     */
 
 This is where gcc-3 wants the stack aligned, but gcc-2 apparently wants
 it aligned 8 bytes lower (higher in the diagram), after pushing %ebp.
 
 I am now testing the following patch:
 
 %%%
 Index: crt1.c
 ===================================================================
 RCS file: /home/ncvs/src/lib/csu/i386-elf/crt1.c,v
 retrieving revision 1.9
 diff -u -2 -r1.9 crt1.c
 --- crt1.c	16 Jul 2002 12:28:49 -0000	1.9
 +++ crt1.c	25 Sep 2002 14:23:24 -0000
 @@ -101,5 +101,34 @@
  #endif
  	_init();
 +#ifndef __GNUC__
  	exit( main(argc, argv, env) );
 +#else
 +	/*
 +	 * gcc-2 expects the stack frame to be aligned as follows after it
 +	 * is set up in main():
 +	 *
 +	 *  +--------------+ <--- aligned by PREFERRED_STACK_BOUNDARY
 +	 *  +%ebp (if any) +
 +	 *  +--------------+
 +	 *  |return address|
 +	 *  +--------------+
 +	 *  |  arguments   |
 +	 *  |      :       |
 +	 *  |      :       |
 +	 *  +--------------+
 +	 *
 +	 * The call must be written in assembler to implement this.
 +	 */
 +	__asm__("
 +	andl	$~0xf, %%esp		# align stack to 16-byte boundary
 +	subl	$12+12, %%esp		# space for args and padding
 +	movl	%0, 0(%%esp)
 +	movl	%1, 4(%%esp)
 +	movl	%2, 8(%%esp)
 +	call	main
 +	movl	%%eax, 0(%%esp)
 +	call	exit
 +	" : : "r" (argc), "r" (argv), "r" (env) : "ax", "cx", "dx", "memory");
 +#endif
  }
 
 %%%
 
 Bruce
 

From: NIIMI Satoshi <sa2c@sa2c.net>
To: Bruce Evans <bde@zeta.org.au>
Cc: FreeBSD-gnats-submit@FreeBSD.ORG
Subject: Re: i386/41528: better stack alignment patch for lib/csu/i386-elf/
Date: 26 Sep 2002 05:48:56 +0900

 Bruce Evans <bde@zeta.org.au> writes:
 
 > I just got around to preparing this for commit (hopefully just before 4.7),
 > and found a small problem.  There seems to be an off-by-8 error.
 
 I confirm the problem.
 
 > I am now testing the following patch:
 > 
 > %%%
 > Index: crt1.c
 > ===================================================================
 > RCS file: /home/ncvs/src/lib/csu/i386-elf/crt1.c,v
 > retrieving revision 1.9
 > diff -u -2 -r1.9 crt1.c
 > --- crt1.c	16 Jul 2002 12:28:49 -0000	1.9
 > +++ crt1.c	25 Sep 2002 14:23:24 -0000
 > @@ -101,5 +101,34 @@
 >  #endif
 >  	_init();
 > +#ifndef __GNUC__
 >  	exit( main(argc, argv, env) );
 > +#else
 > +	/*
 > +	 * gcc-2 expects the stack frame to be aligned as follows after it
 > +	 * is set up in main():
 > +	 *
 > +	 *  +--------------+ <--- aligned by PREFERRED_STACK_BOUNDARY
 > +	 *  +%ebp (if any) +
 > +	 *  +--------------+
 > +	 *  |return address|
 > +	 *  +--------------+
 > +	 *  |  arguments   |
 > +	 *  |      :       |
 > +	 *  |      :       |
 > +	 *  +--------------+
 > +	 *
 > +	 * The call must be written in assembler to implement this.
 > +	 */
 > +	__asm__("
 > +	andl	$~0xf, %%esp		# align stack to 16-byte boundary
 > +	subl	$12+12, %%esp		# space for args and padding
 > +	movl	%0, 0(%%esp)
 > +	movl	%1, 4(%%esp)
 > +	movl	%2, 8(%%esp)
 > +	call	main
 > +	movl	%%eax, 0(%%esp)
 > +	call	exit
 > +	" : : "r" (argc), "r" (argv), "r" (env) : "ax", "cx", "dx", "memory");
 > +#endif
 >  }
 > 
 > %%%
 
 I tested your patch with the following code.
 
 Test code:
 #include <stdio.h>
 
 struct foo
 {
 	int a;
 } __attribute__((aligned(16)));
 
 int
 main(int argc, char **argv, char **envp)
 {
 	struct foo x;
 	struct foo y;
 
 	printf("%p %p\n", &x, &y);
 }
 
 Produced assembly (with cc -O):
 main:
 	pushl %ebp
 	movl %esp,%ebp
 	subl $40,%esp
 	addl $-4,%esp
 	leal -32(%ebp),%eax		#A
 	pushl %eax
 	leal -16(%ebp),%eax		#B
 	pushl %eax
 	pushl $.LC0
 	call printf
 	leave
 	ret
 
 At #A and #B, GCC expects %ebp to be aligned to PREFERRED_STACK_BOUNDARY.
 (This is what your diagram shows.)
 
 But with 'cc -O -fomit-frame-pointer', the expected alignment is different.
 
 Produced assembly (with cc -O -fomit-frame-pointer):
 main:
 	subl $44,%esp
 	addl $-4,%esp
 	leal 4(%esp),%eax		#A
 	pushl %eax
 	leal 24(%esp),%eax		#B
 	pushl %eax
 	pushl $.LC0
 	call printf
 	addl $16,%esp
 	addl $44,%esp
 	ret
 
 #A points to original %esp - 44.  #B points to original %esp - 28.
 In this case, GCC expects the argument address to be aligned to
 PREFERRED_STACK_BOUNDARY.  (This is what my diagram shows.)
 So there is an off-by-8 error with your patch.
 
 This means that there is no way to remove the off-by-8 error in every case.
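
The two listings can be compared by computing where #A lands relative
to %esp on entry to main().  The sketch below only illustrates the
arithmetic in those listings; the function names are hypothetical and
the offsets are read off the assembly above.

```c
#include <stdint.h>

/*
 * With a frame pointer: pushl %ebp leaves %ebp = entry_sp - 4,
 * and #A is -32(%ebp).
 */
static uint32_t
addr_a_with_fp(uint32_t entry_sp)
{
	uint32_t ebp = entry_sp - 4;

	return (ebp - 32);
}

/*
 * With -fomit-frame-pointer: subl $44 and addl $-4 leave
 * %esp = entry_sp - 48, and #A is 4(%esp).
 */
static uint32_t
addr_a_without_fp(uint32_t entry_sp)
{
	uint32_t esp = entry_sp - 44 - 4;

	return (esp + 4);
}
```

For any entry value the two results differ by 8, so at most one of them
can be 16-byte aligned: an entry %esp of 0xbfbff704 aligns the first,
and 0xbfbff70c aligns the second.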
 
 Because '-fomit-frame-pointer' is not used in the default setting of
 FreeBSD, I think your patch is preferable.
 
 BTW, why did you subtract 12+12 from %esp?  I think 12+4 is
 sufficient.
 
 -- 
 NIIMI Satoshi
 

From: Bruce Evans <bde@zeta.org.au>
To: NIIMI Satoshi <sa2c@sa2c.net>
Cc: FreeBSD-gnats-submit@FreeBSD.ORG
Subject: Re: i386/41528: better stack alignment patch for lib/csu/i386-elf/
Date: Thu, 26 Sep 2002 13:15:26 +1000 (EST)

 On 26 Sep 2002, NIIMI Satoshi wrote:
 
 > Bruce Evans <bde@zeta.org.au> writes:
 >
 > > I just got around to preparing this for commit (hopefully just before 4.7),
 > > and found a small problem.  There seems to be an off-by-8 error.
 >
 > I confirm the problem.
 
 Thanks.
 
 > I tested your patch with the following code.
 >
 > Test code:
 > #include <stdio.h>
 >
 > struct foo
 > {
 > 	int a;
 > } __attribute__((aligned(16)));
 >
 > int
 > main(int argc, char **argv, char **envp)
 > {
 > 	struct foo x;
 > 	struct foo y;
 >
 > 	printf("%p %p\n", &x, &y);
 > }
 >
 > Produced assembly (with cc -O):
 > main:
 > 	pushl %ebp
 > 	movl %esp,%ebp
 > 	subl $40,%esp
 > 	addl $-4,%esp
 > 	leal -32(%ebp),%eax		#A
 > 	pushl %eax
 > 	leal -16(%ebp),%eax		#B
 > 	pushl %eax
 > 	pushl $.LC0
 > 	call printf
 > 	leave
 > 	ret
 >
 > At #A and #B, GCC expects %ebp to be aligned to PREFERRED_STACK_BOUNDARY.
 > (This is what your diagram shows.)
 >
 > But with 'cc -O -fomit-frame-pointer', the expected alignment is different.
 
 Urk.
 
 > Produced assembly (with cc -O -fomit-frame-pointer):
 > main:
 > 	subl $44,%esp
 > 	addl $-4,%esp
 > 	leal 4(%esp),%eax		#A
 > 	pushl %eax
 > 	leal 24(%esp),%eax		#B
 > 	pushl %eax
 > 	pushl $.LC0
 > 	call printf
 > 	addl $16,%esp
 > 	addl $44,%esp
 > 	ret
 >
 > #A points to original %esp - 44.  #B points to original %esp - 28.
 > In this case, GCC expects the argument address to be aligned to
 > PREFERRED_STACK_BOUNDARY.  (This is what my diagram shows.)
 > So there is an off-by-8 error with your patch.
 >
 > This means that there is no way to remove the off-by-8 error in every case.
 
 Yes, there is no general way, since the alignment required in crt1.c
 depends on at least the compiler version and options.
 
 > Because '-fomit-frame-pointer' is not used in the default setting of
 > FreeBSD, I think your patch is preferable.
 >
 > BTW, why did you subtract 12+12 from %esp?  I think 12+4 is
 > sufficient.
 
 The extra 8 is to fix the off-by-8 error in some cases :-).  When main() is
 compiled by:
 - gcc-2.95.4                           extra 8 aligns stack right
 - gcc-2.95.4 -fomit-frame-pointer      extra 8 gives off-by-8 error
 - gcc-3.2.1                            extra 8 gives off-by-8 error
 - gcc-3.2.1 -fomit-frame-pointer       extra 8 gives off-by-8 error
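
The effect of the padding constant can be worked out by following %esp
through the asm in the patch.  This is a minimal sketch, assuming the
sequence "andl $~0xf", "subl $pad", "call main" (where call pushes a
4-byte return address); the function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of %esp as main() is entered from crt1.c. */
static uint32_t
esp_on_entry_to_main(uint32_t esp, uint32_t pad)
{
	esp &= ~(uint32_t)0xf;	/* andl $~0xf, %esp */
	esp -= pad;		/* subl $pad, %esp */
	esp -= 4;		/* call main pushes the return address */
	return (esp);
}
```

With pad = 12+12 the result is 4 modulo 16, so %ebp is 16-byte aligned
after main() executes pushl %ebp (the first case in the list above).
With pad = 12+4 the result is 12 modulo 16, which aligns the argument
area instead.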
 
 gcc-3.2.1 does things better but still has 1 bug here: it does the "andl"
 adjustment after allocating space for auto variables in main(), so these
 variables may be misaligned.  But the "andl" realigns the stack for
 functions called by main(), so the off-by-8 error is not very serious.
 
 I tried to align the stack less unportably using alloca(16), but gave up
 after finding just a different morass of bugs and incompatibilities.  I
 learned of the following useful non-bugs:
 - the alignment given by the builtin alloca() is fixed in gcc-3.2.1.  So
   you can now allocate auto variables that need to be strictly aligned
   using "foo_t *fp = alloca(sizeof(foo_t));" even when the function or one
   of its callers is compiled with a non-default -mpreferred-stack-boundary.
   "foo_t f;" is still broken here.
 - variable-sized auto arrays are fixed similarly.
 
 Bruce
 

From: NIIMI Satoshi <sa2c@sa2c.net>
To: Bruce Evans <bde@zeta.org.au>
Cc: FreeBSD-gnats-submit@FreeBSD.ORG
Subject: Re: i386/41528: better stack alignment patch for lib/csu/i386-elf/
Date: 26 Sep 2002 14:22:04 +0900

 Bruce Evans <bde@zeta.org.au> writes:
 
 > > BTW, why did you subtract 12+12 from %esp?  I think 12+4 is
 > > sufficient.
 > 
 > The extra 8 is to fix the off-by-8 error in some cases :-).
 
 Forget about it.  It seems that I was still sleeping.  :-)
 
 -- 
 NIIMI Satoshi
 
State-Changed-From-To: open->closed 
State-Changed-By: bde 
State-Changed-When: Sun Nov 17 21:02:01 PST 2002 
State-Changed-Why:  
Fixed in revs. 1.10 (-current) and 1.4.2.2 (RELENG_4) of crt1.c.

http://www.freebsd.org/cgi/query-pr.cgi?pr=41528 
>Unformatted:
