From nobody@FreeBSD.org  Fri Aug  3 15:35:21 2012
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 3A04A1065686
	for <freebsd-gnats-submit@FreeBSD.org>; Fri,  3 Aug 2012 15:35:21 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from red.freebsd.org (red.freebsd.org [IPv6:2001:4f8:fff6::22])
	by mx1.freebsd.org (Postfix) with ESMTP id 249F98FC16
	for <freebsd-gnats-submit@FreeBSD.org>; Fri,  3 Aug 2012 15:35:21 +0000 (UTC)
Received: from red.freebsd.org (localhost [127.0.0.1])
	by red.freebsd.org (8.14.4/8.14.4) with ESMTP id q73FZKb9014930
	for <freebsd-gnats-submit@FreeBSD.org>; Fri, 3 Aug 2012 15:35:20 GMT
	(envelope-from nobody@red.freebsd.org)
Received: (from nobody@localhost)
	by red.freebsd.org (8.14.4/8.14.4/Submit) id q73FZKIr014920;
	Fri, 3 Aug 2012 15:35:20 GMT
	(envelope-from nobody)
Message-Id: <201208031535.q73FZKIr014920@red.freebsd.org>
Date: Fri, 3 Aug 2012 15:35:20 GMT
From: Ming Qiao <mqiao@juniper.net>
To: freebsd-gnats-submit@FreeBSD.org
Subject: [patch] amd64: 64-bit process can't always get unlimited rlimit
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         170351
>Category:       amd64
>Synopsis:       [kernel] [patch] amd64: 64-bit process can't always get unlimited rlimit
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-amd64
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Aug 03 15:40:08 UTC 2012
>Closed-Date:    
>Last-Modified:  Fri Aug 10 03:27:06 UTC 2012
>Originator:     Ming Qiao
>Release:        FreeBSD 9.0-RC2
>Organization:
Juniper Networks
>Environment:
FreeBSD neys 9.0-RC2 FreeBSD 9.0-RC2 #0: Thu Jul 26 01:27:46 UTC 2012
root@neys:/usr/obj/usr/src/sys/GENERIC  amd64
>Description:
On the amd64 platform, if a 32-bit process ever explicitly sets its rlimit,
none of its 64-bit children or other descendants will be able to get the full
64-bit rlimit anymore, even if they explicitly set the limit to unlimited.

Note that, for the sake of simplicity, only the data size limit (RLIMIT_DATA)
is discussed in this report, but the same logic applies to the limits for the
other memory segments (e.g. the stack size).

Take the following scenario as an example:
1) A 32-bit process p1 sets its hard limit to 500MB by calling setrlimit().
2) p1 then execs another 32-bit process, p2.
3) p2 sets its hard limit to unlimited by calling setrlimit().
4) p2 execs a 64-bit process, p3.
5) Checking the hard limit of p3 shows that it is only 3GB (the value of
   ia32_maxdsiz) instead of 32GB, the global kernel limit for a 64-bit
   process (the value of maxdsiz).

The root cause is that in step 3, p2 does not actually get the requested
value when calling setrlimit(). Instead, the limit is set to ia32_maxdsiz,
because kern_proc_setrlimit() calls the ABI's sv_fixlimit() hook, which for
32-bit processes is ia32_fixlimit(). p3 then inherits this clamped value.
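The clamping described above can be illustrated with a small userland model. This is a sketch, not kernel code: model_ia32_fixlimit() and model_setrlimit_data() are hypothetical names that only mirror the RLIMIT_DATA handling described in this report, the privilege check is omitted, and the 3GB/32GB values are the ones quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for rlim_t, RLIM_INFINITY, maxdsiz, ia32_maxdsiz. */
typedef int64_t model_rlim_t;
#define MODEL_RLIM_INFINITY INT64_MAX
#define GB ((model_rlim_t)1 << 30)

static model_rlim_t model_maxdsiz = 32 * GB;     /* 64-bit data size limit */
static model_rlim_t model_ia32_maxdsiz = 3 * GB; /* 0 disables the clamp */

struct model_rlimit {
    model_rlim_t rlim_cur;
    model_rlim_t rlim_max;
};

/* Same shape as ia32_fixlimit() for RLIMIT_DATA: clamp both values to
 * ia32_maxdsiz unless that tunable has been set to 0. */
static void
model_ia32_fixlimit(struct model_rlimit *rl)
{
    assert(rl != NULL);
    if (model_ia32_maxdsiz != 0) {
        if (rl->rlim_cur > model_ia32_maxdsiz)
            rl->rlim_cur = model_ia32_maxdsiz;
        if (rl->rlim_max > model_ia32_maxdsiz)
            rl->rlim_max = model_ia32_maxdsiz;
    }
}

/* RLIMIT_DATA path of setrlimit() (privilege check omitted). The result
 * is what would be stored and later inherited across fork/exec. */
static struct model_rlimit
model_setrlimit_data(struct model_rlimit req, int caller_is_32bit)
{
    if (req.rlim_cur > req.rlim_max)
        req.rlim_cur = req.rlim_max;
    if (req.rlim_cur > model_maxdsiz)
        req.rlim_cur = model_maxdsiz;
    if (req.rlim_max > model_maxdsiz)
        req.rlim_max = model_maxdsiz;
    if (caller_is_32bit)            /* sv_fixlimit is ia32_fixlimit */
        model_ia32_fixlimit(&req);
    return (req);
}
```

In this model a 32-bit request for RLIM_INFINITY is stored as {3GB, 3GB}; a 64-bit child that simply inherits that stored value keeps 3GB, because the model applies no fixup for the native ABI.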
>How-To-Repeat:
Three test programs are attached to this report: 32_p1.c, 32_p2.c, and
64_p3.c. They can be used to reproduce the problem.

1) Compile 32_p1.c and 32_p2.c into 32-bit binaries. Compile 64_p3.c into
a 64-bit binary.
2) Put all 3 binaries into the same directory on a machine running
FreeBSD/amd64.
3) Run 32_p1, which execs 32_p2, which in turn execs 64_p3. The output of
64_p3 will show that its limit is capped at ia32_maxdsiz.
>Fix:
The proposed fix changes kern_proc_setrlimit() so that sv_fixlimit() is not
called when the caller sets the new limit to RLIM_INFINITY (in the attached
patch, only the RLIMIT_DATA hard limit is checked).
Please refer to the attached diff file for the proposed fix.
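To make the effect of fix.diff concrete, here is a similar hypothetical userland model of the patched RLIMIT_DATA path (the names and values are illustrative, not kernel API). It shows that a 32-bit caller asking for RLIM_INFINITY ends up with maxdsiz rather than ia32_maxdsiz, and, because the patch keys only on rlim_max, that a finite soft limit paired with an infinite hard limit also escapes the ia32 clamp.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for rlim_t, RLIM_INFINITY, maxdsiz, ia32_maxdsiz. */
typedef int64_t model_rlim_t;
#define MODEL_RLIM_INFINITY INT64_MAX
#define GB ((model_rlim_t)1 << 30)

static const model_rlim_t model_maxdsiz = 32 * GB;
static const model_rlim_t model_ia32_maxdsiz = 3 * GB;

struct model_rlimit {
    model_rlim_t rlim_cur;
    model_rlim_t rlim_max;
};

static model_rlim_t
model_clamp(model_rlim_t v, model_rlim_t lim)
{
    assert(lim > 0);
    return (v > lim ? lim : v);
}

/* RLIMIT_DATA path with fix.diff applied (simplified). */
static struct model_rlimit
model_setrlimit_data_patched(struct model_rlimit req, int caller_is_32bit)
{
    /* fix.diff: remember whether the hard limit was requested as infinity */
    int is_lim_inf = (req.rlim_max == MODEL_RLIM_INFINITY);

    req.rlim_cur = model_clamp(req.rlim_cur, model_maxdsiz);
    req.rlim_max = model_clamp(req.rlim_max, model_maxdsiz);
    /* ... and skip the ia32_fixlimit() step in that case */
    if (caller_is_32bit && !is_lim_inf) {
        req.rlim_cur = model_clamp(req.rlim_cur, model_ia32_maxdsiz);
        req.rlim_max = model_clamp(req.rlim_max, model_ia32_maxdsiz);
    }
    return (req);
}
```

For example, in this model a 32-bit caller requesting {8GB, RLIM_INFINITY} keeps an 8GB soft limit, above ia32_maxdsiz.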

Patch attached with submission follows:

# This is a shell archive.  Save it in a file, remove anything before
# this line, and then unpack it by entering "sh file".  Note, it may
# create directories; files and directories will be owned by you and
# have default permissions.
#
# This archive contains:
#
#	fix.diff
#	32_p1.c
#	32_p2.c
#	64_p3.c
#
echo x - fix.diff
sed 's/^X//' >fix.diff << 'bcc47fd7a380cd6506fa66c7fb3122d6'
X--- kern_resource.c	2012-08-02 07:41:59.000000000 -0700
X+++ kern_resource.c.modified	2012-08-02 07:40:40.771115000 -0700
X@@ -663,6 +663,7 @@
X 	register struct rlimit *alimp;
X 	struct rlimit oldssiz;
X 	int error;
X+	int is_lim_inf = 0;
X 
X 	if (which >= RLIM_NLIMITS)
X 		return (EINVAL);
X@@ -701,6 +702,8 @@
X 		p->p_cpulimit = limp->rlim_cur;
X 		break;
X 	case RLIMIT_DATA:
X+		if (limp->rlim_max == RLIM_INFINITY)
X+			is_lim_inf = 1;
X 		if (limp->rlim_cur > maxdsiz)
X 			limp->rlim_cur = maxdsiz;
X 		if (limp->rlim_max > maxdsiz)
X@@ -736,7 +739,8 @@
X 			limp->rlim_max = 1;
X 		break;
X 	}
X-	if (p->p_sysent->sv_fixlimit != NULL)
X+	if ((p->p_sysent->sv_fixlimit != NULL) &&
X+	    (1 != is_lim_inf))
X 		p->p_sysent->sv_fixlimit(limp, which);
X 	*alimp = *limp;
X 	p->p_limit = newlim;
bcc47fd7a380cd6506fa66c7fb3122d6
echo x - 32_p1.c
sed 's/^X//' >32_p1.c << '2732294ba2da13cbcf3153434d6e3482'
X/*
X * Test program for FreeBSD rlimit issue.
X * To be compiled to a 32-bit binary. 
X */
X
X#include <sys/types.h>
X#include <sys/time.h>
X#include <sys/resource.h>
X
X#include <stdio.h>
X#include <string.h>
X#include <unistd.h>
X#include <stdlib.h>
X#include <errno.h>
X
Xint
Xmain(int argc, char **argv)
X{
X    struct rlimit  currlimit, lim_new;
X    char * argv_exec[] = {"./32_p2", 0};
X
X    printf( "\n *** Starting 32-b process 1 *** \n");
X/*    sleep(15);*/
X
X    if ( 0 == getrlimit( RLIMIT_DATA, &currlimit ) ) {
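X        /* rlim_t is 64 bits wide even in a 32-bit binary. */
X        printf("\n32_p1: rlim_cur = %llu, rlim_max = %llu\n",
X            (unsigned long long)currlimit.rlim_cur,
X            (unsigned long long)currlimit.rlim_max);
X    }
X    else {
X        printf("getrlimit failed! err=%d\n", errno);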
X    }	
X
X    lim_new.rlim_cur = lim_new.rlim_max = 524288000; /* 500M */
X
X    if (setrlimit(RLIMIT_DATA, &lim_new) < 0) {
X        printf("setrlimit failed! err=%d\n", errno);
X    }
X    else {
X        printf("32_p1: set limits to 500M\n");
X    }
X
X    if ( 0 == getrlimit( RLIMIT_DATA, &currlimit ) ) {
X        printf("\n32_p1: new rlim_cur = %llu, rlim_max = %llu\n",
X            (unsigned long long)currlimit.rlim_cur,
X            (unsigned long long)currlimit.rlim_max);
X    }
X    else {
X        printf("getrlimit failed! err=%d\n", errno);
X    } 
X
X    printf("now exec 32_p2...\n");
X    execv( argv_exec[0], argv_exec );
X    /* execv() returns only on failure. */
X    perror("execv");
X    exit(1);
X}
X
2732294ba2da13cbcf3153434d6e3482
echo x - 32_p2.c
sed 's/^X//' >32_p2.c << '921653888f0311e6f9044b483b332874'
X/*
X * Test program for FreeBSD rlimit issue.
X * To be compiled to a 32-bit binary.
X */
X
X#include <sys/types.h>
X#include <sys/time.h>
X#include <sys/resource.h>
X
X#include <stdio.h>
X#include <string.h>
X#include <unistd.h>
X#include <stdlib.h>
X#include <errno.h>
X
Xint
Xmain(int argc, char **argv)
X{
X    struct rlimit  currlimit, lim_new;
X    char * argv_exec[] = {"./64_p3", 0};
X
X    printf( "\n *** Starting 32-b process 2 *** \n");
X/*    sleep(15);*/
X
X    if ( 0 == getrlimit( RLIMIT_DATA, &currlimit ) ) {
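X        /* rlim_t is 64 bits wide even in a 32-bit binary. */
X        printf("\n32_p2: rlim_cur = %llu, rlim_max = %llu\n",
X            (unsigned long long)currlimit.rlim_cur,
X            (unsigned long long)currlimit.rlim_max);
X    }
X    else {
X        printf("getrlimit failed! err=%d\n", errno);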
X    }	
X
X    lim_new.rlim_cur = lim_new.rlim_max = RLIM_INFINITY; 
X
X    if (setrlimit(RLIMIT_DATA, &lim_new) < 0) {
X        printf("setrlimit failed! err=%d\n", errno);
X    }
X    else {
X        printf("32_p2: set limits to RLIM_INFINITY\n");
X    }	
X
X    if ( 0 == getrlimit( RLIMIT_DATA, &currlimit ) ) {
X        printf("\n32_p2: new rlim_cur = %llu, rlim_max = %llu\n",
X            (unsigned long long)currlimit.rlim_cur,
X            (unsigned long long)currlimit.rlim_max);
X    }
X    else {
X        printf("getrlimit failed! err=%d\n", errno);
X    } 
X   
X    printf("now exec 64_p3...\n");
X
X    execv( argv_exec[0], argv_exec );
X    /* execv() returns only on failure. */
X    perror("execv");
X    exit(1);
X}
X
921653888f0311e6f9044b483b332874
echo x - 64_p3.c
sed 's/^X//' >64_p3.c << 'e22e8191882a74b0e1f833ce4465896a'
X/*
X * Test program for FreeBSD rlimit issue.
X * To be compiled to a 64-bit binary.
X */
X
X#include <sys/types.h>
X#include <sys/time.h>
X#include <sys/resource.h>
X
X#include <stdio.h>
X#include <string.h>
X#include <unistd.h>
X#include <stdlib.h>
X#include <errno.h>
X
Xint
Xmain(int argc, char **argv)
X{
X    char *p = NULL;      /* char * so the pointer arithmetic is standard C */
X    unsigned long c = 0; /* number of 1MB blocks allocated */
X    struct rlimit  currlimit;
X
X    printf( "\n *** Starting 64-b process 3 *** \n");
X    /* sleep(15); */
X
X    if ( 0 == getrlimit( RLIMIT_DATA, &currlimit ) ) {
X        printf("\n64_p3: rlim_cur = %llu, rlim_max = %llu\n",
X            (unsigned long long)currlimit.rlim_cur,
X            (unsigned long long)currlimit.rlim_max);
X    }
X    else {
X        printf("getrlimit failed! err=%d\n", errno);
X    }	
X     
X    p = sbrk(0);
X
X    /* Grow the data segment 1MB at a time until brk() fails. */
X    while (brk(p + 1024*1024) == 0) {
X        c++;
X        p = sbrk(0);
X    }
X
X    printf("64_p3: %lu 1MB blocks allocated (%s).\n", c, strerror(errno));
X
X    exit(0);
X}
X
e22e8191882a74b0e1f833ce4465896a
exit



>Release-Note:
>Audit-Trail:

From: Konstantin Belousov <kostikbel@gmail.com>
To: Ming Qiao <mqiao@juniper.net>
Cc: freebsd-gnats-submit@freebsd.org
Subject: Re: amd64/170351: [patch] amd64: 64-bit process can't always get unlimited rlimit
Date: Fri, 3 Aug 2012 20:39:23 +0300

 The 'fix' is wrong and does not address the issue.
 Instead, it relies on arbitrary properties of the scenario you considered
 and adapts the kernel code to suit that scenario. You deny the correction
 of the infinity limit; I do not see how that can be right.
 
 The problem you described is architectural. By design, Unix resource
 limits cannot be increased after they were decreased, except by root.
 In your scenario, the limits were decreased by the mere fact of running
 a 32-bit process, which has lower 'infinity' limits than 64-bit processes.
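The "cannot be increased except by root" rule can be sketched as a small hypothetical check. In FreeBSD the corresponding test in kern_proc_setrlimit() is, roughly, a priv_check() with PRIV_PROC_SETRLIMIT; this model reduces that to a boolean and is not the kernel code. It shows why a non-root 64-bit child that inherited a 3GB hard limit cannot ask for RLIM_INFINITY, even though 3GB was "infinity" for its 32-bit parent.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef int64_t model_rlim_t;   /* hypothetical stand-in for rlim_t */

struct model_rlimit {
    model_rlim_t rlim_cur;
    model_rlim_t rlim_max;
};

/*
 * Simplified rule checked before a new limit is installed: asking for a
 * soft or hard value above the current hard limit needs privilege;
 * lowering a limit never does.
 */
static int
model_check_setrlimit(const struct model_rlimit *cur,
    const struct model_rlimit *req, bool privileged)
{
    assert(cur != NULL && req != NULL);
    if ((req->rlim_cur > cur->rlim_max || req->rlim_max > cur->rlim_max) &&
        !privileged)
        return (EPERM);
    return (0);
}
```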
 
 That said, I see two possible solutions.
 
 The first is to manually set the compat.ia32.max* sysctls to 0. Then, it
 seems, you get the desired behaviour for 64-bit processes exec'ed from
 32-bit ones. It does not require any code change. Since you are fine with
 denying the fixup for infinity, this setting gives the same effect as the patch.
 
 The second approach (which is essentially a correction of your approach
 in fix.diff) is to track, in a per-struct rlimit flag, the fact that the
 corresponding rlimit is set to 'ABI infinity'. Then get/setrlimit
 should first test the 'ABI infinity' flag and behave as if the rlimit is
 set to infinity for the current bitness, even if the actual value of the
 rlimit is not infinity. The flag is set when the rlimit is set to infinity
 by the current ABI.
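A hypothetical sketch of the 'ABI infinity' flag idea follows. It is not kernel code, and several details are assumptions: the sketch keeps separate flags for the soft and hard values (the mail only says "flag"), abi_max stands for the calling ABI's largest value (e.g. ia32_maxdsiz or maxdsiz), and privilege is a boolean.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

typedef int64_t model_rlim_t;             /* hypothetical rlim_t */
#define MODEL_RLIM_INFINITY INT64_MAX

struct model_rlimit {
    model_rlim_t rlim_cur;
    model_rlim_t rlim_max;
};

/* One resource: the stored number plus hypothetical 'ABI infinity' flags. */
struct model_limit_slot {
    struct model_rlimit value;
    bool cur_is_inf;
    bool max_is_inf;
};

/* getrlimit(): flagged values read back as infinity in every ABI. */
static struct model_rlimit
model_getrlimit(const struct model_limit_slot *s)
{
    struct model_rlimit r = s->value;

    if (s->cur_is_inf)
        r.rlim_cur = MODEL_RLIM_INFINITY;
    if (s->max_is_inf)
        r.rlim_max = MODEL_RLIM_INFINITY;
    return (r);
}

/* setrlimit(): the privilege check uses the flagged (logical) hard limit. */
static int
model_setrlimit(struct model_limit_slot *s, struct model_rlimit req,
    model_rlim_t abi_max, bool privileged)
{
    model_rlim_t hard = model_getrlimit(s).rlim_max;

    assert(abi_max > 0);
    if ((req.rlim_cur > hard || req.rlim_max > hard) && !privileged)
        return (EPERM);
    if (req.rlim_cur > req.rlim_max)
        req.rlim_cur = req.rlim_max;
    /* Record the request before the numeric value is clamped. */
    s->cur_is_inf = (req.rlim_cur == MODEL_RLIM_INFINITY);
    s->max_is_inf = (req.rlim_max == MODEL_RLIM_INFINITY);
    if (req.rlim_cur > abi_max)
        req.rlim_cur = abi_max;
    if (req.rlim_max > abi_max)
        req.rlim_max = abi_max;
    s->value = req;
    return (0);
}
```

In this model a root 32-bit process that sets RLIM_INFINITY stores 3GB but reads back infinity, and a non-root 64-bit descendant can then raise the stored value to 32GB without privilege; setting any finite value clears the flags again.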
 
 The second approach would provide the 'correct' fix, but it is a non-trivial
 amount of work for a very rare situation (exec'ing a 64-bit process from a
 32-bit one), and the current behaviour of inheriting 32-bit limits may be
 argued to be right. If you want, feel free to develop such a patch; I will
 review and commit it, but I do not want to spend effort on developing it
 myself ATM.
 

From: Konstantin Belousov <konstantin.belousov@zoral.com.ua>
To: Ming Qiao <mqiao@juniper.net>
Cc: Erin MacNeil <emacneil@juniper.net>, freebsd-gnats-submit@freebsd.org
Subject: Re: amd64/170351: [patch] amd64: 64-bit process can't always get unlimited rlimit
Date: Thu, 9 Aug 2012 01:06:31 +0300

 Do not strip public lists from the discussion. There is nothing private.
 
 On Tue, Aug 07, 2012 at 05:52:07PM -0400, Ming Qiao wrote:
 > Hi Konstantin,
 >
 > Thanks for your quick response. Actually, I'm not very clear about
 > the second approach you mentioned. Some questions: 1) Could you
 > please elaborate on the idea of "tracking rlimits set to ABI infinity"?
 > If I understand correctly, you are referring to a model where a
 > process can have its rlimit set multiple times by different ABIs? But
 > what does that mean exactly? Could you give a simple example? 2)
 > What do you mean by "per-struct rlimit"? Do you mean each memory
 > segment as a struct, such as datasize, stacksize, etc.?
 
 I mean that in addition to the existing pl_rlimit array in struct
 plimit, you also create a bitmap array of the same size. A set bit
 in this new array indicates that the corresponding limit was set
 (either implicitly, or explicitly by usermode) to infinity. The bit
 has its meaning regardless of the actual numeric value written into
 pl_rlimit, either by the syscall or by sv_fixlimit.
 
 Then, the 64-bit sysent should also grow an sv_fixlimit for resource
 limits, and set the limit accordingly for the host ABI if the array
 indicates that the resource is logically 'infinite'.
 
 For completeness, I should note that the bit is cleared if a syscall
 sets the resource to a non-infinite value. Per-struct rlimit means
 that there is a bit for each resource.
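The bitmap variant described above might look roughly like this. Everything here is hypothetical: model_plimit, pl_inf, the MODEL_RLIMIT_* indices and model_native_fixlimit() are illustrative names, and the host_max[] table stands in for maxdsiz, maxssiz and the other host-ABI maxima.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t model_rlim_t;             /* hypothetical rlim_t */
#define MODEL_RLIM_INFINITY INT64_MAX

/* Hypothetical resource indices; the real RLIMIT_* values differ. */
enum { MODEL_RLIMIT_DATA, MODEL_RLIMIT_STACK, MODEL_RLIMIT_VMEM,
    MODEL_RLIM_NLIMITS };

struct model_rlimit {
    model_rlim_t rlim_cur;
    model_rlim_t rlim_max;
};

/* Sketch of struct plimit with the proposed bitmap next to pl_rlimit. */
struct model_plimit {
    struct model_rlimit pl_rlimit[MODEL_RLIM_NLIMITS];
    uint32_t pl_inf;        /* bit i set: resource i is logically infinite */
};

/* setrlimit() would call this with infinite = (request == RLIM_INFINITY). */
static void
model_mark_inf(struct model_plimit *pl, int which, int infinite)
{
    assert(which >= 0 && which < MODEL_RLIM_NLIMITS);
    if (infinite)
        pl->pl_inf |= UINT32_C(1) << which;
    else
        pl->pl_inf &= ~(UINT32_C(1) << which);
}

/* Hypothetical fixlimit hook for the native 64-bit ABI: restore the host
 * maximum for every resource flagged infinite, leave the others alone. */
static void
model_native_fixlimit(struct model_plimit *pl,
    const model_rlim_t host_max[MODEL_RLIM_NLIMITS])
{
    for (int i = 0; i < MODEL_RLIM_NLIMITS; i++) {
        if ((pl->pl_inf & (UINT32_C(1) << i)) == 0)
            continue;
        pl->pl_rlimit[i].rlim_cur = host_max[i];
        pl->pl_rlimit[i].rlim_max = host_max[i];
    }
}
```

In this sketch an inherited 3GB data limit whose bit is set becomes 32GB when the 64-bit hook runs, while unflagged limits keep their numeric values.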
 
 Is it clear now?
 

From: Ming Qiao <mqiao@juniper.net>
To: Konstantin Belousov <konstantin.belousov@zoral.com.ua>
Cc: Erin MacNeil <emacneil@juniper.net>, "freebsd-gnats-submit@freebsd.org"
	<freebsd-gnats-submit@freebsd.org>
Subject: RE: amd64/170351: [patch] amd64: 64-bit process can't always get
 unlimited rlimit
Date: Thu, 9 Aug 2012 16:16:38 -0400

 Thanks for the explanation. I'll prepare a fix and send it to you for review
 when it's ready.
 
 ...Ming
 
>Unformatted:
