From nobody@FreeBSD.org  Mon Aug 17 22:29:45 2009
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 6B24A1065696
	for <freebsd-gnats-submit@FreeBSD.org>; Mon, 17 Aug 2009 22:29:45 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [IPv6:2001:4f8:fff6::21])
	by mx1.freebsd.org (Postfix) with ESMTP id 5B4FE8FC55
	for <freebsd-gnats-submit@FreeBSD.org>; Mon, 17 Aug 2009 22:29:45 +0000 (UTC)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.14.3/8.14.3) with ESMTP id n7HMTjrZ028833
	for <freebsd-gnats-submit@FreeBSD.org>; Mon, 17 Aug 2009 22:29:45 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.14.3/8.14.3/Submit) id n7HMTiow028832;
	Mon, 17 Aug 2009 22:29:44 GMT
	(envelope-from nobody)
Message-Id: <200908172229.n7HMTiow028832@www.freebsd.org>
Date: Mon, 17 Aug 2009 22:29:44 GMT
From: Bruce Cran <bruce@cran.org.uk>
To: freebsd-gnats-submit@FreeBSD.org
Subject: [libkvm] ps segfaults with -ax when inspecting core files
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         137890
>Category:       kern
>Synopsis:       [libkvm] [patch] ps segfaults with -ax when inspecting core files
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    brucec
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Aug 17 22:30:09 UTC 2009
>Closed-Date:    Sun Feb 28 14:10:46 UTC 2010
>Last-Modified:  Sun Feb 28 14:10:46 UTC 2010
>Originator:     Bruce Cran
>Release:        8.0-BETA2
>Organization:
>Environment:
FreeBSD tau.draftnet 8.0-BETA2 FreeBSD 8.0-BETA2 #0: Sun Aug 16 19:32:23 BST 2009     brucec@tau.draftnet:/usr/obj/usr/src/sys/DELL  amd64
>Description:
When recovering from a crash, crashinfo(8) is run; it executes 'ps -ax -M corefile', which causes ps to segfault and attempt to write a 1 GB core file to /.

The crash can be reproduced after the system has booted by running 'ps -ax -M /var/crash/vmcore.x'.  The faulty code appears to be in lib/libkvm/kvm_proc.c around line 561, though the underlying cause seems to be that the symbol table is unreadable (inferred from the -1 return value of kvm_nlist).

kvm_getprocs() seems to step past the end of the nlist array and then
call vsnprintf() with a bad argument. kvm_nlist() returns -1 to report
that the symbol table couldn't be read, but the code assumes it has
returned a positive number to indicate that there's an unresolved entry,
so it starts searching for the entry whose n_type is 0.
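A minimal sketch of the return-value handling this implies, assuming the kvm_nlist(3) convention that -1 means the symbol table could not be read and a positive value counts the entries that were not found. `struct sym` is a hypothetical, cut-down stand-in for `struct nlist` (only `n_name` and `n_type`), and `nlist_error()` is an illustrative helper, not a libkvm function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for struct nlist: only the fields used here. */
struct sym {
	const char *n_name;	/* NULL marks the end of the list */
	unsigned char n_type;	/* 0 means the symbol was not resolved */
};

/*
 * Turn a kvm_nlist()-style return value into an error message.
 * Returns 0 on success (buf untouched), -1 on any failure.
 */
static int
nlist_error(int ret, const struct sym *nl, char *buf, size_t len)
{
	const struct sym *p;

	assert(nl != NULL && buf != NULL && len > 0);
	if (ret == 0)
		return (0);
	if (ret < 0) {
		/* Symbol table unreadable: nl[] may hold nothing useful. */
		snprintf(buf, len, "cannot read symbol table");
		return (-1);
	}
	/* ret > 0: name the first unresolved entry, stopping at the terminator. */
	for (p = nl; p->n_name != NULL; ++p) {
		if (p->n_type == 0) {
			snprintf(buf, len, "%s: no such symbol", p->n_name);
			return (-1);
		}
	}
	snprintf(buf, len, "%d symbol(s) not found", ret);
	return (-1);
}
```

The -1 branch deliberately never looks at nl[], since nothing guarantees kvm_nlist() filled it in when it could not read the symbol table.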

tau# gdb ps
GNU gdb 6.1.1 [FreeBSD]
[...]

(gdb) run -ax -M /var/crash/vmcore.3
Starting program: /bin/ps -ax -M /var/crash/vmcore.3

Program received signal SIGSEGV, Segmentation fault.
0x000000080096340b in strlen (str=Variable "str" is not available.
) at /usr/src/lib/libc/string/strlen.c:88
88		    if (*p == '\0')
(gdb) bt
#0  0x000000080096340b in strlen (str=Variable "str" is not available.
) at /usr/src/lib/libc/string/strlen.c:88
#1  0x000000080095c082 in __vfprintf (fp=0x7fffffffd9a0,
    fmt0=0x800773915 "%s: no such symbol", ap=0x7fffffffdb10)
    at /usr/src/lib/libc/stdio/vfprintf.c:825
#2  0x00000008008cc696 in vsnprintf (str=Variable "str" is not available.
) at /usr/src/lib/libc/stdio/vsnprintf.c:70
#3  0x0000000800772e89 in _kvm_err (kd=Variable "kd" is not available.
) at /usr/src/lib/libkvm/kvm.c:104
#4  0x0000000800770907 in kvm_getprocs (kd=0x800b02300, op=8, arg=0,
    cnt=0x7fffffffdf1c) at /usr/src/lib/libkvm/kvm_proc.c:561
#5  0x0000000000405322 in main (argc=4, argv=0x7fffffffe9a8)
    at /usr/src/bin/ps/ps.c:511
(gdb) frame 4
#4  0x0000000800770907 in kvm_getprocs (kd=0x800b02300, op=8, arg=0,
    cnt=0x7fffffffdf1c) at /usr/src/lib/libkvm/kvm_proc.c:561
561				_kvm_err(kd, kd->program,
(gdb) list
556			nl[5].n_name = 0;
557	
558			if (kvm_nlist(kd, nl) != 0) {
559				for (p = nl; p->n_type != 0; ++p)
560					;
561				_kvm_err(kd, kd->program,
562					 "%s: no such symbol", p->n_name);
563				return (0);
564			}
565			if (KREAD(kd, nl[0].n_value, &nprocs)) {
(gdb) print nl
$1 = {{n_name = 0x8007738ef "_nprocs", n_type = 240 '\360', n_other = -1 '\377', n_desc = -1, n_value = 34365215744},
  {n_name = 0x8007738f7 "_allproc", n_type = 160 '\240', n_other = -100 '\234', n_desc = 80, n_value = 0},
  {n_name = 0x800773900 "_zombproc", n_type = 57 '9', n_other = 2 '\002', n_desc = 81, n_value = 34367538496},
  {n_name = 0x80077390a "_ticks", n_type = 74 'J', n_other = 0 '\0', n_desc = 0, n_value = 34365215744},
  {n_name = 0x800773911 "_hz", n_type = 168 '\250', n_other = -23 '\351', n_desc = -1, n_value = 140737488349576},
  {n_name = 0x0, n_type = 1 '\001', n_other = 0 '\0', n_desc = 0, n_value = 34365024109}}
>How-To-Repeat:
Run 'ps -ax -M /var/crash/vmcore.x'
>Fix:


>Release-Note:
>Audit-Trail:
State-Changed-From-To: open->feedback 
State-Changed-By: gavin 
State-Changed-When: Tue Aug 18 13:14:10 UTC 2009 
State-Changed-Why:  
Can you try http://people.freebsd.org/~gavin/PRs/137890.diff ? 

The failing part is the check that all symbols were found.
According to the kvm_nlist(3) manpage, the nlist array handed to kvm_nlist
is terminated by an entry with "p->n_name == NULL", but this wasn't being
checked, so we were wandering off the end of the list.
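The difference between the original loop and one bounded by the terminator can be sketched using the n_type values from the gdb dump in the description (240, 160, 57, 74, 168, and 1 in the NULL-named terminator). This is an illustrative model, not libkvm code: `struct sym` is a simplified stand-in for `struct nlist`, and two extra entries are appended to stand in for whatever memory follows nl[] on the stack, so the overrun can be shown without invoking undefined behaviour. The `cap` parameter exists only to keep the demo safe.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct nlist: only the fields used here. */
struct sym {
	const char *n_name;	/* NULL marks the end of the list */
	unsigned char n_type;	/* 0 means the symbol was not resolved */
};

/*
 * The original condition: advance until n_type == 0, never looking at
 * n_name. cap exists only so this demo cannot leave the array; the real
 * loop had no such bound. Returns the stopping index, or -1 if none.
 */
static ptrdiff_t
old_scan(const struct sym *nl, size_t cap)
{
	size_t i;

	for (i = 0; i < cap && nl[i].n_type != 0; i++)
		;
	return (i < cap ? (ptrdiff_t)i : -1);
}

/*
 * The fixed condition: stop at the NULL n_name terminator and report
 * the first unresolved entry before it, or -1 if there is none.
 */
static ptrdiff_t
bounded_scan(const struct sym *nl)
{
	size_t i;

	for (i = 0; nl[i].n_name != NULL; i++)
		if (nl[i].n_type == 0)
			return ((ptrdiff_t)i);
	return (-1);
}
```

With those values the original condition stops at index 7, two entries past the terminator at index 5; in ps the n_name found there was an arbitrary pointer, which vfprintf() then handed to strlen(). The bounded scan returns -1 instead, meaning no named entry was unresolved and the caller should report the kvm_nlist() failure itself.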


Responsible-Changed-From-To: freebsd-bugs->gavin 
Responsible-Changed-By: gavin 
Responsible-Changed-When: Tue Aug 18 13:14:10 UTC 2009 
Responsible-Changed-Why:  
Track 

http://www.freebsd.org/cgi/query-pr.cgi?pr=137890 

From: Gavin Atkinson <gavin@FreeBSD.org>
To: bug-followup@FreeBSD.org, bruce@cran.org.uk
Cc:  
Subject: Re: kern/137890: [libkvm] [patch] ps segfaults with -ax when
	inspecting core files
Date: Tue, 18 Aug 2009 14:46:46 +0100

 Hmm, there may be more to this.  I'm pretty sure that patch is correct
 regardless; however, it does appear that kvm_nlist() is returning a
 nonzero value even though the structure returned seems to have been
 fully filled in.
 
 Can you add a printf to the code to determine what kvm_nlist() is
 returning?  It will be interesting to see if it is -1, or a positive
 integer.
 
 The patch at least fixes one bug and should prevent the core dump you
 are seeing.
 
 Gavin

From: Bruce Cran <bruce@cran.org.uk>
To: Gavin Atkinson <gavin@FreeBSD.org>
Cc: bug-followup@FreeBSD.org
Subject: Re: kern/137890: [libkvm] [patch] ps segfaults with -ax when
 inspecting core files
Date: Tue, 18 Aug 2009 15:22:17 +0100

 kvm_nlist is returning -1, which according to the manpage indicates that
 it couldn't read the symbol table. However, the structure does seem to
 have been filled in.  I'll debug kvm_nlist itself to see why it's
 filling it in but not returning 0.
 
 -- 
 Bruce

From: Gavin Atkinson <gavin@FreeBSD.org>
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/137890: [libkvm] [patch] ps segfaults with -ax when
	inspecting core files
Date: Tue, 18 Aug 2009 15:37:15 +0100

 On Tue, 2009-08-18 at 13:50 +0000, Gavin Atkinson wrote:
 >  Hmm, there may be more to this.  I'm pretty sure that patch is correct
 >  regardless, however it does appear that kvm_nlist() is returning !=0
 >  even though the structure returned seems to have been fully filled in.
 
 Ignore this: I don't think the structure has been filled in at all; it
 just holds whatever happened to be in memory.
 
 I've created a new patch with slightly better error handling at
 http://people.freebsd.org/~gavin/PRs/137890.2.diff - please give that a
 go and see if it solves the coredump for you and properly fails with an
 error message.  FWIW, it looks like several other uses of kvm_nlist() in
 libkvm suffer the same bug with how they check the validity of the
 returned data.
 
 The root cause of libkvm failing on your coredump is still unknown.
 
 Gavin

From: Bruce Cran <bruce@cran.org.uk>
To: bug-followup@FreeBSD.org, bruce@cran.org.uk
Cc:  
Subject: Re: kern/137890: [libkvm] [patch] ps segfaults with -ax when
 inspecting core files
Date: Sun, 23 Aug 2009 21:48:53 +0100

 The attached patches fix the crash.
 The first bug is that ps(1) passes "/dev/null" to kvm_open(3) as the
 kernel symbol file instead of NULL.  The second problem is that the
 bcopy() call in kvm_proc.c crashes; it looks like ucred.cr_groups is a
 kernel address, but without knowing the details of the code I can't be
 sure.  Translating the address with KREAD stops the crash, but may not
 be the correct solution.
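Bruce's suspicion can be illustrated with a small simulation. Everything in it is hypothetical: `kimage` plays the part of the crash dump, `KBASE` is an invented kernel address, `sim_kvm_read()` only mimics the shape of kvm_read(3), and the sketch assumes, as Bruce suspects, that `ucred.cr_groups` holds a kernel address rather than an inline array. Under that assumption the groups must be fetched with one read of `ngroups * sizeof(gid_t)` bytes from that address. Note also that if KREAD() reads `sizeof(*obj)` bytes, as its other uses in kvm_proc.c suggest, the KREAD() in the attached patch copies only one pointer's worth (two 32-bit gid_t values on amd64) into `crg`, and the following bcopy() then reads past the end of `crg` whenever there are more than two groups.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical kernel virtual address where the simulated image begins. */
#define KBASE 0xc0100000UL

/* Hypothetical stand-in for the crash dump's memory. */
static unsigned char kimage[64];

/*
 * Shaped like kvm_read(3): copy len bytes from kernel address kaddr into
 * buf and return the number of bytes copied, or -1 if the range is not
 * inside the image.
 */
static ssize_t
sim_kvm_read(unsigned long kaddr, void *buf, size_t len)
{
	size_t off;

	if (kaddr < KBASE)
		return (-1);
	off = (size_t)(kaddr - KBASE);
	if (off > sizeof(kimage) || len > sizeof(kimage) - off)
		return (-1);
	memcpy(buf, kimage + off, len);
	return ((ssize_t)len);
}

/*
 * Copy ngroups gids from the kernel address held in cr_groups. The whole
 * array is read in one call, and the address is never dereferenced
 * directly in user space (which, if Bruce is right, is what the original
 * bcopy() does).
 */
static int
fetch_groups(unsigned long cr_groups, int ngroups, gid_t *out)
{
	size_t len;

	assert(ngroups >= 0 && out != NULL);
	len = (size_t)ngroups * sizeof(gid_t);
	return (sim_kvm_read(cr_groups, out, len) == (ssize_t)len ? 0 : -1);
}
```

In libkvm terms this would correspond to something like kvm_read(kd, (u_long)ucred.cr_groups, kp->ki_groups, kp->ki_ngroups * sizeof(gid_t)) with the result checked against the requested length; this is only a suggestion, and the PR does not record what form the committed fix took.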
 
 -- 
 Bruce
 Attachment: kvm_proc.c.diff.txt
 
 --- kvm_proc.c.orig	2009-08-03 09:13:06.000000000 +0100
 +++ kvm_proc.c	2009-08-23 20:37:26.000000000 +0100
 @@ -118,6 +118,7 @@
  	struct timeval tv;
  	struct sysentvec sysent;
  	char svname[KI_EMULNAMELEN];
 +	void *crg;
  
  	kp = &kinfo_proc;
  	kp->ki_structsize = sizeof(kinfo_proc);
 @@ -150,8 +151,14 @@
  				kp->ki_ngroups = KI_NGROUPS;
  				kp->ki_cr_flags |= KI_CRF_GRP_OVERFLOW;
  			}
 -				kp->ki_ngroups = ucred.cr_ngroups;
 -			bcopy(ucred.cr_groups, kp->ki_groups,
 +			kp->ki_ngroups = ucred.cr_ngroups;
 +			if (KREAD(kd, (u_long)ucred.cr_groups, &crg)) {
 +				_kvm_err(kd, kd->program, 
 +				    "can't read cr_groups at %p",
 +				    ucred.cr_groups);
 +				return (-1);
 +			}
 +			bcopy(&crg, kp->ki_groups,
  			    kp->ki_ngroups * sizeof(gid_t));
  			kp->ki_uid = ucred.cr_uid;
  			if (ucred.cr_prison != NULL) {
 @@ -472,7 +479,7 @@
  {
  	int mib[4], st, nprocs;
  	size_t size;
 -	int temp_op;
 +	int err, temp_op;
  
  	if (kd->procbase != 0) {
  		free((void *)kd->procbase);
 @@ -555,11 +562,16 @@
  		nl[4].n_name = "_hz";
  		nl[5].n_name = 0;
  
 -		if (kvm_nlist(kd, nl) != 0) {
 -			for (p = nl; p->n_type != 0; ++p)
 -				;
 +		err = kvm_nlist(kd, nl);
 +		if (err == -1) {
  			_kvm_err(kd, kd->program,
 -				 "%s: no such symbol", p->n_name);
 +			    "cannot read symbol table");
 +			return (0);
 +		} else if (err > 0) {
 +			for (p = nl; p->n_name != NULL; ++p)
 +				if (p->n_type == 0)
 +					_kvm_err(kd, kd->program,
 +					    "%s: no such symbol", p->n_name);
  			return (0);
  		}
  		if (KREAD(kd, nl[0].n_value, &nprocs)) {
 
 Attachment: ps.c.diff.txt
 
 --- /usr/src/bin/ps/ps.c	2009-08-03 09:13:06.000000000 +0100
 +++ ps.c	2009-08-22 21:03:56.000000000 +0100
 @@ -212,7 +212,8 @@
  	init_list(&sesslist, addelem_pid, sizeof(pid_t), "session id");
  	init_list(&ttylist, addelem_tty, sizeof(dev_t), "tty");
  	init_list(&uidlist, addelem_uid, sizeof(uid_t), "user");
 -	memf = nlistf = _PATH_DEVNULL;
 +	memf = _PATH_DEVNULL;
 +	nlistf = NULL;
  	while ((ch = getopt(argc, argv, PS_ARGS)) != -1)
  		switch (ch) {
  		case 'A':
 

From: Bruce Cran <bruce@cran.org.uk>
To: bug-followup@FreeBSD.org, bruce@cran.org.uk
Cc:  
Subject: Re: kern/137890: [libkvm] [patch] ps segfaults with -ax when
 inspecting core files
Date: Mon, 24 Aug 2009 19:54:58 +0100

 Since gnats mangled the patches, I've uploaded copies to
 http://www.cran.org.uk/~brucec/freebsd/pr137890.kvm_proc.c.diff and
 http://www.cran.org.uk/~brucec/freebsd/pr137890.ps.c.diff
 
 -- 
 Bruce
State-Changed-From-To: feedback->analyzed 
State-Changed-By: gavin 
State-Changed-When: Tue Aug 25 09:40:25 UTC 2009 
State-Changed-Why:  
Mark as analysed: the problem seems to be well understood, and the
patches in the PR fix the issue.


Responsible-Changed-From-To: gavin->freebsd-bugs 
Responsible-Changed-By: gavin 
Responsible-Changed-When: Tue Aug 25 09:40:25 UTC 2009 
Responsible-Changed-Why:  
Back into the pool, in the hope it'll be picked up and committed before 8.0 

From: Bruce Cran <bruce@cran.org.uk>
To: bug-followup@FreeBSD.org, bruce@cran.org.uk
Cc:  
Subject: Re: kern/137890: [libkvm] [patch] ps segfaults with -ax when
 inspecting core files
Date: Mon, 18 Jan 2010 12:45:16 +0000

 The libkvm bug has now been fixed, but the patch for bin/ps hasn't been
 committed yet.
 
 -- 
 Bruce
State-Changed-From-To: analyzed->patched  
State-Changed-By: brucec 
State-Changed-When: Mon Feb 8 21:44:36 UTC 2010 
State-Changed-Why:  
Fix has been checked in to -CURRENT. 


Responsible-Changed-From-To: freebsd-bugs->brucec  
Responsible-Changed-By: brucec 
Responsible-Changed-When: Mon Feb 8 21:44:36 UTC 2010 
Responsible-Changed-Why:  
Take 

State-Changed-From-To: patched->closed  
State-Changed-By: brucec 
State-Changed-When: Sun Feb 28 14:10:19 UTC 2010 
State-Changed-Why:  
Fix has been merged to stable/7 and stable/8. 

>Unformatted:
