From nobody@FreeBSD.org  Tue Mar 25 17:17:42 2008
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 5283C106566B
	for <freebsd-gnats-submit@FreeBSD.org>; Tue, 25 Mar 2008 17:17:42 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [IPv6:2001:4f8:fff6::21])
	by mx1.freebsd.org (Postfix) with ESMTP id 4F7088FC20
	for <freebsd-gnats-submit@FreeBSD.org>; Tue, 25 Mar 2008 17:17:42 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.14.2/8.14.2) with ESMTP id m2PHHTjP033226
	for <freebsd-gnats-submit@FreeBSD.org>; Tue, 25 Mar 2008 17:17:29 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.14.2/8.14.1/Submit) id m2PHHT3E033225;
	Tue, 25 Mar 2008 17:17:29 GMT
	(envelope-from nobody)
Message-Id: <200803251717.m2PHHT3E033225@www.freebsd.org>
Date: Tue, 25 Mar 2008 17:17:29 GMT
From: Jari Kirma <kirma@cs.hut.fi>
To: freebsd-gnats-submit@FreeBSD.org
Subject: NULL pointer dereference in in_pcbdrop
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         122082
>Category:       kern
>Synopsis:       [tcp] NULL pointer dereference in in_pcbdrop
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    rwatson
>State:          feedback
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue Mar 25 17:20:03 UTC 2008
>Closed-Date:    
>Last-Modified:  Sun Feb  1 18:50:03 UTC 2009
>Originator:     Jari Kirma
>Release:        7.0-STABLE
>Organization:
Helsinki University of Technology
>Environment:
FreeBSD kirma.fi 7.0-STABLE FreeBSD 7.0-STABLE #3: Fri Mar 14 12:01:00 EET 2008     root@example.com:/usr/obj/usr/src/sys/VULCAN  i386

>Description:
Two kernel crash dumps show the kernel dying on a NULL pointer dereference
in in_pcbdrop. Specifically, the fault occurs at the LIST_REMOVE below:

		if (LIST_FIRST(&phd->phd_pcblist) == NULL) {
			LIST_REMOVE(phd, phd_hash);
			free(phd, M_PCB);
		}

Even more specifically, it occurs at "*(elm)->field.le_prev =
LIST_NEXT((elm), field);" below:

#define LIST_REMOVE(elm, field) do {                                    \
        QMD_LIST_CHECK_NEXT(elm, field);                                \
        QMD_LIST_CHECK_PREV(elm, field);                                \
        if (LIST_NEXT((elm), field) != NULL)                            \
                LIST_NEXT((elm), field)->field.le_prev =                \
                    (elm)->field.le_prev;                               \
        *(elm)->field.le_prev = LIST_NEXT((elm), field);                \
        TRASHIT((elm)->field.le_next);                                  \
        TRASHIT((elm)->field.le_prev);                                  \
} while (0)

This happens because phd->phd_hash.le_prev is NULL, so the store through
it writes to address 0 (matching eva=0 in the trap frames of the backtrace):

(kgdb) p *phd
$8 = {phd_hash = {le_next = 0x0, le_prev = 0x0}, phd_pcblist = {
    lh_first = 0x0}, phd_port = 0}
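
To illustrate why an all-zero entry faults at that line, here is a minimal
userland sketch of the same doubly-linked layout. It is not the kernel code:
the names port_entry, port_head, port_insert_head and port_remove_checked
are made up for this example, and the NULL check is only there to show where
an unchecked LIST_REMOVE would store through a NULL le_prev. It says nothing
about how the entry ended up zeroed in the first place.

```c
#include <stddef.h>

/* Hypothetical stand-in for the hash linkage of struct inpcbport. */
struct port_entry {
	struct port_entry *le_next;	/* next element, or NULL */
	struct port_entry **le_prev;	/* address of the pointer that points at us */
	unsigned short port;
};

struct port_head {
	struct port_entry *lh_first;
};

/* Same linkage that LIST_INSERT_HEAD sets up. */
static void
port_insert_head(struct port_head *head, struct port_entry *elm)
{
	elm->le_next = head->lh_first;
	if (elm->le_next != NULL)
		elm->le_next->le_prev = &elm->le_next;
	head->lh_first = elm;
	elm->le_prev = &head->lh_first;
}

/*
 * Unlink elm. Returns 1 on success, or 0 if elm is not linked
 * (le_prev == NULL), which is the state seen in the crash dump.
 */
static int
port_remove_checked(struct port_entry *elm)
{
	if (elm->le_prev == NULL)
		return (0);	/* unchecked code would write to address 0 here */
	if (elm->le_next != NULL)
		elm->le_next->le_prev = elm->le_prev;
	*elm->le_prev = elm->le_next;
	elm->le_next = NULL;
	elm->le_prev = NULL;
	return (1);
}
```

Calling port_remove_checked() on a zero-initialised entry returns 0 instead
of faulting, while an entry that has been inserted is unlinked normally.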

The backtrace is essentially the same for both "clean" crashes I've seen:

(kgdb) bt
#0  doadump () at pcpu.h:195
#1  0xc05536e1 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
#2  0xc05539b4 in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:563
#3  0xc0767cf4 in trap_fatal (frame=0xe7135ba4, eva=0)
    at /usr/src/sys/i386/i386/trap.c:899
#4  0xc0767f44 in trap_pfault (frame=0xe7135ba4, usermode=0, eva=0)
    at /usr/src/sys/i386/i386/trap.c:812
#5  0xc07688aa in trap (frame=0xe7135ba4) at /usr/src/sys/i386/i386/trap.c:490
#6  0xc075025b in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#7  0xc0600027 in in_pcbdrop (inp=0xcd4db0b4)
    at /usr/src/sys/netinet/in_pcb.c:758
#8  0xc066b53e in tcp_twclose (tw=0xcd1c8514, reuse=0)
    at /usr/src/sys/netinet/tcp_timewait.c:477
#9  0xc066b74e in tcp_tw_2msl_scan (reuse=0)
    at /usr/src/sys/netinet/tcp_timewait.c:644
#10 0xc066a31c in tcp_slowtimo () at /usr/src/sys/netinet/tcp_timer.c:129
#11 0xc059c19b in pfslowtimo (arg=0x0) at /usr/src/sys/kern/uipc_domain.c:459
#12 0xc05650b1 in softclock (dummy=0x0) at /usr/src/sys/kern/kern_timeout.c:274
#13 0xc0537b22 in ithread_loop (arg=0xc6bae6a0)
    at /usr/src/sys/kern/kern_intr.c:1036
#14 0xc05349e1 in fork_exit (callout=0xc0537981 <ithread_loop>, 
    arg=0xc6bae6a0, frame=0xe7135d38) at /usr/src/sys/kern/kern_fork.c:781
#15 0xc07502d0 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:205

The system experiencing this problem is a quad-core Intel system with an
NVIDIA display adapter (and the associated drivers).
>How-To-Repeat:
Seems to occur from time to time when using a web browser, probably when
the HTTP server closes a socket from the other end, since the crash seems
to occur a moment after a web page has loaded successfully.
>Fix:


>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->freebsd-net 
Responsible-Changed-By: remko 
Responsible-Changed-When: Tue Mar 25 19:41:45 UTC 2008 
Responsible-Changed-Why:  
Might be networking related. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=122082 
State-Changed-From-To: open->feedback 
State-Changed-By: rwatson 
State-Changed-When: Fri Oct 31 12:52:03 UTC 2008 
State-Changed-Why:  
Could I ask you to confirm whether this still occurs with the most recent 
7-STABLE?  If so, could I then ask you to print *tw and *inp in the 
tcp_twclose frame, which might shed a bit more light on things. 

Thanks! 



Responsible-Changed-From-To: freebsd-net->rwatson 
Responsible-Changed-By: rwatson 
Responsible-Changed-When: Fri Oct 31 12:52:03 UTC 2008 
Responsible-Changed-Why:  
Could I ask you to confirm whether this still occurs with the most recent 
7-STABLE?  If so, could I then ask you to print *tw and *inp in the 
tcp_twclose frame, which might shed a bit more light on things. 

Thanks! 


http://www.freebsd.org/cgi/query-pr.cgi?pr=122082 

From: Jari Kirma <kirma@cs.hut.fi>
To: rwatson@FreeBSD.org
Cc: freebsd-net@FreeBSD.org
Subject: Re: kern/122082: [in_pcb] NULL pointer dereference in in_pcbdrop
Date: Fri, 31 Oct 2008 15:17:59 +0200 (EET)

 On Fri, 31 Oct 2008, rwatson@FreeBSD.org wrote:
 
 > Synopsis: [in_pcb] NULL pointer dereference in in_pcbdrop
 >
 > State-Changed-From-To: open->feedback
 > State-Changed-By: rwatson
 > State-Changed-When: Fri Oct 31 12:52:03 UTC 2008
 > State-Changed-Why:
 > Could I ask you to confirm whether this still occurs with the most recent
 > 7-STABLE?  If so, could I then ask you to print *tw and *inp in the
 > tcp_twclose frame, which might shed a bit more light on things.
 >
 > Thanks!
 
 Unfortunately I don't have a spare machine of this kind that I can risk
 crashing, although I will try to look at it again. Still, if I remember
 correctly, the system was prone to crashing very often (in more than 25%
 of boots, within roughly the first 15 minutes) when linux-opera (or even
 linux-firefox) was being used actively on a quad-core machine running a
 fairly ordinary xfce desktop. Once the system had survived the first
 fifteen minutes or so, a similar crash seemed to become less and less
 likely, although it could still happen weeks later with very similar
 signs: crashing a moment after browsing around, as if teardown of a
 persistent HTTP TCP connection by the remote end triggered it.
 
 I think the Linux emulation part is somehow significant in this context.
 When I moved to native Firefox instead, at least crashes of this kind
 ceased to occur. Might some locks not be held on the Linux call path that
 are held on the native one?
 
 > Responsible-Changed-From-To: freebsd-net->rwatson
 > Responsible-Changed-By: rwatson
 > Responsible-Changed-When: Fri Oct 31 12:52:03 UTC 2008
 > Responsible-Changed-Why:
 > Could I ask you to confirm whether this still occurs with the most recent
 > 7-STABLE?  If so, could I then ask you to print *tw and *inp in the
 > tcp_twclose frame, which might shed a bit more light on things.
 >
 > Thanks!
 >
 >
 > http://www.freebsd.org/cgi/query-pr.cgi?pr=122082
 >
 
>Unformatted:
