From nobody@FreeBSD.org  Thu Jan 13 16:11:25 2005
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 6770B16A4CF
	for <freebsd-gnats-submit@FreeBSD.org>; Thu, 13 Jan 2005 16:11:25 +0000 (GMT)
Received: from www.freebsd.org (www.freebsd.org [216.136.204.117])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 28DAA43D5D
	for <freebsd-gnats-submit@FreeBSD.org>; Thu, 13 Jan 2005 16:11:25 +0000 (GMT)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.13.1/8.13.1) with ESMTP id j0DGBOMW012074
	for <freebsd-gnats-submit@FreeBSD.org>; Thu, 13 Jan 2005 16:11:24 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.13.1/8.13.1/Submit) id j0DGBOlv012072;
	Thu, 13 Jan 2005 16:11:24 GMT
	(envelope-from nobody)
Message-Id: <200501131611.j0DGBOlv012072@www.freebsd.org>
Date: Thu, 13 Jan 2005 16:11:24 GMT
From: Eugene Stark <stark@cs.sunysb.edu>
To: freebsd-gnats-submit@FreeBSD.org
Subject: Null pointer dereference in xl_detach()
X-Send-Pr-Version: www-2.3

>Number:         76207
>Category:       kern
>Synopsis:       [xl] [patch] Null pointer dereference in xl_detach()
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Jan 13 16:20:07 GMT 2005
>Closed-Date:    Tue Jun 12 05:03:49 GMT 2007
>Last-Modified:  Tue Jun 12 05:03:49 GMT 2007
>Originator:     Eugene Stark
>Release:        4.10-RELEASE
>Organization:
SUNY at Stony Brook
>Environment:
FreeBSD laptop.starkhome.cs.sunysb.edu 4.10-RELEASE-p4 FreeBSD 4.10-RELEASE-p4 #0: Thu Jan 13 10:47:59 EST 2005     gene@laptop.starkhome.cs.sunysb.edu:/A/src/sys/compile/LAPTOP  i386

>Description:
      My HP Omnibook 6000 uses the if_xl driver for the integrated
ethernet port.  Upon resuming from suspend-to-disk, sometimes the xl
driver reports "No memory for list buffers" and then crashes with a
null pointer dereference in generic_bzero() called from xl_detach+0x56.

      Tracing the problem shows that xl_attach() is called, but for
some reason allocation of the tx list buffers fails (this is the real
underlying problem, but I don't know why it occurs) and then xl_detach()
is called from the "fail" label in xl_attach().  The xl_detach()
function then attempts to bzero rx and tx list buffers without
checking whether they are NULL, and this causes the crash.

      At the very least, xl_detach() should check whether there are
any buffers before trying to bzero() them.  I have applied a patch that
does this as a workaround to keep the system from panicking when this
occurs.  As long as the system doesn't panic, one can do a
suspend-to-memory and then unsuspend to recover from the problem and
get the xl driver working again.
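
      As a rough user-space illustration of the check described above
(not the driver code itself), the sketch below clears a list only when
it was actually allocated.  The names list_data, clear_list and
stop_sketch and the 64-byte sizes are hypothetical stand-ins, and
memset() is used in place of the kernel's bzero().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the driver's list sizes. */
#define RX_LIST_SZ 64
#define TX_LIST_SZ 64

/* Hypothetical stand-in for the list pointers kept in the softc. */
struct list_data {
	unsigned char *rx_list;	/* NULL if allocation failed in attach */
	unsigned char *tx_list;	/* NULL if allocation failed in attach */
};

/*
 * Zero one list if it was allocated.
 * Returns 1 if the list was cleared, 0 if it was skipped.
 */
static int
clear_list(unsigned char *list, size_t size, const char *name)
{
	if (list == NULL) {
		fprintf(stderr, "%s == NULL, skipping clear\n", name);
		return (0);
	}
	memset(list, 0, size);
	return (1);
}

/* Returns the number of lists actually cleared (0 to 2). */
static int
stop_sketch(struct list_data *ld)
{
	int cleared = 0;

	assert(ld != NULL);
	cleared += clear_list(ld->rx_list, RX_LIST_SZ, "rx_list");
	cleared += clear_list(ld->tx_list, TX_LIST_SZ, "tx_list");
	return (cleared);
}
```

      Calling stop_sketch() on a structure whose pointers are both NULL
returns 0 instead of faulting, which is the behavior the workaround is
after.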

       Various versions of this problem have been present for a number
of FreeBSD releases.  Between 4.8 and 4.10 there were some changes made
to the xl driver that seemed to address part of this, but they didn't
go far enough to avoid the null pointer dereferences and the problem
still occurs.  I'd appreciate it if the workaround I mention below could
get in the tree so I don't have to keep applying this patch.



>How-To-Repeat:
      Get an HP Omnibook 6000, load the if_xl driver, suspend-to-disk,
then unsuspend and sometimes this will occur.

>Fix:
      I applied the following patch (diffs against 4.10-RELEASE) as a
workaround, pending identification of the real underlying problem.
It is probably not a bad idea to leave this patch in anyway as good
defensive programming.

      Well, OK, it says I'm not supposed to submit code here,
but whaddayawant?  It would be nice to provide an upload link so that
if I have a patch I can supply it.

Index: if_xl.c
===================================================================
RCS file: /A/cvs/src/sys/pci/if_xl.c,v
retrieving revision 1.72.2.29
diff -c -r1.72.2.29 if_xl.c
*** if_xl.c     19 Mar 2004 23:21:05 -0000      1.72.2.29
--- if_xl.c     13 Jan 2005 15:40:33 -0000
***************
*** 3274,3280 ****
                        sc->xl_cdata.xl_rx_chain[i].xl_mbuf = NULL;
                }
        }
!       bzero(sc->xl_ldata.xl_rx_list, XL_RX_LIST_SZ);
        /*
         * Free the TX list buffers.
         */
--- 3274,3285 ----
                        sc->xl_cdata.xl_rx_chain[i].xl_mbuf = NULL;
                }
        }
!       if(sc->xl_ldata.xl_rx_list != NULL)
!         bzero(sc->xl_ldata.xl_rx_list, XL_RX_LIST_SZ);
!       else
!         printf("xl%d: xl_ldata.xl_rx_list == NULL in xl_stop!\n",
!                sc->xl_unit);
! 
        /*
         * Free the TX list buffers.
         */
***************
*** 3288,3294 ****
                        sc->xl_cdata.xl_tx_chain[i].xl_mbuf = NULL;
                }
        }
!       bzero(sc->xl_ldata.xl_tx_list, XL_TX_LIST_SZ);
  
        ifp->if_flags &= ~(IFF_RUNNING | IFF_OACTIVE);
  
--- 3293,3303 ----
                        sc->xl_cdata.xl_tx_chain[i].xl_mbuf = NULL;
                }
        }
!       if(sc->xl_ldata.xl_tx_list != NULL)
!         bzero(sc->xl_ldata.xl_tx_list, XL_TX_LIST_SZ);
!       else
!         printf("xl%d: xl_ldata.xl_tx_list == NULL in xl_stop!\n",
!                sc->xl_unit);
  
        ifp->if_flags &= ~(IFF_RUNNING | IFF_OACTIVE);
  

>Release-Note:
>Audit-Trail:
Adding to audit trail from misfiled PR 76723:

 The patch I proposed in my original report exposed another bug
 in net/bpf.c, which I reported in kern/76410.  However, after making
 that additional patch, I found that there were still issues with if_xl.c,
 in that under the circumstances I described it would incorrectly call
 ether_ifdetach() without previously having called ether_ifattach().
 Examination of the code for ether_ifdetach() shows that it is not in
 any way, shape, or form intended to be called if ether_ifattach() had
 not previously been completed successfully.
 
 So, I added an additional flag to the if_xl driver to record whether
 ether_ifattach() was performed during initialization, so that calling
 ether_ifdetach() could be avoided during the failure unwind.
 This patch now successfully avoids panicking my laptop when the
 "cannot allocate memory for list buffers" condition occurs.
 Here are diffs for the full patch:
 
 
 Index: if_xl.c
 ===================================================================
 RCS file: /A/cvs/src/sys/pci/if_xl.c,v
 retrieving revision 1.72.2.29
 diff -c -r1.72.2.29 if_xl.c
 *** if_xl.c	19 Mar 2004 23:21:05 -0000	1.72.2.29
 --- if_xl.c	19 Jan 2005 14:00:58 -0000
 ***************
 *** 1770,1775 ****
 --- 1770,1776 ----
   	 * Call MI attach routine.
   	 */
   	ether_ifattach(ifp, ETHER_BPF_SUPPORTED);
 + 	sc->xl_ether_ifattached = 1;
   
           /*
            * Tell the upper layer(s) we support long frames.
 ***************
 *** 1825,1831 ****
   
   	xl_reset(sc);
   	xl_stop(sc);
 ! 	ether_ifdetach(ifp, ETHER_BPF_SUPPORTED);
   	
   	if (sc->xl_miibus)
   		device_delete_child(dev, sc->xl_miibus);
 --- 1826,1833 ----
   
   	xl_reset(sc);
   	xl_stop(sc);
 ! 	if (sc->xl_ether_ifattached)
 ! 		ether_ifdetach(ifp, ETHER_BPF_SUPPORTED);
   	
   	if (sc->xl_miibus)
   		device_delete_child(dev, sc->xl_miibus);
 ***************
 *** 3274,3280 ****
   			sc->xl_cdata.xl_rx_chain[i].xl_mbuf = NULL;
   		}
   	}
 ! 	bzero(sc->xl_ldata.xl_rx_list, XL_RX_LIST_SZ);
   	/*
   	 * Free the TX list buffers.
   	 */
 --- 3276,3287 ----
   			sc->xl_cdata.xl_rx_chain[i].xl_mbuf = NULL;
   		}
   	}
 ! 	if(sc->xl_ldata.xl_rx_list != NULL)
 ! 	  bzero(sc->xl_ldata.xl_rx_list, XL_RX_LIST_SZ);
 ! 	else
 ! 	  printf("xl%d: xl_ldata.xl_rx_list == NULL in xl_stop!\n",
 ! 		 sc->xl_unit);
 ! 
   	/*
   	 * Free the TX list buffers.
   	 */
 ***************
 *** 3288,3294 ****
   			sc->xl_cdata.xl_tx_chain[i].xl_mbuf = NULL;
   		}
   	}
 ! 	bzero(sc->xl_ldata.xl_tx_list, XL_TX_LIST_SZ);
   
   	ifp->if_flags &= ~(IFF_RUNNING | IFF_OACTIVE);
   
 --- 3295,3305 ----
   			sc->xl_cdata.xl_tx_chain[i].xl_mbuf = NULL;
   		}
   	}
 ! 	if(sc->xl_ldata.xl_tx_list != NULL)
 ! 	  bzero(sc->xl_ldata.xl_tx_list, XL_TX_LIST_SZ);
 ! 	else
 ! 	  printf("xl%d: xl_ldata.xl_tx_list == NULL in xl_stop!\n",
 ! 		 sc->xl_unit);
   
   	ifp->if_flags &= ~(IFF_RUNNING | IFF_OACTIVE);
   
 
 
 Index: if_xlreg.h
 ===================================================================
 RCS file: /A/cvs/src/sys/pci/if_xlreg.h,v
 retrieving revision 1.25.2.8
 diff -r1.25.2.8 if_xlreg.h
 607a608
 > 	int			xl_ether_ifattached;
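The flag technique in the patch above can be sketched in user space
roughly as follows.  This is only an illustration under assumptions:
softc_sketch, upper_attach, upper_detach, attach_sketch, detach_sketch
and the alloc_ok parameter are hypothetical names, the two upper_*
functions merely stand in for ether_ifattach() and ether_ifdetach(),
and clearing the flag on detach is an addition of this sketch that the
patch itself does not make.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver softc. */
struct softc_sketch {
	int ether_ifattached;	/* set only after the attach step succeeded */
	int detach_calls;	/* counts calls to the upper-layer detach stand-in */
};

/* Stand-in for ether_ifattach(): records that the upper layer is attached. */
static void
upper_attach(struct softc_sketch *sc)
{
	sc->ether_ifattached = 1;
}

/* Stand-in for ether_ifdetach(): only valid after upper_attach(). */
static void
upper_detach(struct softc_sketch *sc)
{
	assert(sc->ether_ifattached);
	sc->ether_ifattached = 0;	/* addition of this sketch */
	sc->detach_calls++;
}

/* Detach path: skip the upper-layer detach if attach never got that far. */
static void
detach_sketch(struct softc_sketch *sc)
{
	if (sc->ether_ifattached)
		upper_detach(sc);
}

/*
 * Attach path: alloc_ok == 0 simulates the list buffer allocation
 * failure, which unwinds through detach_sketch() before the upper
 * layer has been attached.
 */
static int
attach_sketch(struct softc_sketch *sc, int alloc_ok)
{
	sc->ether_ifattached = 0;
	sc->detach_calls = 0;
	if (!alloc_ok) {
		detach_sketch(sc);
		return (-1);
	}
	upper_attach(sc);
	return (0);
}
```

With alloc_ok set to 0 the unwind never reaches upper_detach(), so the
assertion inside it cannot fire; after a successful attach, a later
detach_sketch() call performs the upper-layer detach exactly once.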
State-Changed-From-To: open->feedback 
State-Changed-By: glebius 
State-Changed-When: Tue Dec 27 09:06:55 UTC 2005 
State-Changed-Why:  
Driver detach techniques have changed slightly in 6.0-RELEASE. Can you 
please check whether the problem is still present in 6.0-RELEASE? 

http://www.freebsd.org/cgi/query-pr.cgi?pr=76207 

From: Gleb Smirnoff <glebius@freebsd.org>
To: Gene Stark <gene@starkhome.cs.sunysb.edu>
Cc: bug-followup@freebsd.org
Subject: Re: kern/76207: [xl] [patch] Null pointer dereference in xl_detach()
Date: Tue, 27 Dec 2005 15:22:34 +0300

   Gene,
 
 On Tue, Dec 27, 2005 at 07:13:25AM -0500, Gene Stark wrote:
 G>  > On Tue, Dec 27, 2005 at 05:46:47AM -0500, Gene Stark wrote:
 G>  > G> I'm not prepared at this time to replace the existing system on
 G>  > G> my laptop with 6.0-RELEASE, so I'm afraid I won't be able to provide the
 G>  > G> requested feedback.
 G>  > 
 G>  > Is this temporary or not? If it is, then the PR will remain in its
 G>  > current state, waiting for you to reply in future. If you don't have
 G>  > the hardware or can't check this for any other reason, then the PR
 G>  > should be closed. So, what should I do with the PR?
 G> 
 G> Was any modification ever made to *any* branch as a result of this PR?
 G> 
 G> Look, I spent time on debugging this particular problem because it
 G> was important to me (it was crashing my laptop at extremely
 G> inopportune times, like when I was about to give a presentation or
 G> something).  I sent in the PR which contained workarounds, which I
 G> have been using since then, at least on the referenced FreeBSD version
 G> and maybe through a system update.
 G> 
 G> My hope in sending in these PRs was to get somebody authoritative
 G> (like the author or maintainer of the driver) to look at the problem
 G> and hopefully make a suitable change.  My personal assessment after
 G> working on this particular problem was that the attach/detach
 G> sequence for this driver was very crufty in terms of keeping track
 G> of what had been initialized and what hadn't, and what needed to be
 G> freed and then reallocated.  Anybody knowledgeable who reads my PR
 G> and studies the driver code a little can see what I am talking about.
 G> I didn't have time to rewrite the driver and even if I did, I don't have
 G> the energy to argue with the maintainers about what would be the "proper"
 G> fix, coding conventions, etc., etc.  That's one reason why I didn't commit
 G> much when I was a committer.
 G> 
 G> Basically what I'm saying is:  I sent the PR because the code needed
 G> to be looked at and rewritten.  If somebody has done that on or before
 G> 6.0-RELEASE, well OK, I'll face up to it when and if I update my
 G> laptop to that version.  Hopefully it will "just work" at that point.
 G> If the same problem is still there, then I'll just be upset that
 G> nobody bothered to look at it and I'll be less likely to send in
 G> future PRs.
 
 As far as I know, no commits were made that reference your PR. However,
 we had some general problems with detaching interfaces, and many of them
 were addressed before 6.0-RELEASE. This touched quite a lot of drivers,
 including xl(4).
 
 I'm now working on other xl(4)-related PRs and I have acquired a 3Com
 PC card to work on them. Meanwhile, I decided to go through the other
 xl(4) PRs and found yours. I see that on my notebook, on FreeBSD
 7.0-CURRENT, the card I have now detaches and attaches without panic:
 
 cardbus1: Resource not specified in CIS: id=18, size=80
 xl0: <3Com 3c575C Fast Etherlink XL> port 0x1080-0x10ff mem 0x88000000-0x8800007f,0x88001000-0x8800107f irq 11 at device 0.0 on cardbus1
 miibus1: <MII bus> on xl0
 tdkphy0: <TDK 78Q2120 media interface> on miibus1
 tdkphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
 xl0: Ethernet address: 00:00:86:3d:b2:d9
 Trying to mount root from ufs:/dev/ad0s1a
 arp: 81.19.64.118 moved from 00:04:76:3b:6d:df to 00:90:27:85:b8:b9 on fxp0
 xl0: reset didn't complete
 xl0: command never completed!
 xl0: command never completed!
 xl0: command never completed!
 tdkphy0: detached
 miibus1: detached
 xl0: detached
 cardbus1: Resource not specified in CIS: id=14, size=80
 cardbus1: Resource not specified in CIS: id=18, size=80
 xl0: <3Com 3c575C Fast Etherlink XL> port 0x1080-0x10ff mem 0x88000000-0x8800007f,0x88001000-0x8800107f irq 11 at device 0.0 on cardbus1
 miibus1: <MII bus> on xl0
 tdkphy0: <TDK 78Q2120 media interface> on miibus1
 tdkphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
 xl0: Ethernet address: 00:00:86:3d:b2:d9
 xl0: link state changed to DOWN
 
 Here I have just ejected it and pushed it back in.
 
 I'm interested in whether your particular card with your particular
 notebook still has the problem or not. I do not urge you to do this right
 now; you can do it whenever you want and can. But the correct state for
 this PR is "feedback", since there is a high probability that the problem
 no longer exists in modern FreeBSD, and we need confirmation or
 refutation of this to decide what to do with this PR.
 
 -- 
 Totus tuus, Glebius.
 GLEBIUS-RIPN GLEB-RIPE
State-Changed-From-To: feedback->closed 
State-Changed-By: linimon 
State-Changed-When: Tue Jun 12 05:03:09 UTC 2007 
State-Changed-Why:  
Feedback timeout (> 1 year). 

http://www.freebsd.org/cgi/query-pr.cgi?pr=76207 
>Unformatted:
