From nobody@FreeBSD.org  Fri Oct 28 19:43:28 2011
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id CE049106566B
	for <freebsd-gnats-submit@FreeBSD.org>; Fri, 28 Oct 2011 19:43:28 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from red.freebsd.org (red.freebsd.org [IPv6:2001:4f8:fff6::22])
	by mx1.freebsd.org (Postfix) with ESMTP id BDE1C8FC08
	for <freebsd-gnats-submit@FreeBSD.org>; Fri, 28 Oct 2011 19:43:28 +0000 (UTC)
Received: from red.freebsd.org (localhost [127.0.0.1])
	by red.freebsd.org (8.14.4/8.14.4) with ESMTP id p9SJhS5C011151
	for <freebsd-gnats-submit@FreeBSD.org>; Fri, 28 Oct 2011 19:43:28 GMT
	(envelope-from nobody@red.freebsd.org)
Received: (from nobody@localhost)
	by red.freebsd.org (8.14.4/8.14.4/Submit) id p9SJhSqC011143;
	Fri, 28 Oct 2011 19:43:28 GMT
	(envelope-from nobody)
Message-Id: <201110281943.p9SJhSqC011143@red.freebsd.org>
Date: Fri, 28 Oct 2011 19:43:28 GMT
From: Frank Terhaar-Yonkers <fty@cisco.com>
To: freebsd-gnats-submit@FreeBSD.org
Subject: Releng_9 panics on boot in IGB driver - regression from 8.2
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         162110
>Category:       kern
>Synopsis:       [igb] [panic] RELENG_9 panics on boot in IGB driver - [regression] from 8.2
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-net
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Oct 28 19:50:08 UTC 2011
>Closed-Date:    
>Last-Modified:  Mon Oct 31 19:40:07 UTC 2011
>Originator:     Frank Terhaar-Yonkers
>Release:        Releng_9 CVSUP 2011-October-28
>Organization:
Cisco
>Environment:
FreeBSD fty-zfs-01 9.0-RC1 FreeBSD 9.0-RC1 #1: Fri Oct 28 06:50:23 EDT 2011     toot@fty-zfs-01:/usr/obj/usr/src/sys/GENERIC  amd64
>Description:
if_igb driver panics during bootup.

The IGB driver probes the device at line 591 of if_igb.c and punts:
                if (e1000_validate_nvm_checksum(&adapter->hw) < 0) {
                        device_printf(dev,
                            "The EEPROM Checksum Is Not Valid\n");
                        error = EIO;
                        goto err_late;
                }

The kernel immediately panics with a page fault.  The traceback shows it is in the if_igb driver, as the console messages suggest.

Releng_8 did not panic, so this is a regression.  The IGB NIC most likely has a real problem, which the driver diagnoses correctly.

Email me if you want the screenshot of the panic, or if you have a fix to try out.

>How-To-Repeat:
Crashes every time on boot.
>Fix:
Workaround: with the if_igb.c driver left out of the kernel build, the system boots fine.

>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->freebsd-net 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Sat Oct 29 12:04:50 UTC 2011 
Responsible-Changed-Why:  
reclassify. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=162110 

From: Gleb Smirnoff <glebius@FreeBSD.org>
To: Frank Terhaar-Yonkers <fty@cisco.com>
Cc: freebsd-gnats-submit@FreeBSD.org, jfv@FreeBSD.org
Subject: Re: kern/162110: Releng_9 panics on boot in IGB driver - regression
 from 8.2
Date: Mon, 31 Oct 2011 22:37:28 +0300

 --LTeJQqWS0MN7I/qa
 Content-Type: text/plain; charset=koi8-r
 Content-Disposition: inline
 
 On Fri, Oct 28, 2011 at 07:43:28PM +0000, Frank Terhaar-Yonkers wrote:
 F> 
 F> >Number:         162110
 F> >Category:       kern
 F> >Synopsis:       Releng_9 panics on boot in IGB driver - regression from 8.2
 F> >Confidential:   no
 F> >Severity:       critical
 F> >Priority:       high
 F> >Responsible:    freebsd-bugs
 F> >State:          open
 F> >Quarter:        
 F> >Keywords:       
 F> >Date-Required:
 F> >Class:          sw-bug
 F> >Submitter-Id:   current-users
 F> >Arrival-Date:   Fri Oct 28 19:50:08 UTC 2011
 F> >Closed-Date:
 F> >Last-Modified:
 F> >Originator:     Frank Terhaar-Yonkers
 F> >Release:        Releng_9 CVSUP 2011-October-28
 F> >Organization:
 F> Cisco
 F> >Environment:
 F> FreeBSD fty-zfs-01 9.0-RC1 FreeBSD 9.0-RC1 #1: Fri Oct 28 06:50:23 EDT 2011     toot@fty-zfs-01:/usr/obj/usr/src/sys/GENERIC  amd64
 F> >Description:
 F> if_igb driver panics during bootup.
 F> 
 F> The IGB driver probes the device at line 591 of if_igb.c and punts:
 F>                 if (e1000_validate_nvm_checksum(&adapter->hw) < 0) {
 F>                         device_printf(dev,
 F>                             "The EEPROM Checksum Is Not Valid\n");
 F>                         error = EIO;
 F>                         goto err_late;
 F>                 }
 F> 
 F> The kernel immediately panics with a page fault.  The trace-back show it's in the if_igb driver as the console messages suggest.
 F> 
 F> Releng_8 did not panic, so this is a regression.  The IGB NIC most likely has some sort of problem which is properly diagnosed.
 F> 
 F> Email me if you want the screen shot of the panic, or have a fix to try out.
 
 To reproduce your problem, I added '|| 1' to the condition in the code quoted
 above. It turns out that calling igb_detach() after an igb_attach() failure
 is full of landmines. The attached patch fixes a lot of them, and at least the
 kernel no longer panics when e1000_validate_nvm_checksum() fails; I am not
 sure about other cases.
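
A minimal user-space sketch of that reproduction trick, assuming nothing about
the real driver beyond the check quoted above: the condition is forced true so
the error exit runs on healthy hardware. The names sketch_validate_checksum(),
sketch_attach() and FORCE_CHECKSUM_FAILURE are hypothetical stand-ins, not part
of if_igb.c.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical switch: 1 forces the failure branch, 0 restores normal behaviour. */
#ifndef FORCE_CHECKSUM_FAILURE
#define FORCE_CHECKSUM_FAILURE 1
#endif

/* Hypothetical stand-in for e1000_validate_nvm_checksum(); it always succeeds. */
static int
sketch_validate_checksum(void)
{
	return (0);
}

/* Hypothetical stand-in for the attach routine; only the checksum step is modelled. */
static int
sketch_attach(void)
{
	int error = 0;

	/* The "|| FORCE_CHECKSUM_FAILURE" plays the role of the temporary "|| 1". */
	if (sketch_validate_checksum() < 0 || FORCE_CHECKSUM_FAILURE) {
		fprintf(stderr, "The EEPROM Checksum Is Not Valid\n");
		error = EIO;
		goto err_late;
	}
	return (0);

err_late:
	/* In the real driver the cleanup code runs here; this sketch only returns. */
	return (error);
}
```

Building with -DFORCE_CHECKSUM_FAILURE=0 makes sketch_attach() return 0 again,
so the forced failure cannot be left enabled by accident.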
 
 Unfortunately, the patch will not fix your NIC; it only cures the panic.
 
 I've Cc'ed Jack Vogel, who is the maintainer of the Intel NIC drivers
 in FreeBSD. Maybe he can help you.
 
 Jack, please consider including my patch in the next version of the driver.
 The issues fixed:
 
 - igb_detach() may be called with an uninitialized ifp
 - igb_stop() may be called with an uninitialized ifp
 - igb_detach() already frees the transmit/receive structures
 - igb_detach() already frees adapter->mta
 - igb_detach() already destroys the core lock
 
 There are probably other edge cases where the kernel panics after some
 failure in igb_attach(); not all possible error exits were tested.
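
The general pattern behind the fixes listed above can be sketched in user
space as follows. This is an illustration under stated assumptions, not the
driver code: struct sketch_ifnet, struct sketch_adapter, sketch_stop() and
sketch_detach() are hypothetical stand-ins, and clearing each pointer after
free() is an extra precaution of this sketch that the attached patch does not
take.

```c
#include <stdlib.h>

#define SKETCH_RUNNING	0x1	/* hypothetical stand-in for IFF_DRV_RUNNING */

/* Hypothetical, heavily reduced stand-ins for struct ifnet and struct adapter. */
struct sketch_ifnet {
	int	drv_flags;
};

struct sketch_adapter {
	struct sketch_ifnet	*ifp;	/* NULL if attach failed before allocating it */
	unsigned char		*mta;	/* NULL if never allocated */
	int			 in_detach;
};

/* Stop routine that tolerates a missing ifp, as the patched igb_stop() does. */
static void
sketch_stop(struct sketch_adapter *sc)
{
	if (sc->ifp != NULL)
		sc->ifp->drv_flags &= ~SKETCH_RUNNING;
}

/* Teardown that is safe on a half-initialized adapter and safe to call twice. */
static int
sketch_detach(struct sketch_adapter *sc)
{
	sc->in_detach = 1;
	sketch_stop(sc);

	if (sc->ifp != NULL) {
		free(sc->ifp);
		sc->ifp = NULL;	/* sketch-only: prevents a second free */
	}

	free(sc->mta);		/* free(NULL) is a no-op in standard C */
	sc->mta = NULL;

	return (0);
}
```

With every resource released in exactly one place, the attach error path only
has to call the detach routine and return the error; it does not need to free
anything a second time.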
 
 -- 
 Totus tuus, Glebius.
 
 --LTeJQqWS0MN7I/qa
 Content-Type: text/x-diff; charset=koi8-r
 Content-Disposition: attachment; filename="if_igb.c.diff"
 
 Index: if_igb.c
 ===================================================================
 --- if_igb.c	(revision 226966)
 +++ if_igb.c	(working copy)
 @@ -670,11 +670,12 @@
  
  err_late:
  	igb_detach(dev);
 -	igb_free_transmit_structures(adapter);
 -	igb_free_receive_structures(adapter);
  	igb_release_hw_control(adapter);
  	if (adapter->ifp != NULL)
  		if_free(adapter->ifp);
 +	igb_free_pci_resources(adapter);
 +	return (error);
 +
  err_pci:
  	igb_free_pci_resources(adapter);
  	free(adapter->mta, M_DEVBUF);
 @@ -701,26 +702,37 @@
  
  	INIT_DEBUGOUT("igb_detach: begin");
  
 -	/* Make sure VLANS are not using driver */
 -	if (adapter->ifp->if_vlantrunk != NULL) {
 -		device_printf(dev,"Vlan in use, detach first\n");
 -		return (EBUSY);
 -	}
 +	IGB_CORE_LOCK(adapter);
 +	adapter->in_detach = 1;
 +	igb_stop(adapter);
 +	IGB_CORE_UNLOCK(adapter);
  
 -	ether_ifdetach(adapter->ifp);
 +	/* Unregister VLAN events */
 +	if (adapter->vlan_attach != NULL)
 +		EVENTHANDLER_DEREGISTER(vlan_config, adapter->vlan_attach);
 +	if (adapter->vlan_detach != NULL)
 +		EVENTHANDLER_DEREGISTER(vlan_unconfig, adapter->vlan_detach);
  
 -	if (adapter->led_dev != NULL)
 -		led_destroy(adapter->led_dev);
 +	callout_drain(&adapter->timer);
  
 +	if (ifp != NULL) {
 +		/* Make sure VLANS are not using driver */
 +		if (ifp->if_vlantrunk != NULL) {
 +			device_printf(dev,"Vlan in use, detach first\n");
 +			return (EBUSY);
 +		}
 +
 +		ether_ifdetach(ifp);
 +
  #ifdef DEVICE_POLLING
 -	if (ifp->if_capenable & IFCAP_POLLING)
 -		ether_poll_deregister(ifp);
 +		if (ifp->if_capenable & IFCAP_POLLING)
 +			ether_poll_deregister(ifp);
  #endif
 +		if_free(ifp);
 +	}
  
 -	IGB_CORE_LOCK(adapter);
 -	adapter->in_detach = 1;
 -	igb_stop(adapter);
 -	IGB_CORE_UNLOCK(adapter);
 +	if (adapter->led_dev != NULL)
 +		led_destroy(adapter->led_dev);
  
  	e1000_phy_hw_reset(&adapter->hw);
  
 @@ -734,17 +746,8 @@
  		igb_enable_wakeup(dev);
  	}
  
 -	/* Unregister VLAN events */
 -	if (adapter->vlan_attach != NULL)
 -		EVENTHANDLER_DEREGISTER(vlan_config, adapter->vlan_attach);
 -	if (adapter->vlan_detach != NULL)
 -		EVENTHANDLER_DEREGISTER(vlan_unconfig, adapter->vlan_detach);
 -
 -	callout_drain(&adapter->timer);
 -
  	igb_free_pci_resources(adapter);
  	bus_generic_detach(dev);
 -	if_free(ifp);
  
  	igb_free_transmit_structures(adapter);
  	igb_free_receive_structures(adapter);
 @@ -2135,7 +2138,8 @@
  	callout_stop(&adapter->timer);
  
  	/* Tell the stack that the interface is no longer active */
 -	ifp->if_drv_flags &= ~(IFF_DRV_RUNNING | IFF_DRV_OACTIVE);
 +	if (ifp != NULL)
 +		ifp->if_drv_flags &= ~(IFF_DRV_RUNNING | IFF_DRV_OACTIVE);
  
  	/* Unarm watchdog timer. */
  	for (int i = 0; i < adapter->num_queues; i++, txr++) {
 
 --LTeJQqWS0MN7I/qa--
>Unformatted:
