From nobody@FreeBSD.org  Fri Jun  9 06:18:58 2006
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 0FE7816A473
	for <freebsd-gnats-submit@FreeBSD.org>; Fri,  9 Jun 2006 06:18:58 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [216.136.204.117])
	by mx1.FreeBSD.org (Postfix) with ESMTP id CCCCD43D78
	for <freebsd-gnats-submit@FreeBSD.org>; Fri,  9 Jun 2006 06:18:57 +0000 (GMT)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.13.1/8.13.1) with ESMTP id k596Iv5M031633
	for <freebsd-gnats-submit@FreeBSD.org>; Fri, 9 Jun 2006 06:18:57 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.13.1/8.13.1/Submit) id k596IvWO031632;
	Fri, 9 Jun 2006 06:18:57 GMT
	(envelope-from nobody)
Message-Id: <200606090618.k596IvWO031632@www.freebsd.org>
Date: Fri, 9 Jun 2006 06:18:57 GMT
From: Kouji Ito <kouji@cty-net.ne.jp>
To: freebsd-gnats-submit@FreeBSD.org
Subject: [PATCH] if_bge.c Fail to collect link-status of bge interface.
X-Send-Pr-Version: www-2.3

>Number:         98738
>Category:       kern
>Synopsis:       [bge] [patch] if_bge.c Fail to collect link-status of bge interface.
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    oleg
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Jun 09 06:20:17 GMT 2006
>Closed-Date:    Thu Sep 07 10:35:35 GMT 2006
>Last-Modified:  Thu Sep 07 10:35:35 GMT 2006
>Originator:     Kouji Ito
>Release:        7.0-CURRENT (2006-06-07 JST), 5.5-RELEASE
>Organization:
Japan
>Environment:
7.0-CURRENT (2006-06-07 JST)
5.5-RELEASE (options SMP)
6.1-RELEASE is probably affected as well.

>Description:
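When multiple processes query the link status of a bge interface at the
same time, some of the queries fail: the link is reported as not active
although it is up.

This is a problem when building clusters of FreeBSD machines, because
cluster control software commonly relies on the link status of the NIC.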
>How-To-Repeat:
(Condition 1) Use a kernel built with "options SMP".
(Condition 2) Use an SMP machine (or an Intel Pentium 4 with HTT).
(Condition 3) Multiple processes collect the link status at the same time.

TEST1
Run this script in parallel.
#!/bin/csh
while 1
   set aaa=`ifconfig bge0 | grep active | wc -l`
   if ($aaa == 1) then
      echo "bge0 Link OK"
   else
      echo "bge0 Link NG"
      exit
   endif
end

Wait a few minutes.
The "bge0 Link NG" message appears.
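
One way to meet Condition 3 is to start several copies of a test command
at once from a small launcher. The following is only a sketch under stated
assumptions: run_copies() is a hypothetical helper, not part of any FreeBSD
tool, and it uses only fork(2), execvp(3) and wait(2). The count it returns
is only as informative as the exit status of the command being launched.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: start n copies of argv[0] at once and wait for
 * all of them.  Returns the number of copies that exited with status 0,
 * or -1 if not all n copies could be started.
 */
static int
run_copies(int n, char *const argv[])
{
	int started = 0, ok = 0, status, i;
	pid_t pid;

	for (i = 0; i < n; i++) {
		pid = fork();
		if (pid < 0)
			break;		/* stop launching, reap what started */
		if (pid == 0) {
			execvp(argv[0], argv);
			_exit(127);	/* only reached if exec failed */
		}
		started++;
	}

	/* Reap every child that was started. */
	for (i = 0; i < started; i++) {
		if (wait(&status) > 0 && WIFEXITED(status) &&
		    WEXITSTATUS(status) == 0)
			ok++;
	}
	return (started == n ? ok : -1);
}
```

For example, with the script above saved under the hypothetical name
linktest.csh, a caller could pass { "csh", "./linktest.csh", NULL } and
n = 4. Because the script loops until it sees "Link NG", run_copies()
would return only after every copy has observed a failure.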


>Fix:
Apply this patch, or use a non-SMP kernel.

I hope this patch will be committed.
Thank you very much.

__FBSDID("$FreeBSD: src/sys/dev/bge/if_bge.c,v 1.131 2006/06/08 10:19:16 glebius Exp $")

*** if_bge.c.orig       Fri Jun  9 14:34:29 2006
--- if_bge.c    Fri Jun  9 14:34:58 2006
***************
*** 3325,3330 ****
--- 3325,3331 ----
                break;
        case SIOCSIFMEDIA:
        case SIOCGIFMEDIA:
+               BGE_LOCK(sc);
                if (sc->bge_tbi) {
                        error = ifmedia_ioctl(ifp, ifr,
                            &sc->bge_ifmedia, command);
***************
*** 3333,3338 ****
--- 3334,3340 ----
                        error = ifmedia_ioctl(ifp, ifr,
                            &mii->mii_media, command);
                }
+               BGE_UNLOCK(sc);
                break;
        case SIOCSIFCAP:
                mask = ifr->ifr_reqcap ^ ifp->if_capenable;
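
Why serializing the media ioctl helps can be pictured with a small
user-space analogy. This is only a sketch, not driver code. It assumes,
purely for illustration, that reading a PHY register is a two-step
sequence (select a register, then read it) on shared state. The names
fake_phy, fake_phy_read and count_bad_reads are hypothetical, and a
pthread mutex stands in for BGE_LOCK/BGE_UNLOCK.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

#define FAKE_NREGS 2

/* Hypothetical stand-in for hardware reached by a two-step protocol. */
struct fake_phy {
	pthread_mutex_t lock;		/* plays the role of the driver mutex */
	_Atomic int selected;		/* register index latched by step 1 */
	int regs[FAKE_NREGS];		/* regs[i] always holds i + 100 */
};

struct reader_arg {
	struct fake_phy *phy;
	int reg, loops, use_lock, bad;
};

/* Select a register, then read whichever register is selected now. */
static int
fake_phy_read(struct fake_phy *phy, int reg, int use_lock)
{
	int val;

	if (use_lock)
		pthread_mutex_lock(&phy->lock);
	atomic_store(&phy->selected, reg);		/* step 1 */
	sched_yield();					/* widen the window */
	val = phy->regs[atomic_load(&phy->selected)];	/* step 2 */
	if (use_lock)
		pthread_mutex_unlock(&phy->lock);
	return (val);
}

/* Thread body: count reads that returned another register's value. */
static void *
reader(void *p)
{
	struct reader_arg *a = p;
	int i;

	for (i = 0; i < a->loops; i++)
		if (fake_phy_read(a->phy, a->reg, a->use_lock) != a->reg + 100)
			a->bad++;
	return (NULL);
}

/* Run two readers on different registers; return the wrong-read count. */
static int
count_bad_reads(int loops, int use_lock)
{
	struct fake_phy phy;
	struct reader_arg args[2];
	pthread_t tid[2];
	int i, started = 0;

	pthread_mutex_init(&phy.lock, NULL);
	atomic_init(&phy.selected, 0);
	for (i = 0; i < FAKE_NREGS; i++)
		phy.regs[i] = i + 100;
	for (i = 0; i < 2; i++) {
		args[i] = (struct reader_arg){ &phy, i, loops, use_lock, 0 };
		if (pthread_create(&tid[i], NULL, reader, &args[i]) != 0)
			break;
		started++;
	}
	for (i = 0; i < started; i++)
		pthread_join(tid[i], NULL);
	pthread_mutex_destroy(&phy.lock);
	return (started == 2 ? args[0].bad + args[1].bad : -1);
}
```

With use_lock set to 1 the two steps of a read cannot interleave with the
other reader, so count_bad_reads() returns 0. With use_lock set to 0 a
reader may get the value of the register selected by the other thread;
how often that happens depends on timing, much like the few minutes TEST1
needs before it reports a failure.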

>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->oleg 
Responsible-Changed-By: glebius 
Responsible-Changed-When: Fri Aug 11 13:17:55 UTC 2006 
Responsible-Changed-Why:  
Very close to the area where Oleg works in bge(4). 

http://www.freebsd.org/cgi/query-pr.cgi?pr=98738 

From: Oleg Bulyzhin <oleg@freebsd.org>
To: bug-followup@freebsd.org
Cc: kouji@cty-net.ne.jp
Subject: Re: kern/98738: [bge] [patch] if_bge.c Fail to collect link-status of bge interface.
Date: Fri, 18 Aug 2006 12:29:13 +0400

 --PNTmBPCT7hxwcZjr
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: inline
 
 
 I've found that the ifmedia callbacks were not properly locked, which could
 lead to ugly races in the MII layer.
 Could you please test the attached patch and report the results?
 
 -- 
 Oleg.
 
 
 --PNTmBPCT7hxwcZjr
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: attachment; filename="bge_lock_ifmcallbacks.diff"
 
 Index: if_bge.c
 ===================================================================
 RCS file: /home/ncvs/src/sys/dev/bge/if_bge.c,v
 retrieving revision 1.137
 diff -u -r1.137 if_bge.c
 --- if_bge.c	17 Aug 2006 09:53:04 -0000	1.137
 +++ if_bge.c	18 Aug 2006 06:41:18 -0000
 @@ -3226,17 +3226,19 @@
  static int
  bge_ifmedia_upd(struct ifnet *ifp)
  {
 -	struct bge_softc *sc;
 +	struct bge_softc *sc = ifp->if_softc;
  	struct mii_data *mii;
  	struct ifmedia *ifm;
  
 -	sc = ifp->if_softc;
 +	BGE_LOCK(sc);
  	ifm = &sc->bge_ifmedia;
  
  	/* If this is a 1000baseX NIC, enable the TBI port. */
  	if (sc->bge_tbi) {
 -		if (IFM_TYPE(ifm->ifm_media) != IFM_ETHER)
 +		if (IFM_TYPE(ifm->ifm_media) != IFM_ETHER) {
 +			BGE_UNLOCK(sc);
  			return (EINVAL);
 +		}
  		switch(IFM_SUBTYPE(ifm->ifm_media)) {
  		case IFM_AUTO:
  			/*
 @@ -3268,8 +3270,10 @@
  			}
  			break;
  		default:
 +			BGE_UNLOCK(sc);
  			return (EINVAL);
  		}
 +		BGE_UNLOCK(sc);
  		return (0);
  	}
  
 @@ -3283,6 +3287,7 @@
  	}
  	mii_mediachg(mii);
  
 +	BGE_UNLOCK(sc);
  	return (0);
  }
  
 @@ -3292,11 +3297,10 @@
  static void
  bge_ifmedia_sts(struct ifnet *ifp, struct ifmediareq *ifmr)
  {
 -	struct bge_softc *sc;
 +	struct bge_softc *sc = ifp->if_softc;
  	struct mii_data *mii;
  
 -	sc = ifp->if_softc;
 -
 +	BGE_LOCK(sc);
  	if (sc->bge_tbi) {
  		ifmr->ifm_status = IFM_AVALID;
  		ifmr->ifm_active = IFM_ETHER;
 @@ -3305,6 +3309,7 @@
  			ifmr->ifm_status |= IFM_ACTIVE;
  		else {
  			ifmr->ifm_active |= IFM_NONE;
 +			BGE_UNLOCK(sc);
  			return;
  		}
  		ifmr->ifm_active |= IFM_1000_SX;
 @@ -3312,6 +3317,7 @@
  			ifmr->ifm_active |= IFM_HDX;
  		else
  			ifmr->ifm_active |= IFM_FDX;
 +		BGE_UNLOCK(sc);
  		return;
  	}
  
 @@ -3319,6 +3325,8 @@
  	mii_pollstat(mii);
  	ifmr->ifm_active = mii->mii_media_active;
  	ifmr->ifm_status = mii->mii_media_status;
 +
 +	BGE_UNLOCK(sc);
  }
  
  static int
 
 --PNTmBPCT7hxwcZjr--
State-Changed-From-To: open->feedback 
State-Changed-By: oleg 
State-Changed-When: Fri Aug 18 11:12:09 UTC 2006 
State-Changed-Why:  
Patch has been sent, waiting for test results. 


http://www.freebsd.org/cgi/query-pr.cgi?pr=98738 

From: Kouji Ito <kouji@cty-net.ne.jp>
To: bug-followup@freebsd.org
Cc: Oleg Bulyzhin <oleg@freebsd.org>,  kouji@cty-net.ne.jp
Subject: Re: kern/98738: [bge] [patch] if_bge.c Fail to collect link-status
 of bge interface.
Date: Mon, 21 Aug 2006 17:46:54 +0900

 Sorry to have kept you waiting.
 I tested this patch, and the result is good.
 
 The test environment was as follows.
 
 Machine: HP DL380G3  Dual CPU (Intel Xeon 2.8G x 2)
 OS: FreeBSD 5.5-RELEASE (SMP kernel) + patch
 
 The version of if_bge.c in 5.5-RELEASE is 1.72.2.16,
 so I applied the patch by hand.
 
 I expect this patch to work on CURRENT as well.
 Should I also test it on CURRENT?
 
 TEST1.
 Run this script in parallel.
 
 #!/bin/csh
 while 1
    set aaa=`ifconfig bge0 | grep active | wc -l`
    if ($aaa == 1) then
       echo "bge0 Link OK"
    else
       echo "bge0 Link NG"
       exit
    endif
 end
 
 RESULT1.
 Good.
 The link state was collected reliably.
 The link was never misreported as down.
 
 TEST2.
 Run this program in parallel. (TEST2 is the same as TEST1, but more severe.)
 
 #include <sys/types.h>
 #include <sys/socket.h>
 #include <sys/ioctl.h>
 #include <net/if.h>
 #include <net/if_media.h>
 #include <errno.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <unistd.h>
 
 int link_status(char *adev);
 
 int main(int argc, char *argv[])
 {
    int res;
    int nloop;
 
    if (3 != argc) {
       fprintf(stderr, "Usage: %s count if_name\n", argv[0]);
       exit(1);
    }
    nloop = atoi(argv[1]);
 
    /* Query the link status nloop times; stop at the first failure. */
    while (nloop--) {
       res = link_status(argv[2]);
       printf("nloop=%d res = %d\n", nloop, res);
       if (0 != res) {
          exit(1);
       }
    }
    return 0;
 }
 
 /* Returns 0 if the link is active, 1 if no carrier, negative on error. */
 int link_status(char *adev)
 {
    struct ifmediareq ifmr;
    int s, nrc;
 
    s = socket(AF_INET, SOCK_DGRAM, 0);
    if (0 > s) {
       printf("socket error errno=%d\n", errno);
       return -1;
    }
 
    /* ifmr is zeroed first, so copying size - 1 keeps ifm_name terminated. */
    memset(&ifmr, 0, sizeof(ifmr));
    strncpy(ifmr.ifm_name, adev, sizeof(ifmr.ifm_name) - 1);
    nrc = ioctl(s, SIOCGIFMEDIA, (caddr_t)&ifmr);
    if (0 > nrc) {
       /* Report errno before close() can overwrite it. */
       printf("ioctl error errno=%d\n", errno);
       close(s);
       return -2;
    }
    close(s);
 
    if (ifmr.ifm_status & IFM_ACTIVE) {
       return 0; /* active     */
    } else {
       printf("no carrier\n");
       printf("ifmr.ifm_status = 0x%08x\n", ifmr.ifm_status);
       return 1; /* no carrier */
    }
 }
 
 RESULT2.
 Good.
 The link state was collected reliably.
 The link was never misreported as down.
State-Changed-From-To: feedback->patched 
State-Changed-By: oleg 
State-Changed-When: Fri Aug 25 00:24:33 UTC 2006 
State-Changed-Why:  
Patch has been committed to HEAD and will be MFCed in 2 weeks. 
Thank you for your report and your testing! 

http://www.freebsd.org/cgi/query-pr.cgi?pr=98738 
State-Changed-From-To: patched->closed 
State-Changed-By: oleg 
State-Changed-When: Thu Sep 7 10:34:57 UTC 2006 
State-Changed-Why:  
Committed to RELENG_6. 


http://www.freebsd.org/cgi/query-pr.cgi?pr=98738 
>Unformatted:
