From nobody@FreeBSD.org  Fri Aug 13 04:51:42 2010
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 434D810656A3
	for <freebsd-gnats-submit@FreeBSD.org>; Fri, 13 Aug 2010 04:51:42 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [IPv6:2001:4f8:fff6::21])
	by mx1.freebsd.org (Postfix) with ESMTP id 25DC28FC19
	for <freebsd-gnats-submit@FreeBSD.org>; Fri, 13 Aug 2010 04:51:42 +0000 (UTC)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.14.3/8.14.3) with ESMTP id o7D4pfv7060506
	for <freebsd-gnats-submit@FreeBSD.org>; Fri, 13 Aug 2010 04:51:41 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.14.3/8.14.3/Submit) id o7D4pfkN060505;
	Fri, 13 Aug 2010 04:51:41 GMT
	(envelope-from nobody)
Message-Id: <201008130451.o7D4pfkN060505@www.freebsd.org>
Date: Fri, 13 Aug 2010 04:51:41 GMT
From: Chris Luke <chrisy@flirble.org>
To: freebsd-gnats-submit@FreeBSD.org
Subject: Deadlock with netinet6/raw_ip6.c when passing over a multicast ipv6 packet our raw socket is not interested in
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         149608
>Category:       kern
>Synopsis:       [ip6] [hang] Deadlock with netinet6/raw_ip6.c when passing over a multicast ipv6 packet our raw socket is not interested in
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    bz
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Aug 13 05:00:14 UTC 2010
>Closed-Date:    Tue Aug 17 08:02:47 UTC 2010
>Last-Modified:  Tue Aug 17 08:02:47 UTC 2010
>Originator:     Chris Luke
>Release:        8.1, 8.0
>Organization:
>Environment:
FreeBSD castaway.xxx 8.0-RELEASE-p3 FreeBSD 8.0-RELEASE-p3 #1: Thu May 27 13:15:32 EDT 2010     root@castaway.xxx:/usr/src/sys/i386/compile/Castaway  i386

FreeBSD chestnut.xxx 8.1-RELEASE FreeBSD 8.1-RELEASE #0: Thu Aug 12 11:40:49 EDT 2010     root@chestnut.xxx:/usr/src/sys/i386/compile/Chestnut  i386

Also occurred with the GENERIC kernel. The 8.1 kernel above is the one I patched to cure this issue.
>Description:
Observed with Quagga and Bird routing daemons running OSPFv3 over tap(4) based tunnels.

The process would hang repeatably, within a few seconds of starting. ps or top would show the process waiting on a kernel lock named "rawinp". It never recovers and the process cannot be killed. A reboot is the only cure.

> ps alxww | grep ospf6d
  101 15165     1   0  44  0  2760  2104 rawinp Ls    ??    0:01.84 /usr/local/sbin/ospf6d -d

Most of the time the deadlock appeared to hang only the process I was observing; however, on 1 in 10 occasions the entire system would hang.
>How-To-Repeat:
Any time I ran either Quagga or Bird, it would deadlock quickly.

Based on my analysis, the deadlock requires at least one INET6 raw socket to exist, the stack to have joined at least one IPv6 multicast group that at least one raw socket has not also joined, and then a packet to arrive for that group.

It is noteworthy that non-root processes can create IPv6 multicast sockets. Thus, if an IPv6 raw socket already exists (there are many valid reasons for one to), a non-root user can cause a system deadlock simply by joining a multicast group, even if the raw socket does not participate in any multicast.

Also, since various fundamental IPv6 mechanisms use multicast, it seems likely that this is the reason I observed, for example, complete system hangs.

I am not sure if there is something racy with running OSPFv3 over tap tunnels (neighbor discovery timing, perhaps), but it has reliably deadlocked on me since 8.0-RELEASE. I assumed it was immature user-land code, but decided to look at the kernel instead.
>Fix:
Reviewing rip6_input in raw_ip6.c, I noted that in the pcb loop when a multicast datagram is skipped over

    if (blocked != MCAST_PASS) {
        IP6STAT_INC(ip6s_notmember);
        continue;
    }

then the in6p never gets INP_RUNLOCK()'ed. Normally the unlock happens indirectly: the current in6p is left in 'last' and is unlocked on the next pass round the loop (or after the loop ends). The multicast check 'continue's before the current in6p is ever stored in 'last', so its read lock is never released.

The fix I have successfully tested is to add

    INP_RUNLOCK(in6p);

before the continue. See the attached patch against 8.1-RELEASE.
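
For readers who want to see the leak in isolation, here is a minimal userland sketch of the loop pattern described above. It is only a model, not the kernel code: struct fake_pcb, fake_rlock(), fake_runlock() and walk_pcbs() are hypothetical names invented for this illustration, and a plain counter stands in for the real pcb read lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a pcb: a read-lock count and a filter verdict. */
struct fake_pcb {
    int  readers;   /* outstanding read locks; 0 means unlocked */
    bool wants_pkt; /* false models "blocked != MCAST_PASS" */
};

static void fake_rlock(struct fake_pcb *p)
{
    p->readers++;
}

static void fake_runlock(struct fake_pcb *p)
{
    assert(p->readers > 0); /* never release a lock that is not held */
    p->readers--;
}

/*
 * Walk the pcbs the way the loop described above does: read-lock each
 * one and release it only after it has been carried over in 'last'.
 * Returns the number of read locks still held once the walk is done.
 */
static int walk_pcbs(struct fake_pcb *pcbs, size_t n, bool unlock_on_skip)
{
    struct fake_pcb *last = NULL;
    int leaked = 0;

    for (size_t i = 0; i < n; i++) {
        struct fake_pcb *p = &pcbs[i];

        fake_rlock(p);
        if (!p->wants_pkt) {
            if (unlock_on_skip)
                fake_runlock(p); /* models the one-line fix */
            continue;            /* p never reaches 'last' */
        }
        if (last != NULL)
            fake_runlock(last);  /* previous match is released here */
        last = p;
    }
    if (last != NULL)
        fake_runlock(last);      /* final match is released after the loop */

    for (size_t i = 0; i < n; i++)
        leaked += pcbs[i].readers;
    return leaked;
}
```

Walking three such pcbs where only the middle one is filtered out leaves one read lock held when unlock_on_skip is false, and none when it is true. In the kernel, that leftover read lock is what a later write-lock attempt on the same pcb would wait on.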

Patch attached with submission follows:

*** raw_ip6.c-original	Fri Aug 13 00:33:56 2010
--- raw_ip6.c	Fri Aug 13 00:34:10 2010
***************
*** 248,253 ****
--- 248,254 ----
  			}
  			if (blocked != MCAST_PASS) {
  				IP6STAT_INC(ip6s_notmember);
+ 				INP_RUNLOCK(in6p);
  				continue;
  			}
  		}



>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->freebsd-net 
Responsible-Changed-By: brucec 
Responsible-Changed-When: Fri Aug 13 15:27:58 UTC 2010 
Responsible-Changed-Why:  
Over to maintainer(s). 

http://www.freebsd.org/cgi/query-pr.cgi?pr=149608 
Responsible-Changed-From-To: freebsd-net->bz 
Responsible-Changed-By: bz 
Responsible-Changed-When: Sat Aug 14 14:02:29 UTC 2010 
Responsible-Changed-Why:  
Take. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=149608 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/149608: commit references a PR
Date: Sat, 14 Aug 2010 14:13:55 +0000 (UTC)

 Author: bz
 Date: Sat Aug 14 14:13:44 2010
 New Revision: 211301
 URL: http://svn.freebsd.org/changeset/base/211301
 
 Log:
   In rip6_input(), in case of multicast, we might skip the normal processing
   and go to the next iteration early if multicast filtering would decide that
   this socket shall not receive the data.
   Unlock the pcb in that case or we leak the read lock and next time trying
   to get a write lock, would hang forever.
   
   PR:		kern/149608
   Submitted by:	Chris Luke (chrisy flirble.org)
   MFC after:	3 days
 
 Modified:
   head/sys/netinet6/raw_ip6.c
 
 Modified: head/sys/netinet6/raw_ip6.c
 ==============================================================================
 --- head/sys/netinet6/raw_ip6.c	Sat Aug 14 14:09:13 2010	(r211300)
 +++ head/sys/netinet6/raw_ip6.c	Sat Aug 14 14:13:44 2010	(r211301)
 @@ -248,6 +248,7 @@ rip6_input(struct mbuf **mp, int *offp, 
  			}
  			if (blocked != MCAST_PASS) {
  				IP6STAT_INC(ip6s_notmember);
 +				INP_RUNLOCK(in6p);
  				continue;
  			}
  		}
 _______________________________________________
 svn-src-all@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/svn-src-all
 To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
 
State-Changed-From-To: open->patched 
State-Changed-By: bz 
State-Changed-When: Mon Aug 16 14:20:16 UTC 2010 
State-Changed-Why:  
Fix committed to HEAD, MFC upcoming. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=149608 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/149608: commit references a PR
Date: Tue, 17 Aug 2010 07:58:24 +0000 (UTC)

 Author: bz
 Date: Tue Aug 17 07:58:10 2010
 New Revision: 211411
 URL: http://svn.freebsd.org/changeset/base/211411
 
 Log:
   MFC r211301:
   
     In rip6_input(), in case of multicast, we might skip the normal processing
     and go to the next iteration early if multicast filtering would decide that
     this socket shall not receive the data.
     Unlock the pcb in that case or we leak the read lock and next time trying
     to get a write lock, would hang forever.
   
   PR:		kern/149608
   Submitted by:	Chris Luke (chrisy flirble.org)
 
 Modified:
   stable/8/sys/netinet6/raw_ip6.c
 Directory Properties:
   stable/8/sys/   (props changed)
   stable/8/sys/amd64/include/xen/   (props changed)
   stable/8/sys/cam/   (props changed)
   stable/8/sys/cddl/contrib/opensolaris/   (props changed)
   stable/8/sys/contrib/dev/acpica/   (props changed)
   stable/8/sys/contrib/pf/   (props changed)
   stable/8/sys/dev/xen/xenpci/   (props changed)
 
 Modified: stable/8/sys/netinet6/raw_ip6.c
 ==============================================================================
 --- stable/8/sys/netinet6/raw_ip6.c	Tue Aug 17 06:08:09 2010	(r211410)
 +++ stable/8/sys/netinet6/raw_ip6.c	Tue Aug 17 07:58:10 2010	(r211411)
 @@ -248,6 +248,7 @@ rip6_input(struct mbuf **mp, int *offp, 
  			}
  			if (blocked != MCAST_PASS) {
  				IP6STAT_INC(ip6s_notmember);
 +				INP_RUNLOCK(in6p);
  				continue;
  			}
  		}
 
State-Changed-From-To: patched->closed 
State-Changed-By: bz 
State-Changed-When: Tue Aug 17 08:02:09 UTC 2010 
State-Changed-Why:  
The patch was committed to HEAD and stable/8.  Thanks a lot for 
debugging and sending it in! 

http://www.freebsd.org/cgi/query-pr.cgi?pr=149608 
>Unformatted:
