From hselasky@c2i.net  Tue Jan 18 20:54:13 2005
Return-Path: <hselasky@c2i.net>
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 088E616A4CE; Tue, 18 Jan 2005 20:54:13 +0000 (GMT)
Received: from mailfe06.swip.net (mailfe06.swip.net [212.247.154.161])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id E44C243D41; Tue, 18 Jan 2005 20:54:11 +0000 (GMT)
	(envelope-from hselasky@c2i.net)
Received: from mp-217-205-199.daxnet.no ([193.217.205.199] verified)
  by mailfe06.swip.net (CommuniGate Pro SMTP 4.2.7)
  with ESMTP id 269254563; Tue, 18 Jan 2005 21:54:10 +0100
Message-Id: <200501182154.39459.hselasky@c2i.net>
Date: Tue, 18 Jan 2005 21:54:38 +0100
From: Hans Petter Selasky <hselasky@c2i.net>
Reply-To: hselasky@c2i.net
To: FreeBSD-gnats-submit@freebsd.org
Cc: kan@freebsd.org, csjp@freebsd.org
Subject: recursive locking in the network stack

>Number:         76432
>Category:       kern
>Synopsis:       [net] [patch] recursive locking in the network stack
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    gnn
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue Jan 18 21:00:46 GMT 2005
>Closed-Date:    Mon Jul 10 13:18:04 GMT 2006
>Last-Modified:  Mon Jul 10 13:18:04 GMT 2006
>Originator:     hselasky@c2i.net
>Release:        FreeBSD 5.3-RC1 i386
>Organization:
>Environment:
System: FreeBSD 5.3-RC1 FreeBSD 5.3-RC1 #182: Fri Jan 14 13:45:31 CET 2005 
root@ :/usr/obj/usr/src/sys/custom i386

>Description:
 1) The lock named "rtentry" can recurse at line 197 of
/usr/src/sys/net/route.c, which causes a panic.

Backtrace:

panic()
_mtx_lock_sleep()
_mtx_lock_flags()
rtalloc1()
ifa_ifwithroute()
rt_getifa()
route_output()
raw_usend()
rts_send()
sosend()
soo_write()
dofilewrite()
write()
syscall()

 2) Adding the MTX_RECURSE flag to the mtx_init() call in
"src/sys/net/route.h" leads to another bug:

lock order reversal:

1st rtentry @ /usr/src/sys/net/rtsock.c:429
2nd radix node head @ /usr/src/sys/net/route.c:148
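
For illustration only, here is a userland sketch of the difference between a
non-recursive and a recursive lock. It uses POSIX threads rather than the
kernel mutex(9) API, so it is an analogy and not the kernel code path:
PTHREAD_MUTEX_ERRORCHECK stands in for a mutex initialized without
MTX_RECURSE, and PTHREAD_MUTEX_RECURSIVE for one initialized with it. The
helper name lock_twice() is hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <pthread.h>

/*
 * Acquire the same mutex twice from one thread.
 * Returns 0 if the second acquisition succeeded, otherwise the error
 * code reported by pthread_mutex_lock().
 */
static int
lock_twice(int type)
{
	pthread_mutexattr_t attr;
	pthread_mutex_t m;
	int error;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_settype(&attr, type);
	pthread_mutex_init(&m, &attr);
	pthread_mutexattr_destroy(&attr);

	pthread_mutex_lock(&m);
	/* Second acquisition by the thread that already owns the lock. */
	error = pthread_mutex_lock(&m);
	if (error == 0)
		pthread_mutex_unlock(&m);	/* undo the recursive acquisition */
	pthread_mutex_unlock(&m);
	pthread_mutex_destroy(&m);
	return (error);
}
```

Calling lock_twice(PTHREAD_MUTEX_ERRORCHECK) reports EDEADLK, where the
kernel panics instead; lock_twice(PTHREAD_MUTEX_RECURSIVE) returns 0. Allowing
recursion only hides the double acquisition, which is consistent with the
lock order reversal in 2) still being reported.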

>How-To-Repeat:
 Run "ppp" after "dhclient".

>Fix:
 1) Workaround: run "route delete 0.0.0.0" before running ppp.

 2) Patch for route.h:
*** /usr/src/sys/net/route.h.ref        Tue Jan 18 21:16:05 2005
--- /usr/src/sys/net/route.h    Tue Jan 18 21:17:32 2005
***************
*** 280,286 ****
  #ifdef _KERNEL
  
  #define       RT_LOCK_INIT(_rt) \
!       mtx_init(&(_rt)->rt_mtx, "rtentry", NULL, MTX_DEF | MTX_DUPOK)
  #define       RT_LOCK(_rt)            mtx_lock(&(_rt)->rt_mtx)
  #define       RT_UNLOCK(_rt)          mtx_unlock(&(_rt)->rt_mtx)
  #define       RT_LOCK_DESTROY(_rt)    mtx_destroy(&(_rt)->rt_mtx)
--- 280,286 ----
  #ifdef _KERNEL
  
  #define       RT_LOCK_INIT(_rt) \
!       mtx_init(&(_rt)->rt_mtx, "rtentry", NULL, MTX_DEF | MTX_DUPOK | MTX_RECURSE)
  #define       RT_LOCK(_rt)            mtx_lock(&(_rt)->rt_mtx)
  #define       RT_UNLOCK(_rt)          mtx_unlock(&(_rt)->rt_mtx)
  #define       RT_LOCK_DESTROY(_rt)    mtx_destroy(&(_rt)->rt_mtx)
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->freebsd-net 
Responsible-Changed-By: arved 
Responsible-Changed-When: Tue May 10 13:24:52 GMT 2005 
Responsible-Changed-Why:  
over to freebsd-net Mailinglist for review 

http://www.freebsd.org/cgi/query-pr.cgi?pr=76432 
Responsible-Changed-From-To: freebsd-net->gnn@freebsd.org 
Responsible-Changed-By: gnn 
Responsible-Changed-When: Wed May 11 14:42:34 GMT 2005 
Responsible-Changed-Why:  
Taking this to try to fix it. 

Responsible-Changed-From-To: gnn@freebsd.org->gnn 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Sat Aug 6 18:29:39 GMT 2005 
Responsible-Changed-Why:  
Canonicalize assignment. 

State-Changed-From-To: open->feedback 
State-Changed-By: gnn 
State-Changed-When: Mon Jun 26 12:54:40 UTC 2006 
State-Changed-Why:  
I could not reproduce the original bug on HEAD and I do not think 
that changing the routing table lock to recursive is the right fix. 

Please let me know if this is still a problem in 6 and HEAD. 

State-Changed-From-To: feedback->closed 
State-Changed-By: gnn 
State-Changed-When: Mon Jul 10 13:17:38 UTC 2006 
State-Changed-Why:  
Now in HEAD and STABLE.  Received positive feedback. 

>Unformatted:
