From nobody@FreeBSD.org  Tue Mar 15 21:06:30 2011
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 1E1AE106566B
	for <freebsd-gnats-submit@FreeBSD.org>; Tue, 15 Mar 2011 21:06:30 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from red.freebsd.org (red.freebsd.org [IPv6:2001:4f8:fff6::22])
	by mx1.freebsd.org (Postfix) with ESMTP id 03B018FC1E
	for <freebsd-gnats-submit@FreeBSD.org>; Tue, 15 Mar 2011 21:06:30 +0000 (UTC)
Received: from red.freebsd.org (localhost [127.0.0.1])
	by red.freebsd.org (8.14.4/8.14.4) with ESMTP id p2FL6TQD010454
	for <freebsd-gnats-submit@FreeBSD.org>; Tue, 15 Mar 2011 21:06:29 GMT
	(envelope-from nobody@red.freebsd.org)
Received: (from nobody@localhost)
	by red.freebsd.org (8.14.4/8.14.4/Submit) id p2FL6TPE010452;
	Tue, 15 Mar 2011 21:06:29 GMT
	(envelope-from nobody)
Message-Id: <201103152106.p2FL6TPE010452@red.freebsd.org>
Date: Tue, 15 Mar 2011 21:06:29 GMT
From: Andrey Smagin <samspeed@mail.ru>
To: freebsd-gnats-submit@FreeBSD.org
Cc: Ivan Goryushkin <animage.nvkz@gmail.com>
Subject: tcp_output tcp_mtudisc loop until kernel panic 
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         155585
>Category:       kern
>Synopsis:       [tcp] [panic] tcp_output tcp_mtudisc loop until kernel panic
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    glebius
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue Mar 15 21:10:10 UTC 2011
>Closed-Date:    Mon Sep 10 11:56:18 UTC 2012
>Last-Modified:  Mon Sep 10 11:56:18 UTC 2012
>Originator:     Andrey Smagin
>Release:        FreeBSD 8.x, 9-CURRENT
>Organization:
DiP Interactive
>Environment:
FreeBSD ns.vvt 9.0-CURRENT FreeBSD 9.0-CURRENT #15: Mon Feb 21 10:00:16 MSK 2011     root@ns.vvt:/usr/obj/usr/src/sys/SAM  amd64
>Description:
My box is connected to 8 different ISPs.
I use IPFW to split traffic between the ISPs by port and IP address.
The ruleset is:

10000    rules for outgoing connections made directly from this host via any iface
*10001    fwd ISP1_GATE ip from ISP1_IP to not 172.17.0.0/16
*10016    fwd ISP2_GATE ip from ISP2_IP to not 172.17.0.0/16
*10021    fwd ISP3_GATE ip from ISP3_IP to not 172.17.0.0/16
*10026    fwd ISP4_GATE ip from ISP4_IP to not 172.17.0.0/16
*10031    fwd ISP5_GATE ip from ISP5_IP to not 172.17.0.0/16
*10036    fwd ISP6_GATE ip from ISP6_IP to not 172.17.0.0/16

10100    rules that divert incoming packets from each ISP to its NAT in_port
*10101    divert 8682 ip from not 172.17.0.0/16 to ISP1_IP
*10116    divert 8686 ip from not 172.17.0.0/16 to ISP2_IP
*10121    divert 8688 ip from not 172.17.0.0/16 to ISP3_IP
*10126    divert 8690 ip from not 172.17.0.0/16 to ISP4_IP
*10131    divert 8692 ip from not 172.17.0.0/16 to ISP5_IP
*10136    divert 8694 ip from not 172.17.0.0/16 to ISP6_IP

10200    if a packet is for this host after NAT, allow it
*10201    allow ip from not 172.17.0.0/16 to ISP1_IP
*10216    allow ip from not 172.17.0.0/16 to ISP2_IP
*10221    allow ip from not 172.17.0.0/16 to ISP3_IP
*10226    allow ip from not 172.17.0.0/16 to ISP4_IP
*10231    allow ip from not 172.17.0.0/16 to ISP5_IP
*10236    allow ip from not 172.17.0.0/16 to ISP6_IP

10500...45000   rules that move outgoing traffic from local network hosts to an ISP
the default gateway for FIB0 is my_local_net_IP, so traffic from it goes through NAT
10500 skipto 50010 ip from 172.17.1.myip to not 172.17.0.0/16 
move HTTP via ISP1
10501 skipto 50000 ip from 172.17.1.12 to not 172.17.0.0/16 80
move all other traffic via ISP2
10502 skipto 50005 ip from 172.17.1.12 to not 172.17.0.0/16
... and so on

At 50000... are rules that act as virtual ISP numbers (ISP_No).
These rules are changed dynamically by scripts when any of the ISPs
disconnects or its uplink goes down.
50000    skipto 50200 ip from any to any
50005    skipto 50225 ip from any to any
50010    skipto 50200 ip from any to any
50015    skipto 50215 ip from any to any
50020    skipto 50220 ip from any to any
50025    skipto 50225 ip from any to any
50030    skipto 50230 ip from any to any
50035    skipto 50235 ip from any to any
50040    skipto 50225 ip from any to any
50199    skipto 50500 ip from any to any


50200 rules for the ISPs that are actually connected, with NAT out_port for local net IPs
*50201    131542     12711357 divert 8683 ip from any to any
*50202     93400      6215615 fwd ISP1_GATE ip from any to any
*50203         0            0 skipto 50500 ip from any to any
*50209         0            0 skipto 50500 ip from any to any
*50214         0            0 skipto 50500 ip from any to any
*50214         0            0 skipto 50500 ip from any to any
*50216     51907      5752794 divert 8687 ip from any to any
*50217     51907      5752794 fwd ISP2_GATE ip from any to any
*50218         0            0 skipto 50500 ip from any to any
*50219         0            0 skipto 50500 ip from any to any
*50221  13372501   1432345573 divert 8689 ip from any to any
*50222  13372330   1432341986 fwd ISP3_GATE ip from any to any
*50223         0            0 skipto 50500 ip from any to any
*50224         0            0 skipto 50500 ip from any to any
*50226   2081341    297746506 divert 8691 ip from any to any
*50227   2081336    297746190 fwd ISP4_GATE ip from any to any
*50228         0            0 skipto 50500 ip from any to any
*50229         0            0 skipto 50500 ip from any to any
*50231         0            0 divert 8693 ip from any to any
*50232         0            0 fwd ISP5_GATE ip from any to any
*50233         0            0 skipto 50500 ip from any to any
*50234         0            0 skipto 50500 ip from any to any
*50236    502925     35831696 divert 8695 ip from any to any
*50237    502924     35831612 fwd ISP6_GATE ip from any to any
*50238         0            0 skipto 50500 ip from any to any

50500 deny ip from any to any

The system also has 9 FIBs: FIBs 1-8 each use one ISP connection as the
default gateway, and FIB0 has local_net_this_host_ip as its default gateway
so that connections made by this host itself go through NAT (rule 10500).


Rules marked * are changed by the iface_up/iface_down scripts in MPD 5.5.

If all ISPs work without disconnecting, the system is stable.
Under load, if some ISP disconnects and then reconnects, the kernel panics:

Fatal double fault:
  
ipfw_chk  
ipfw_check_ 
tcp_output 
tcp_mtudisc 
tcp_output 
tcp_mtudisc 
tcp_output 
tcp_mtudisc 
tcp_output 
tcp_mtudisc 
tcp_output 
tcp_mtudisc 
tcp_output 
.. many times
tcp_mtudisc 
tcp_output 

The initial call to tcp_output comes from different code paths:
ithread, netgraph, etc.


>How-To-Repeat:
Under heavy load with frequent ISP disconnections, uptime is 5-15 minutes.
>Fix:
Using 5 ISPs, uptime increased to 1-2 days.
Using 2 ISPs, uptime increased to 3-7 days.

>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->freebsd-net 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Wed Mar 16 06:05:51 UTC 2011 
Responsible-Changed-Why:  
Over to maintainer(s). 

http://www.freebsd.org/cgi/query-pr.cgi?pr=155585 

From: Andrey Smagin <sam@dipinteractive.com>
To: bug-followup@FreeBSD.org, samspeed@mail.ru
Cc:  
Subject: Re: kern/155585: [tcp] [panic] tcp_output tcp_mtudisc loop until
 kernel panic
Date: Fri, 18 Mar 2011 12:08:24 +0300

   This patch solves my problem:
 Index: sys/netinet/tcp_var.h
 ===================================================================
 --- sys/netinet/tcp_var.h       (revision 219727)
 +++ sys/netinet/tcp_var.h       (working copy)
 @@ -239,6 +239,7 @@
   #define        TF_ECN_SND_ECE  0x10000000      /* ECN ECE in queue */
   #define        TF_CONGRECOVERY 0x20000000      /* congestion recovery mode */
   #define        TF_WASCRECOVERY 0x40000000      /* was in congestion recovery */
 +#define        TF_WASMTUDISC   0x80000000      /* was in mtudiscovery */
 
   #define        IN_FASTRECOVERY(t_flags)        (t_flags & TF_FASTRECOVERY)
   #define        ENTER_FASTRECOVERY(t_flags)     t_flags |= TF_FASTRECOVERY
 Index: sys/netinet/tcp_output.c
 ===================================================================
 --- sys/netinet/tcp_output.c    (revision 219727)
 +++ sys/netinet/tcp_output.c    (working copy)
 @@ -1288,8 +1288,15 @@
                           */
                          if (tso)
                                  tp->t_flags &= ~TF_TSO;
 +
 +                       if (tp->t_flags & TF_WASMTUDISC) /* EMSGSIZE again after tcp_mtudisc(): return EHOSTUNREACH */
 +                               return (EHOSTUNREACH);
 +
 +                       tp->t_flags |= TF_WASMTUDISC;
                          tcp_mtudisc(tp->t_inpcb, 0);
 +                       tp->t_flags &= ~TF_WASMTUDISC;
                          return (0);
 +
                  case EHOSTDOWN:
                  case EHOSTUNREACH:
                  case ENETDOWN:
 

From: Andrey Zonov <andrey@zonov.org>
To: bug-followup@FreeBSD.org, samspeed@mail.ru
Cc:  
Subject: Re: kern/155585: [tcp] [panic] tcp_output tcp_mtudisc loop until
 kernel panic
Date: Wed, 02 Nov 2011 20:54:15 +0400

 This is a multi-part message in MIME format.
 --------------040501090905060606010708
 Content-Type: text/plain; charset=UTF-8; format=flowed
 Content-Transfer-Encoding: 7bit
 
 Hi Andrey,
 
 Please try attached patch.
 
 I think the same problem was resolved here [1].
 
 [1] http://svnweb.freebsd.org/base?view=revision&revision=178029
 
 -- 
 Andrey Zonov
 
 
 --------------040501090905060606010708
 Content-Type: text/plain;
  name="patch-tcp_output.c.txt"
 Content-Transfer-Encoding: base64
 Content-Disposition: attachment;
  filename="patch-tcp_output.c.txt"
 
 Index: sys/netinet/tcp_output.c
 ===================================================================
 --- sys/netinet/tcp_output.c	(revision 227020)
 +++ sys/netinet/tcp_output.c	(working copy)
 @@ -177,8 +177,9 @@
  	int idle, sendalot;
  	int sack_rxmit, sack_bytes_rxmt;
  	struct sackhole *p;
 -	int tso;
 +	int tso, mtu, offer;
  	struct tcpopt to;
 +	struct route ro;
  #if 0
  	int maxburst = TCP_MAXBURST;
  #endif
 @@ -218,6 +219,7 @@
  		tcp_sack_adjust(tp);
  	sendalot = 0;
  	tso = 0;
 +	mtu = 0;
  	off = tp->snd_nxt - tp->snd_una;
  	sendwin = min(tp->snd_wnd, tp->snd_cwnd);
  
 @@ -1231,9 +1233,16 @@
  	if (V_path_mtu_discovery && tp->t_maxopd > V_tcp_minmss)
  		ip->ip_off |= IP_DF;
  
 -	error = ip_output(m, tp->t_inpcb->inp_options, NULL,
 +	bzero(&ro, sizeof(ro));
 +
 +	error = ip_output(m, tp->t_inpcb->inp_options, &ro,
  	    ((so->so_options & SO_DONTROUTE) ? IP_ROUTETOIF : 0), 0,
  	    tp->t_inpcb);
 +
 +	if (error == EMSGSIZE && ro.ro_rt)
 +		mtu = ro.ro_rt->rt_rmx.rmx_mtu;
 +	if (ro.ro_rt)
 +		RTFREE(ro.ro_rt);
      }
  #endif /* INET */
  	if (error) {
 @@ -1279,22 +1288,20 @@
  			/*
  			 * For some reason the interface we used initially
  			 * to send segments changed to another or lowered
 -			 * its MTU.
 -			 *
 -			 * tcp_mtudisc() will find out the new MTU and as
 -			 * its last action, initiate retransmission, so it
 -			 * is important to not do so here.
 -			 *
 -			 * If TSO was active we either got an interface
 -			 * without TSO capabilits or TSO was turned off.
 -			 * Disable it for this connection as too and
 -			 * immediatly retry with MSS sized segments generated
 -			 * by this function.
 +			 * its MTU. Update t_maxopd and t_maxseg through
 +			 * tcp_mss_update() and try to send data again.
  			 */
 -			if (tso)
 -				tp->t_flags &= ~TF_TSO;
 -			tcp_mtudisc(tp->t_inpcb, 0);
 -			return (0);
 +			if (mtu != 0) {
 +				offer = mtu - hdrlen;
 +				if ((tp->t_flags & TF_RCVD_TSTMP) == TF_RCVD_TSTMP)
 +					offer += TCPOLEN_TSTAMP_APPA;
 +				tcp_mss_update(tp, offer, NULL, NULL);
 +				goto again;
 +			}
 +			/*
 +			 * This is the best we can do here.
 +			 */
 +			return (error);
  		case EHOSTDOWN:
  		case EHOSTUNREACH:
  		case ENETDOWN:
 --------------040501090905060606010708--
State-Changed-From-To: open->feedback 
State-Changed-By: melifaro 
State-Changed-When: Thu Nov 3 09:52:26 UTC 2011 
State-Changed-Why:  
Take 


Responsible-Changed-From-To: freebsd-net->melifaro 
Responsible-Changed-By: melifaro 
Responsible-Changed-When: Thu Nov 3 09:52:26 UTC 2011 
Responsible-Changed-Why:  
Take 


From: Gleb Smirnoff <glebius@freebsd.org>
To: Andrey Smagin <samspeed@mail.ru>, Andrey Zonov <andrey@zonov.org>,
        Andrey Smagin <sam@dipinteractive.com>
Cc: bug-followup@freebsd.org
Subject: Re: kern/155585: [tcp] [panic] tcp_output tcp_mtudisc loop until
 kernel panic
Date: Wed, 4 Jul 2012 11:56:23 +0400

 --mhjHhnbe5PrRcwjY
 Content-Type: text/plain; charset=koi8-r
 Content-Disposition: inline
 
   Hello,
 
   the patch from Andrey Zonov sounds like a good plan.
 
   It has a couple of issues.
 
 1) IMHO, ip6_output() needs the same things as ip_output(): supplying
    a route and checking it. Correct me if I'm wrong.
 
 2) When ip_output()/ip6_output() is passed a non-NULL route, the
    FLOWTABLE lookup is always skipped. Disabling flowtable for
    entire TCP output sounds like a bad idea to me.
    So I have recently committed a change to head that removes
    this pessimisation:
 
    http://svnweb.freebsd.org/base?view=revision&revision=238092
 
 And I've edited Andrey's patch to include 1). Can you please look
 at it?
 
 If you run stable/9 on your 8-ISP box, I can prepare a patchset
 that would include r238092 and related bits and attached patch
 to tcp_output(), so that you could test it.
 
 Testing would be much appreciated.
 
 -- 
 Totus tuus, Glebius.
 
 --mhjHhnbe5PrRcwjY
 Content-Type: text/x-diff; charset=koi8-r
 Content-Disposition: attachment; filename="tcp_output.diff"
 
 Index: tcp_output.c
 ===================================================================
 --- tcp_output.c	(revision 238093)
 +++ tcp_output.c	(working copy)
 @@ -180,7 +180,7 @@
  	int idle, sendalot;
  	int sack_rxmit, sack_bytes_rxmt;
  	struct sackhole *p;
 -	int tso;
 +	int tso, mtu;
  	struct tcpopt to;
  #if 0
  	int maxburst = TCP_MAXBURST;
 @@ -226,6 +226,7 @@
  		tcp_sack_adjust(tp);
  	sendalot = 0;
  	tso = 0;
 +	mtu = 0;
  	off = tp->snd_nxt - tp->snd_una;
  	sendwin = min(tp->snd_wnd, tp->snd_cwnd);
  
 @@ -1209,6 +1210,9 @@
  	 */
  #ifdef INET6
  	if (isipv6) {
 +		struct route_in6 ro;
 +
 +		bzero(&ro, sizeof(ro));
  		/*
  		 * we separately set hoplimit for every segment, since the
  		 * user might want to change the value via setsockopt.
 @@ -1218,10 +1222,13 @@
  		ip6->ip6_hlim = in6_selecthlim(tp->t_inpcb, NULL);
  
  		/* TODO: IPv6 IP6TOS_ECT bit on */
 -		error = ip6_output(m,
 -			    tp->t_inpcb->in6p_outputopts, NULL,
 -			    ((so->so_options & SO_DONTROUTE) ?
 -			    IP_ROUTETOIF : 0), NULL, NULL, tp->t_inpcb);
 +		error = ip6_output(m, tp->t_inpcb->in6p_outputopts, &ro,
 +		    ((so->so_options & SO_DONTROUTE) ?  IP_ROUTETOIF : 0),
 +		    NULL, NULL, tp->t_inpcb);
 +
 +		if (error == EMSGSIZE && ro.ro_rt != NULL)
 +			mtu = ro.ro_rt->rt_rmx.rmx_mtu;
 +		RO_RTFREE(&ro);
  	}
  #endif /* INET6 */
  #if defined(INET) && defined(INET6)
 @@ -1229,6 +1236,9 @@
  #endif
  #ifdef INET
      {
 +	struct route ro;
 +
 +	bzero(&ro, sizeof(ro));
  	ip->ip_len = m->m_pkthdr.len;
  #ifdef INET6
  	if (tp->t_inpcb->inp_vflag & INP_IPV6PROTO)
 @@ -1245,9 +1255,13 @@
  	if (V_path_mtu_discovery && tp->t_maxopd > V_tcp_minmss)
  		ip->ip_off |= IP_DF;
  
 -	error = ip_output(m, tp->t_inpcb->inp_options, NULL,
 +	error = ip_output(m, tp->t_inpcb->inp_options, &ro,
  	    ((so->so_options & SO_DONTROUTE) ? IP_ROUTETOIF : 0), 0,
  	    tp->t_inpcb);
 +
 +	if (error == EMSGSIZE && ro.ro_rt != NULL)
 +		mtu = ro.ro_rt->rt_rmx.rmx_mtu;
 +	RO_RTFREE(&ro);
      }
  #endif /* INET */
  	if (error) {
 @@ -1294,21 +1308,18 @@
  			 * For some reason the interface we used initially
  			 * to send segments changed to another or lowered
  			 * its MTU.
 -			 *
 -			 * tcp_mtudisc() will find out the new MTU and as
 -			 * its last action, initiate retransmission, so it
 -			 * is important to not do so here.
 -			 *
  			 * If TSO was active we either got an interface
  			 * without TSO capabilits or TSO was turned off.
 -			 * Disable it for this connection as too and
 -			 * immediatly retry with MSS sized segments generated
 -			 * by this function.
 +			 * If we obtained mtu from ip_output() then update
 +			 * it and try again.
  			 */
  			if (tso)
  				tp->t_flags &= ~TF_TSO;
 -			tcp_mtudisc(tp->t_inpcb, -1);
 -			return (0);
 +			if (mtu != 0) {
 +				tcp_mss_update(tp, -1, mtu, NULL, NULL);
 +				goto again;
 +			}
 +			return (error);
  		case EHOSTDOWN:
  		case EHOSTUNREACH:
  		case ENETDOWN:
 
 --mhjHhnbe5PrRcwjY--
Responsible-Changed-From-To: melifaro->glebius 
Responsible-Changed-By: glebius 
Responsible-Changed-When: Wed Jul 11 08:55:10 UTC 2012 
Responsible-Changed-Why:  
I'll handle this PR 


From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/155585: commit references a PR
Date: Mon, 16 Jul 2012 07:08:46 +0000 (UTC)

 Author: glebius
 Date: Mon Jul 16 07:08:34 2012
 New Revision: 238516
 URL: http://svn.freebsd.org/changeset/base/238516
 
 Log:
   If ip_output() returns EMSGSIZE to tcp_output(), then the latter calls
   tcp_mtudisc(), which in its turn may call tcp_output(). Under certain
   conditions (must admit they are very special) an infinite recursion can
   happen.
   
   To avoid recursion we can pass struct route to ip_output() and obtain
   correct mtu. This allows us not to use tcp_mtudisc() but call tcp_mss_update()
   directly.
   
   PR:		kern/155585
   Submitted by:	Andrey Zonov <andrey zonov.org> (original version of patch)
 
 Modified:
   head/sys/netinet/tcp_output.c
 
 Modified: head/sys/netinet/tcp_output.c
 ==============================================================================
 --- head/sys/netinet/tcp_output.c	Mon Jul 16 06:56:46 2012	(r238515)
 +++ head/sys/netinet/tcp_output.c	Mon Jul 16 07:08:34 2012	(r238516)
 @@ -180,7 +180,7 @@ tcp_output(struct tcpcb *tp)
  	int idle, sendalot;
  	int sack_rxmit, sack_bytes_rxmt;
  	struct sackhole *p;
 -	int tso;
 +	int tso, mtu;
  	struct tcpopt to;
  #if 0
  	int maxburst = TCP_MAXBURST;
 @@ -226,6 +226,7 @@ again:
  		tcp_sack_adjust(tp);
  	sendalot = 0;
  	tso = 0;
 +	mtu = 0;
  	off = tp->snd_nxt - tp->snd_una;
  	sendwin = min(tp->snd_wnd, tp->snd_cwnd);
  
 @@ -1209,6 +1210,9 @@ timer:
  	 */
  #ifdef INET6
  	if (isipv6) {
 +		struct route_in6 ro;
 +
 +		bzero(&ro, sizeof(ro));
  		/*
  		 * we separately set hoplimit for every segment, since the
  		 * user might want to change the value via setsockopt.
 @@ -1218,10 +1222,13 @@ timer:
  		ip6->ip6_hlim = in6_selecthlim(tp->t_inpcb, NULL);
  
  		/* TODO: IPv6 IP6TOS_ECT bit on */
 -		error = ip6_output(m,
 -			    tp->t_inpcb->in6p_outputopts, NULL,
 -			    ((so->so_options & SO_DONTROUTE) ?
 -			    IP_ROUTETOIF : 0), NULL, NULL, tp->t_inpcb);
 +		error = ip6_output(m, tp->t_inpcb->in6p_outputopts, &ro,
 +		    ((so->so_options & SO_DONTROUTE) ?  IP_ROUTETOIF : 0),
 +		    NULL, NULL, tp->t_inpcb);
 +
 +		if (error == EMSGSIZE && ro.ro_rt != NULL)
 +			mtu = ro.ro_rt->rt_rmx.rmx_mtu;
 +		RO_RTFREE(&ro);
  	}
  #endif /* INET6 */
  #if defined(INET) && defined(INET6)
 @@ -1229,6 +1236,9 @@ timer:
  #endif
  #ifdef INET
      {
 +	struct route ro;
 +
 +	bzero(&ro, sizeof(ro));
  	ip->ip_len = m->m_pkthdr.len;
  #ifdef INET6
  	if (tp->t_inpcb->inp_vflag & INP_IPV6PROTO)
 @@ -1245,9 +1255,13 @@ timer:
  	if (V_path_mtu_discovery && tp->t_maxopd > V_tcp_minmss)
  		ip->ip_off |= IP_DF;
  
 -	error = ip_output(m, tp->t_inpcb->inp_options, NULL,
 +	error = ip_output(m, tp->t_inpcb->inp_options, &ro,
  	    ((so->so_options & SO_DONTROUTE) ? IP_ROUTETOIF : 0), 0,
  	    tp->t_inpcb);
 +
 +	if (error == EMSGSIZE && ro.ro_rt != NULL)
 +		mtu = ro.ro_rt->rt_rmx.rmx_mtu;
 +	RO_RTFREE(&ro);
      }
  #endif /* INET */
  	if (error) {
 @@ -1294,21 +1308,18 @@ out:
  			 * For some reason the interface we used initially
  			 * to send segments changed to another or lowered
  			 * its MTU.
 -			 *
 -			 * tcp_mtudisc() will find out the new MTU and as
 -			 * its last action, initiate retransmission, so it
 -			 * is important to not do so here.
 -			 *
  			 * If TSO was active we either got an interface
  			 * without TSO capabilits or TSO was turned off.
 -			 * Disable it for this connection as too and
 -			 * immediatly retry with MSS sized segments generated
 -			 * by this function.
 +			 * If we obtained mtu from ip_output() then update
 +			 * it and try again.
  			 */
  			if (tso)
  				tp->t_flags &= ~TF_TSO;
 -			tcp_mtudisc(tp->t_inpcb, -1);
 -			return (0);
 +			if (mtu != 0) {
 +				tcp_mss_update(tp, -1, mtu, NULL, NULL);
 +				goto again;
 +			}
 +			return (error);
  		case EHOSTDOWN:
  		case EHOSTUNREACH:
  		case ENETDOWN:
 _______________________________________________
 svn-src-all@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/svn-src-all
 To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
 
State-Changed-From-To: feedback->patched 
State-Changed-By: glebius 
State-Changed-When: Mon Jul 16 07:10:35 UTC 2012 
State-Changed-Why:  
Fixed in head/ 


From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/155585: commit references a PR
Date: Mon, 10 Sep 2012 11:43:41 +0000 (UTC)

 Author: glebius
 Date: Mon Sep 10 11:43:28 2012
 New Revision: 240307
 URL: http://svn.freebsd.org/changeset/base/240307
 
 Log:
   Merge r238516 from head:
     If ip_output() returns EMSGSIZE to tcp_output(), then the latter calls
     tcp_mtudisc(), which in its turn may call tcp_output(). Under certain
     conditions (must admit they are very special) an infinite recursion can
     happen.
   
     To avoid recursion we can pass struct route to ip_output() and obtain
     correct mtu. This allows us not to use tcp_mtudisc() but call tcp_mss_update()
     directly.
   
     PR:		kern/155585
     Submitted by:	zont
 
 Modified:
   stable/9/sys/netinet/tcp_output.c
 Directory Properties:
   stable/9/sys/   (props changed)
 
 Modified: stable/9/sys/netinet/tcp_output.c
 ==============================================================================
 --- stable/9/sys/netinet/tcp_output.c	Mon Sep 10 11:38:55 2012	(r240306)
 +++ stable/9/sys/netinet/tcp_output.c	Mon Sep 10 11:43:28 2012	(r240307)
 @@ -182,7 +182,7 @@ tcp_output(struct tcpcb *tp)
  	int idle, sendalot;
  	int sack_rxmit, sack_bytes_rxmt;
  	struct sackhole *p;
 -	int tso;
 +	int tso, mtu;
  	struct tcpopt to;
  #if 0
  	int maxburst = TCP_MAXBURST;
 @@ -223,6 +223,7 @@ again:
  		tcp_sack_adjust(tp);
  	sendalot = 0;
  	tso = 0;
 +	mtu = 0;
  	off = tp->snd_nxt - tp->snd_una;
  	sendwin = min(tp->snd_wnd, tp->snd_cwnd);
  
 @@ -1206,6 +1207,9 @@ timer:
  	 */
  #ifdef INET6
  	if (isipv6) {
 +		struct route_in6 ro;
 +
 +		bzero(&ro, sizeof(ro));
  		/*
  		 * we separately set hoplimit for every segment, since the
  		 * user might want to change the value via setsockopt.
 @@ -1215,10 +1219,13 @@ timer:
  		ip6->ip6_hlim = in6_selecthlim(tp->t_inpcb, NULL);
  
  		/* TODO: IPv6 IP6TOS_ECT bit on */
 -		error = ip6_output(m,
 -			    tp->t_inpcb->in6p_outputopts, NULL,
 -			    ((so->so_options & SO_DONTROUTE) ?
 -			    IP_ROUTETOIF : 0), NULL, NULL, tp->t_inpcb);
 +		error = ip6_output(m, tp->t_inpcb->in6p_outputopts, &ro,
 +		    ((so->so_options & SO_DONTROUTE) ?  IP_ROUTETOIF : 0),
 +		    NULL, NULL, tp->t_inpcb);
 +
 +		if (error == EMSGSIZE && ro.ro_rt != NULL)
 +			mtu = ro.ro_rt->rt_rmx.rmx_mtu;
 +		RO_RTFREE(&ro);
  	}
  #endif /* INET6 */
  #if defined(INET) && defined(INET6)
 @@ -1226,6 +1233,9 @@ timer:
  #endif
  #ifdef INET
      {
 +	struct route ro;
 +
 +	bzero(&ro, sizeof(ro));
  	ip->ip_len = m->m_pkthdr.len;
  #ifdef INET6
  	if (tp->t_inpcb->inp_vflag & INP_IPV6PROTO)
 @@ -1242,9 +1252,13 @@ timer:
  	if (V_path_mtu_discovery && tp->t_maxopd > V_tcp_minmss)
  		ip->ip_off |= IP_DF;
  
 -	error = ip_output(m, tp->t_inpcb->inp_options, NULL,
 +	error = ip_output(m, tp->t_inpcb->inp_options, &ro,
  	    ((so->so_options & SO_DONTROUTE) ? IP_ROUTETOIF : 0), 0,
  	    tp->t_inpcb);
 +
 +	if (error == EMSGSIZE && ro.ro_rt != NULL)
 +		mtu = ro.ro_rt->rt_rmx.rmx_mtu;
 +	RO_RTFREE(&ro);
      }
  #endif /* INET */
  	if (error) {
 @@ -1291,21 +1305,18 @@ out:
  			 * For some reason the interface we used initially
  			 * to send segments changed to another or lowered
  			 * its MTU.
 -			 *
 -			 * tcp_mtudisc() will find out the new MTU and as
 -			 * its last action, initiate retransmission, so it
 -			 * is important to not do so here.
 -			 *
  			 * If TSO was active we either got an interface
  			 * without TSO capabilits or TSO was turned off.
 -			 * Disable it for this connection as too and
 -			 * immediatly retry with MSS sized segments generated
 -			 * by this function.
 +			 * If we obtained mtu from ip_output() then update
 +			 * it and try again.
  			 */
  			if (tso)
  				tp->t_flags &= ~TF_TSO;
 -			tcp_mtudisc(tp->t_inpcb, -1);
 -			return (0);
 +			if (mtu != 0) {
 +				tcp_mss_update(tp, -1, mtu, NULL, NULL);
 +				goto again;
 +			}
 +			return (error);
  		case EHOSTDOWN:
  		case EHOSTUNREACH:
  		case ENETDOWN:
 
State-Changed-From-To: patched->closed 
State-Changed-By: glebius 
State-Changed-When: Mon Sep 10 11:55:59 UTC 2012 
State-Changed-Why:  
Merged to stable/9. 

>Unformatted:
