From jlemon@prism.flugsvamp.com Sun Oct 31 10:58:48 1999
Return-Path: <jlemon@prism.flugsvamp.com>
Received: from prism.flugsvamp.com (prism.flugsvamp.com [208.139.222.230])
	by hub.freebsd.org (Postfix) with ESMTP id 2B0DE14C0E
	for <FreeBSD-gnats-submit@freebsd.org>; Sun, 31 Oct 1999 10:58:46 -0800 (PST)
	(envelope-from jlemon@prism.flugsvamp.com)
Received: (from jlemon@localhost)
	by prism.flugsvamp.com (8.9.3/8.9.3) id MAA11750;
	Sun, 31 Oct 1999 12:58:33 -0600 (CST)
	(envelope-from jlemon)
Message-Id: <199910311858.MAA11750@prism.flugsvamp.com>
Date: Sun, 31 Oct 1999 12:58:33 -0600 (CST)
From: Jonathan Lemon <jlemon@flugsvamp.com>
Sender: jlemon@prism.flugsvamp.com
Reply-To: jlemon@flugsvamp.com
To: FreeBSD-gnats-submit@freebsd.org
Subject: packet loss on gigabit network
X-Send-Pr-Version: 3.2

>Number:         14623
>Category:       kern
>Synopsis:       Some Netgear FS509s have problems
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    freebsd-bugs
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sun Oct 31 11:00:01 PST 1999
>Closed-Date:    Mon Nov 1 10:26:59 PST 1999
>Last-Modified:  Mon Nov  1 10:34:01 PST 1999
>Originator:     Jonathan Lemon
>Release:        FreeBSD 4.0-CURRENT i386
>Organization:
>Environment:

PC#1 (client) is a 450MHz-PIII with an fxp0 card.       (Dell 410)
PC#2 (server) is a 600MHz-PIII with an Alteon card.     (Dell 4300)

Machines are connected via a Netgear FS509 switch, with the Alteon
card (ti0) plugged into the gigabit port.  PC#2 (server) also has
an fxp0 card which can be used in lieu of the ti0 card.

All connections to the switch are full-duplex.

Both machines are running 4.0-CURRENT as of 28-oct-99.

Alteon card/driver details:

$FreeBSD: src/sys/pci/if_ti.c,v 1.24 1999/09/23 03:32:54 wpaul
ti0@pci0:12:0:  class=0x020000 card=0x000112ae chip=0x000112ae rev=0x01 hdr=0x00


>Description:

  Bulk data transfer between client and server results in packets
disappearing.  Retransmissions of the lost packet are not received
by the other end either.  More specifically, tcpdumps taken on both
machines show the following pattern:
  (note: no packets were dropped by bpf during these traces)

   Client transmits on the order of 100 packets at full speed to the
server, which acks the packets correctly.

   Server does not receive a packet that the client transmits
(never shows up on tcpdump on the server side)

   Client retransmits missing packet using exponential backoff (1, 2,
4, 8, 16, etc).  tcpdump on the client shows the packet being sent.
tcpdump on the server shows no packets being received.

   No further progress is made on the connection.

This is symmetric; it doesn't matter whether the sender is the server
or the client; it happens both ways.

  1. this only happens after a large number of packets are sent.
     (roughly between 145K and 310K bytes)
  2. this does not happen when using two fxp0 cards.
  3. other TCP sessions between the machines (e.g.: telnet) are unaffected.

tcpdump traces, full dmesg output, or other information also available.
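
As a rough illustration of how the two traces can be compared, the sketch
below takes the TCP sequence numbers seen on the sending side and on the
receiving side and reports the segments that never arrived.  The function
name and the sample sequence numbers are hypothetical and are not taken
from the actual traces; extracting the numbers from tcpdump output is
left out.

```python
def missing_segments(sent_seqs, received_seqs):
    """Return sequence numbers seen on the sender but never on the receiver.

    Retransmissions of the same segment are reported only once, in the
    order in which they were first sent.
    """
    received = set(received_seqs)
    missing = []
    for seq in sent_seqs:
        if seq not in received and seq not in missing:
            missing.append(seq)
    return missing


# Hypothetical sample: the last segment is retransmitted but never arrives.
sent = [1, 1449, 2897, 4345, 4345, 4345]
received = [1, 1449, 2897]
print(missing_segments(sent, received))
```

A segment that shows up repeatedly in the sender's trace and never in the
receiver's matches the pattern described above.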

>How-To-Repeat:

	ftp a large file (/kernel) from one machine to another in the
	given environment.
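
The follow-up in the audit trail suggests the loss depends on the data
being sent.  One way to narrow down which part of a file triggers it is
to split the file into pieces and transfer each piece separately.  The
sketch below is only an illustration: the helper names, the 4096-byte
chunk size, and the output naming scheme are assumptions, not the method
actually used to produce the attached file.

```python
def split_into_chunks(data, chunk_size=4096):
    """Split a bytes object into consecutive pieces of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def write_chunks(path, chunk_size=4096):
    """Write each chunk of the file at `path` to `path.NNNN`; return the new names."""
    with open(path, "rb") as f:
        data = f.read()
    names = []
    for n, chunk in enumerate(split_into_chunks(data, chunk_size)):
        name = "%s.%04d" % (path, n)  # hypothetical naming scheme
        with open(name, "wb") as out:
            out.write(chunk)
        names.append(name)
    return names
```

Each resulting piece can then be fetched with ftp on its own; a piece that
stalls is a candidate for further splitting.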

>Fix:


>Release-Note:
>Audit-Trail:

From: "Kenneth D. Merry" <ken@kdm.org>
To: jlemon@flugsvamp.com (Jonathan Lemon)
Cc: bugs@FreeBSD.ORG, FreeBSD-gnats-submit@FreeBSD.ORG
Subject: Re: kern/14623: packet loss on gigabit network
Date: Mon, 1 Nov 1999 00:44:04 -0700 (MST)

 Jonathan Lemon wrote...
 > Upon further investigation, this appears to be a bit-pattern problem.
 > Attached is a file which refuses to transfer via a "ftp get".  (The
 > smallest part of a /kernel file that I can figure out which will
 > reproduce the problem).
 > 
 > Are there any known issues with either the Netgear switch or Alteon card?
 
 I haven't had any problems with the Alteon boards.  I would suspect the
 switch.  If you want to be sure that the card isn't at fault, get two, put
 them back to back, and ftp the file from one machine to another.
 
 I tried ftping your 'badfile' between two machines with 512K ACEnic boards
 (connected by fiber, no switch in between) and had no trouble:
 
 ftp> bin
 200 Type set to I.
 ftp> get badfile
 local: badfile remote: badfile
 200 PORT command successful.
 150 Opening BINARY mode data connection for 'badfile' (12288 bytes).
 100% |**************************************************| 12288       00:00 ETA
 226 Transfer complete.
 12288 bytes received in 0.00 seconds (12.07 MB/s)
 ftp> 
 
 I also haven't had any trouble with these boards and an Alteon ACEswitch
 180.
 
 So I would suspect the switch.  Taking it out of the loop, if possible,
 might confirm whether or not it is the problem.
 
 Ken
 -- 
 Kenneth Merry
 ken@kdm.org
 
State-Changed-From-To: open->closed 
State-Changed-By: jlemon 
State-Changed-When: Mon Nov 1 10:26:59 PST 1999 
State-Changed-Why:  
Netgear tech support confirmed that some FS509 units have known
problems; one thing mentioned was a batch of faulty SGRAM.  The
unit in question is being RMA'd to Netgear; I'll reopen this PR
if the new unit still has problems.
>Unformatted:
