From nemesis!uhclem@fw.ast.com  Sun Feb 25 18:40:54 1996
Received: from fw.ast.com (fw.ast.com [165.164.6.25])
          by freefall.freebsd.org (8.7.3/8.7.3) with SMTP id SAA13538
          for <FreeBSD-gnats-submit@freebsd.org>; Sun, 25 Feb 1996 18:40:54 -0800 (PST)
Received: from nemesis by fw.ast.com with uucp
	(Smail3.1.29.1 #2) id m0tqsoT-00084iC; Sun, 25 Feb 96 20:37 CST
Received: by nemesis.lonestar.org (Smail3.1.27.1 #20)
	id m0tqslH-000CKVC; Sun, 25 Feb 96 20:34 WET
Message-Id: <m0tqslH-000CKVC@nemesis.lonestar.org>
Date: Sun, 25 Feb 96 20:34 WET
From: uhclem@nemesis.lonestar.org
Reply-To: uhclem
To: FreeBSD-gnats-submit@freebsd.org
Subject: Warning from sio driver reports wrong device	FDIV045
X-Send-Pr-Version: 3.2

>Number:         1042
>Category:       i386
>Synopsis:       Warning from sio driver reports wrong device	FDIV045
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    bde
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sun Feb 25 18:50:01 PST 1996
>Closed-Date:    Fri Jul 3 02:18:40 PDT 1998
>Last-Modified:  Fri Jul  3 02:19:45 PDT 1998
>Originator:     Frank Durda IV
>Release:        FreeBSD 2.1-STABLE i386
>Organization:
None
>Environment:

FreeBSD 2.1 system running a 486DX-33MHz 128K L2 Cache, 12Meg RAM,
four 16550A serial ports (one NS16552 (dual port 16550A), two Startech
16550A), 1540B SCSI controller, WD8013 Ethernet (inactive).
Ports sio0, sio2, sio3 connected to modems, sio1 not connected to anything
for this test.  Two of the ports connected to Telebit WorldBlazers
configured at fixed DTE 57600.  Also a Cardinal V.34 modem at DTE 57600.
All modems have hardware flow control enabled.

>Description:

When the DTE speed of the WorldBlazers is increased from 38400 to
57600, the above system experiences "tty-level buffer overflows".

As a symptom of the problem, UUCP sessions end up receiving corrupted files
(this should not happen but it does), and the kernel reports messages
like:

Feb 25 19:48:00 nemesis /kernel: sio1: 247 more tty-level buffer overflows (total 3100)

Note that the system reports the problem on sio1, when there is nothing
connected to that port.  The actual overrun probably occurred on sio0
or sio3.

Another interesting thing is that the Cardinal modem is V.34 and receives
compressed news at rates up to 3100CPS, but never appears to cause
these overruns.  The Telebits (Turbo PEP or PEP) only manage between
1600 and 2100 CPS and they do experience these overruns when the DTE
is set to 57600.  There are no overruns when the WorldBlazers are fixed
at 38400.

Hardware flow control is set on all devices and uucico is patched
to force RTSCTS flow control on incoming and outgoing UUCP sessions,
and this can be verified by stty -a < /dev/tty[Dd]3.
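
The same check can be sketched programmatically.  This is a minimal
illustration, assuming Python's standard termios module is available on
the system; the helper names are hypothetical, and it only reports the
settings in effect at the moment of the call.

```python
import os
import termios


def has_rtscts(cflag):
    """Return True if the CRTSCTS bit is set in a termios c_cflag value."""
    return bool(cflag & termios.CRTSCTS)


def port_has_rtscts(path):
    """Open a tty without waiting for carrier and test its c_cflag."""
    fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK | os.O_NOCTTY)
    try:
        # tcgetattr returns [iflag, oflag, cflag, lflag, ispeed, ospeed, cc]
        cflag = termios.tcgetattr(fd)[2]
        return has_rtscts(cflag)
    finally:
        os.close(fd)
```

For example, port_has_rtscts("/dev/ttyd3") would be expected to return
True for a port on which stty -a shows crtscts.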

Modifying the sio.c driver to trigger at 8 instead of 14 reduces
but does not eliminate the above error messages.  Only reducing the
DTE on the WorldBlazers back to 38400 eliminates the problem.
I have also swapped ports in case the NS16552 and Startech parts were
performing differently.   The problem follows the ports used by the
WorldBlazers.
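
As a rough check on what the trigger change buys, the sketch below
estimates how long the driver has to service a receive interrupt before
the 16-byte FIFO of a 16550A fills.  It assumes 10 bits per character
(8N1 framing) and ignores the receiver shift register, which adds about
one more character time; the function name is illustrative only.

```python
FIFO_DEPTH = 16        # receive FIFO depth of a 16550A
BITS_PER_CHAR = 10     # assumption: 8N1 framing (start + 8 data + stop)


def headroom_us(trigger, dte_bps):
    """Microseconds between the trigger interrupt and a full FIFO."""
    chars_left = FIFO_DEPTH - trigger
    return 1e6 * chars_left * BITS_PER_CHAR / dte_bps


if __name__ == "__main__":
    for trigger in (14, 8):
        for bps in (38400, 57600):
            print(f"trigger {trigger:2d} at {bps}: "
                  f"{headroom_us(trigger, bps):7.1f} us")
```

Under these assumptions the headroom at 57600 grows from about a third
of a millisecond at trigger 14 to well over a millisecond at trigger 8,
so overflows that persist at 8 are unlikely to be hardware FIFO overruns.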

So the problems appear to be:
1.	Faulty reporting of the guilty device in the kernel warning message.
	It seems to always blame sio1 regardless of what lines are active.

2.	There doesn't appear to be any documentation on what the kernel
	error message is trying to report.

	Reducing the FIFO interrupt trigger did not help, implying a
	different type of overrun in the kernel instead of a hardware FIFO
	overrun.  Because PEP tends to return data in bursts of 64 bytes,
	perhaps some software-based buffer is being overrun.

	Since there appears to be code in sio.c that would detect overruns
	in the hardware FIFO, report this  and lower the trigger value
	automatically, either this code isn't working or this isn't the
	type of overrun the kernel is trying to report.  Again, no
	documentation.
	
3.	When the kernel message is displayed, it usually is displayed three
	times in a row, all with the same timestamp.  It only appears once
	in /var/log/messages.

>How-To-Repeat:

Here, simply establish a protocol g or i UUCP session using Telebit
WorldBlazers and receive data from a remote system with the DTE fixed
at 57600.  If the connection is at 22000bps or faster, failure is likely.
Failures only appear during PEP/Turbo PEP sessions.

>Fix:

Workarounds:
By reducing the hardware interrupt trigger to 8 (from 14), the error
count was reduced, but not eliminated.   The only sure-fire workaround
is to lower the DTE speeds to 38400.

*END*

>Release-Note:
>Audit-Trail:

From: Bruce Evans <bde@zeta.org.au>
To: FreeBSD-gnats-submit@FreeBSD.ORG, uhclem@freefall.freebsd.org
Cc:
Subject: Re: i386/1042: Warning from sio driver reports wrong device	FDIV045
Date: Mon, 26 Feb 1996 18:33:51 +1100

 >...
 >Ports sio0, sio2, sio3 connected to modems, sio1 not connected to anything
                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 >...
 
 >Feb 25 19:48:00 nemesis /kernel: sio1: 247 more tty-level buffer overflows (total 3100)
 
 >Note that the system reports the problem on sio1, when there is nothing
 >connected to that port.  The actual overrun probably occurred on sio0
 >or sio3.
 
 This may be caused by sio1 picking up radiation from the other ports.
 It shouldn't occur if sio1 isn't open, however (then the UART may be
 kept busy by the radiation but the driver ignores it).  The radiation
 problem can usually be fixed by connecting the port to something (even
 something inactive).
 
 The verbose error reporting can take long enough to interfere with the
 reception of further data :-(.  Errors were once reported every clock
 tick (the rc driver still does this) and slow machines take more than
 one clock tick to report an error so the first error triggered an
 endless cascade of errors.
 
 >Another interesting thing is that the Cardinal modem is V.34 and receives
 >compressed news at rates up to 3100CPS, but never appears to cause
 >these overruns.  The Telebits (Turbo PEP or PEP) only manage between
 >1600 and 2100 CPS and they do experience these overruns when the DTE
 >is set to 57600.  There are no overruns when the WorldBlazers are fixed
 >at 38400.
 
 Do the Telebits honour flow control?
 
 >So the problems appear to be:
 >1.	Faulty reporting of the guilty device in the kernel warning message.
 >	It seems to always blame sio1 regardless of what lines are active.
 
 Probably not.
 
 >2.	There doesn't appear to be any documentation on what the kernel
 >	error message is trying to report.
 
 See the sio man page.
 
 >	Reducing the FIFO interrupt trigger did not help, implying a
 >	different type of overrun in the kernel instead of a hardware FIFO
 >	overrun.  Because PEP tends to return data in bursts of 64 bytes,
 >	perhaps some software-based buffer is being overrun.
 
 The raw queue has a size of only 1024 at all baud rates so it is quite
 easy to overrun at high baud rates.  At 115200 bps, 1024 bytes may arrive
 in less than one process scheduling quantum (100 msec) so there the buffer
 is too small if there are 2 hog processes.  Flow control had better work.
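
The arithmetic behind this can be sketched as follows.  The 1024-byte
queue and 100 msec quantum are the figures quoted above; the 10 bits per
character (8N1 framing), a line saturated at the DTE rate, and a reader
that is delayed by one quantum per hog process are simplifying
assumptions, and the function names are illustrative only.

```python
RAW_QUEUE_BYTES = 1024   # raw queue size quoted above
QUANTUM_MS = 100         # scheduling quantum quoted above
BITS_PER_CHAR = 10       # assumption: 8N1 framing


def fill_time_ms(dte_bps, queue_bytes=RAW_QUEUE_BYTES):
    """Milliseconds for a saturated line to fill the raw queue."""
    bytes_per_sec = dte_bps / BITS_PER_CHAR
    return 1000.0 * queue_bytes / bytes_per_sec


def overflows(dte_bps, hog_processes):
    """True if the queue fills before the reader is scheduled again,
    assuming the reader waits one quantum per hog process."""
    return fill_time_ms(dte_bps) < hog_processes * QUANTUM_MS


if __name__ == "__main__":
    for bps in (38400, 57600, 115200):
        print(f"{bps:6d} bps: queue fills in {fill_time_ms(bps):6.1f} ms, "
              f"overflow with 2 hogs: {overflows(bps, 2)}")
```

Under these assumptions the queue fills in under 100 msec at 115200, in
under 200 msec at 57600, and takes longer than 200 msec at 38400, which
would be consistent with overflows appearing at 57600 but not at 38400
when flow control is not holding the sender off.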
 
 >	Since there appears to be code in sio.c that would detect overruns
 >	in the hardware FIFO, report this  and lower the trigger value
 >	automatically, either this code isn't working or this isn't the
 >	type of overrun the kernel is trying to report.  Again, no
 >	documentation.
 
 That code has almost always been disabled and doesn't exist in -current.
 It tended to drop the trigger level to 1 for transient errors.
 
 >3.	When the kernel message is displayed, it usually is displayed three
 >	times in a row, all with the same timestamp.  It only appears once
 >	in /var/log/messages.
 
 Messages are normally repeated for each root login.
 
 Bruce
Responsible-Changed-From-To: freebsd-bugs->bde 
Responsible-Changed-By: scrappy 
Responsible-Changed-When: Wed Apr 10 11:33:54 PDT 1996 
Responsible-Changed-Why:  
another one that falls under Bruce's domain 
State-Changed-From-To: open->closed 
State-Changed-By: phk 
State-Changed-When: Fri Jul 3 02:18:40 PDT 1998 
State-Changed-Why:  
As part of our PR audition campaign, this PR has been closed.  The subject 
seems to be in the category of pilot error or misunderstanding or 
alternatively of insufficient significance to draw any developer attention. 

We apologize for the late response to this PR. 
>Unformatted:
