From dillon@backplane.com  Tue Dec 30 17:45:41 1997
Received: from apollo.backplane.com (apollo.backplane.com [207.33.240.2])
          by hub.freebsd.org (8.8.7/8.8.7) with ESMTP id RAA17539
          for <FreeBSD-gnats-submit@freebsd.org>; Tue, 30 Dec 1997 17:45:39 -0800 (PST)
          (envelope-from dillon@backplane.com)
Received: (dillon@localhost) by apollo.backplane.com (8.8.8/8.6.5) id RAA00389; Tue, 30 Dec 1997 17:45:36 -0800 (PST)
Message-Id: <199712310145.RAA00389@apollo.backplane.com>
Date: Tue, 30 Dec 1997 17:45:36 -0800 (PST)
From: Matthew Dillon <dillon@backplane.com>
Reply-To: dillon@backplane.com
To: FreeBSD-gnats-submit@freebsd.org
Subject: silo overflows running camediaplay at 115200 even w/ rx fifo set medl
X-Send-Pr-Version: 3.2

>Number:         5398
>Category:       i386
>Synopsis:       silo overflows running
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    dillon
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue Dec 30 17:50:01 PST 1997
>Closed-Date:    Tue Mar 13 23:26:32 PST 2001
>Last-Modified:  Tue Mar 13 23:29:47 PST 2001
>Originator:     Matthew Dillon
>Release:        FreeBSD 3.0-CURRENT i386
>Organization:
>Environment:

	FreeBSD-current on PPro200 running XFree86 w/ matrox card 
	running 1280x1024x24, 16550A (verified to have working FIFOs)

>Description:

	camediaplay is a program which downloads pictures from a digital
	camera over a 3-wire (no flow control) serial interface.  After
	increasing the tty buffer size to 4096 (TTYHOG=4096), which got
	rid of the high-level buffer overruns, I still had a problem with
	silo overflows.

	I hacked the kernel to reduce the RX FIFO interrupt trigger point
	from HIGH to MEDL and verified that the interrupt rate increased
	accordingly, from 760 ints/sec (rx fifo triggers at 14) to around
	2700 ints/sec (rx fifo triggers at 4) @ 115200.  Despite the
	additional margin in the RX fifo, silo overflows still occurred on
	a non-idle machine, for example when I move a netscape window
	around the X display while a camera download is in progress.
	There are no page faults, plenty of free memory, and no disk
	activity.  I am at a loss as to how the silo errors can occur
	under those circumstances.

	Something in the kernel is disabling interrupts for greater than
	12 x 86us = 1ms (the 12 free FIFO slots left after the trigger at
	4, at about 86us per character), causing the RX FIFO in the 16550A
	to fill up and generate overrun errors.  1ms is a very, very long
	period of time.
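
The latency budget above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only and is not code from the kernel: it assumes 8N1 framing (10 bits per character) and a 16-byte receive FIFO, and it ignores the receiver shift register, which in practice adds roughly one more character time of slack. The function names are made up for this example.

```c
#include <assert.h>

/* Assumed framing: 8N1 = 1 start + 8 data + 1 stop = 10 bits per character. */
#define BITS_PER_CHAR 10
/* Assumed receive FIFO depth of the 16550A, in bytes. */
#define FIFO_DEPTH    16

/* Time one character occupies on the wire, in microseconds. */
double char_time_us(long baud)
{
    assert(baud > 0);
    return 1e6 * BITS_PER_CHAR / (double)baud;
}

/* Interrupt latency (us) the FIFO can absorb after the trigger level is hit. */
double headroom_us(long baud, int trigger)
{
    assert(trigger > 0 && trigger <= FIFO_DEPTH);
    return (FIFO_DEPTH - trigger) * char_time_us(baud);
}

/* Upper bound on receive interrupts per second for a saturated line. */
double max_ints_per_sec(long baud, int trigger)
{
    assert(baud > 0 && trigger > 0);
    return (double)baud / BITS_PER_CHAR / trigger;
}
```

Under these assumptions, a trigger level of 4 at 115200 leaves about 1.04ms of headroom and a ceiling of 2880 ints/sec, while a trigger level of 14 leaves about 174us and a ceiling of roughly 823 ints/sec. The measured rates of 2700 and 760 sit a little below those ceilings, which is what one would expect if the line is not completely saturated.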
	
						-Matt

>How-To-Repeat:

	Generate a serial stream at 115200 with no flow control.
	Put the serial port into RAW mode 8N1 @ 115200 and run a
	program to drain it.

	Do something on the machine, such as move a large netscape window
	around the X terminal (move it OPAQUE rather than as an outline,
	and move it behind other windows and such).  A silo error should
	occur relatively quickly.
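
A drain program along these lines might look like the following. This is a minimal sketch, not camediaplay itself: the function names are hypothetical, the device path is whatever serial port the stream arrives on, and hardware flow control flags (which are not in POSIX) are left untouched on the assumption that the 3-wire cable makes them irrelevant.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <termios.h>
#include <unistd.h>

/* Rewrite a termios description for raw 8N1 at the given speed constant. */
int set_raw_8n1(struct termios *t, speed_t speed)
{
    assert(t != NULL);
    /* No input translation and no software flow control. */
    t->c_iflag &= ~(IGNBRK | BRKINT | PARMRK | ISTRIP | INLCR | IGNCR |
                    ICRNL | IXON | IXOFF);
    t->c_oflag &= ~OPOST;                       /* no output processing */
    t->c_lflag &= ~(ECHO | ECHONL | ICANON | ISIG | IEXTEN);
    t->c_cflag &= ~(CSIZE | PARENB | CSTOPB);   /* no parity, 1 stop bit */
    t->c_cflag |= CS8 | CREAD | CLOCAL;         /* 8 data bits, ignore modem lines */
    t->c_cc[VMIN] = 1;                          /* block until at least 1 byte */
    t->c_cc[VTIME] = 0;
    if (cfsetispeed(t, speed) != 0 || cfsetospeed(t, speed) != 0)
        return -1;
    return 0;
}

/* Open a serial device (path supplied by the caller) in raw 8N1 @ 115200. */
int open_raw_port(const char *path)
{
    struct termios t;
    int fd = open(path, O_RDONLY | O_NOCTTY);

    if (fd < 0)
        return -1;
    if (tcgetattr(fd, &t) != 0 || set_raw_8n1(&t, B115200) != 0 ||
        tcsetattr(fd, TCSANOW, &t) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Read and discard everything until EOF; returns bytes read, or -1 on error. */
long drain(int fd)
{
    char buf[4096];
    long total = 0;
    ssize_t n;

    while ((n = read(fd, buf, sizeof buf)) > 0)
        total += n;
    return n < 0 ? -1 : total;
}
```

Calling drain(open_raw_port(...)) on the receiving port while the sender runs flat out, and then watching the console for silo overflow messages, should be enough to reproduce the setup described here.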

>Fix:
	
	Unknown.  The problem can be mitigated somewhat by setting the
	TTYHOG config variable (TTYHOG=4096), and by changing

	    com->fifo_image = t->c_ospeed <= 4800 ? FIFO_ENABLE : FIFO_ENABLE | FIFO_RX_HIGH;

	to

	    com->fifo_image = t->c_ospeed <= 4800 ? FIFO_ENABLE : FIFO_ENABLE | FIFO_RX_MEDL;

	But silo overrun errors still occur (albeit at a lower frequency).
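
For reference, the trigger levels selected by those constants can be decoded as sketched below. The FIFO_* names follow the snippet above and the bit values follow the 16550 FIFO control register layout (bits 7:6 select the receive trigger level); the two helper functions are hypothetical and exist only for illustration.

```c
#include <assert.h>

/* 16550 FIFO control register bits; bits 7:6 select the RX trigger level. */
#define FIFO_ENABLE   0x01
#define FIFO_RX_LOW   0x00   /* interrupt after 1 byte   */
#define FIFO_RX_MEDL  0x40   /* interrupt after 4 bytes  */
#define FIFO_RX_MEDH  0x80   /* interrupt after 8 bytes  */
#define FIFO_RX_HIGH  0xc0   /* interrupt after 14 bytes */

/* Number of received bytes that raises the RX interrupt for a given FCR value. */
int fifo_trigger_bytes(unsigned char fcr)
{
    static const int level[4] = { 1, 4, 8, 14 };

    return level[(fcr >> 6) & 3];
}

/* Hypothetical helper mirroring the modified assignment shown above. */
unsigned char fifo_image_for_speed(long ospeed)
{
    assert(ospeed >= 0);
    return ospeed <= 4800 ? FIFO_ENABLE : (FIFO_ENABLE | FIFO_RX_MEDL);
}
```

With the change applied, any speed above 4800 interrupts after 4 bytes instead of 14, trading a higher interrupt rate for 12 rather than 2 bytes of slack in the FIFO.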


>Release-Note:
>Audit-Trail:
State-Changed-From-To: open->closed 
State-Changed-By: phk 
State-Changed-When: Thu Apr 30 13:29:59 PDT 1998 
State-Changed-Why:  
I think your machine simply is out of steam.  Blitting 3 mbyte around is 
going to take quite a bit of your cpu and bus bandwidth. 

From: John-Mark Gurney <gurney_j@efn.org>
To: Poul-Henning Kamp <phk@FreeBSD.ORG>
Cc: dillon@backplane.com, freebsd-gnats-submit@FreeBSD.ORG
Subject: Re: i386/5398
Date: Fri, 1 May 1998 18:32:33 -0700

 Poul-Henning Kamp scribbled this message on Apr 30:
 > Synopsis: silo overflows running
 > 
 > State-Changed-From-To: open->closed
 > State-Changed-By: phk
 > State-Changed-When: Thu Apr 30 13:29:59 PDT 1998
 > State-Changed-Why: 
 > I think your machine simply is out of steam.  Blitting 3 mbyte around is
 > going to take quite a bit of your cpu and bus bandwidth.
 
 actually, this is probably due to a device that isn't within the spec'd
 rs-232 speeds...  I have the same problem with the ricochet on my
 notebook... I can't run faster than 19.2kbps because the ricochet runs
 about 5% over the 19.2kbps clock (I get about 2050int/sec at 19.2kbps)..
 
 try running the speeds lower...
 
 -- 
   John-Mark Gurney                      Modem Rev/FAX: +1 541 346 9237
   Cu Networking					  P.O. Box 5693, 97405
 
   Live in Peace, destroy Micro$oft, support free software, run FreeBSD
 	    Don't trust anyone you don't have the source for

From: Matthew Dillon <dillon@backplane.com>
To: John-Mark Gurney <gurney_j@efn.org>
Cc: Poul-Henning Kamp <phk@FreeBSD.ORG>, freebsd-gnats-submit@FreeBSD.ORG
Subject: Re: i386/5398
Date: Sat, 2 May 1998 08:58:15 -0700 (PDT)

 :> State-Changed-Why: 
 :> I think your machine simply is out of steam.  Blitting 3 mbyte around is
 :> going to take quite a bit of your cpu and bus bandwidth.
 :
 :actually, this is probably due to a device that isn't within the spec'd
 :rs-232 speeds...  I have the same problem with the ricochet on my
 :notebook... I can't run faster than 19.2kbps because the ricochet runs
 :about 5% over the 19.2kbps clock (I get about 2050int/sec at 19.2kbps)..
 :
 :try running the speeds lower...
 
     Ahh.... nonsense.  Unless the serial port is out of sync more than 1/2 
     bit time every 10 bits, the 16x oversampling done by the 16xxx has no
     problem dealing with this issue.  Also, since the problem is demonstrably
     related to the hardware FIFO's watermarks, it obviously has nothing to
     do with synchronization/frequency/clock problems and just as obviously
     has everything to do with interrupt latencies.
 
     cpu out of suds?  This is a pentium pro 200.  More likely, there is a 
     serious interrupt disablement latency somewhere in the kernel.
 
 					-Matt
 
 :-- 
 :  John-Mark Gurney                      Modem Rev/FAX: +1 541 346 9237
 :  Cu Networking					  P.O. Box 5693, 97405
 :
 :  Live in Peace, destroy Micro$oft, support free software, run FreeBSD
 :	    Don't trust anyone you don't have the source for
 :
 
     Matthew Dillon   Engineering, BEST Internet Communications, Inc.
 		     <dillon@backplane.com>
     [always include a portion of the original email in any response!]

From: Poul-Henning Kamp <phk@critter.freebsd.dk>
To: Matthew Dillon <dillon@backplane.com>
Cc: John-Mark Gurney <gurney_j@efn.org>, freebsd-gnats-submit@FreeBSD.ORG
Subject: Re: i386/5398 
Date: Sat, 02 May 1998 18:23:42 +0200

 In message <199805021558.IAA00276@apollo.backplane.com>, Matthew Dillon writes:
 >:> State-Changed-Why: 
 >:> I think your machine simply is out of steam.  Blitting 3 mbyte around is
 >:> going to take quite a bit of your cpu and bus bandwidth.
 >:
 >
 >    cpu out of suds?  This is a pentium pro 200.  More likely, there is a 
 >    serious interrupt disablement latency somewhere in the kernel.
 
 What graphics card is this ?  Is the blt done in HW when you move the
 window or by the CPU over the PCI bus ?
 
 --
 Poul-Henning Kamp             FreeBSD coreteam member
 phk@FreeBSD.ORG               "Real hackers run -current on their laptop."
 "ttyv0" -- What UNIX calls a $20K state-of-the-art, 3D, hi-res color terminal

From: Matthew Dillon <dillon@backplane.com>
To: Poul-Henning Kamp <phk@critter.freebsd.dk>
Cc: John-Mark Gurney <gurney_j@efn.org>, freebsd-gnats-submit@FreeBSD.ORG
Subject: Re: i386/5398 
Date: Sat, 2 May 1998 09:40:35 -0700 (PDT)

 :>    cpu out of suds?  This is a pentium pro 200.  More likely, there is a 
 :>    serious interrupt disablement latency somewhere in the kernel.
 :
 :What graphics card is this ?  Is the blt done in HW when you move the
 :window or by the CPU over the PCI bus ?
 
     At the time the bug report was submitted, it was a matrox mystique
     running with the latest XFree86 (so probably some hardware blit).
 
     But it doesn't matter if the blit is done in software or not, frankly,
     because interrupts had better be enabled at the time the X server does
     a software blit (since it's a user mode process), and a hardware
     blit in the matrox shouldn't stall a pci write to the board
     for more than a few microseconds anyway before the pentium can unstall
     and take the serial interrupt.
 
 						-Matt
 
 :--
 :Poul-Henning Kamp             FreeBSD coreteam member
 :phk@FreeBSD.ORG               "Real hackers run -current on their laptop."
 :"ttyv0" -- What UNIX calls a $20K state-of-the-art, 3D, hi-res color terminal
 :
 
     Matthew Dillon   Engineering, BEST Internet Communications, Inc.
 		     <dillon@backplane.com>
     [always include a portion of the original email in any response!]

From: Poul-Henning Kamp <phk@critter.freebsd.dk>
To: Matthew Dillon <dillon@backplane.com>
Cc: John-Mark Gurney <gurney_j@efn.org>, freebsd-gnats-submit@FreeBSD.ORG
Subject: Re: i386/5398 
Date: Sat, 02 May 1998 23:27:25 +0200

 In message <199805021640.JAA00443@apollo.backplane.com>, Matthew Dillon writes:
 >:>    cpu out of suds?  This is a pentium pro 200.  More likely, there is a 
 >:>    serious interrupt disablement latency somewhere in the kernel.
 >:
 >:What graphics card is this ?  Is the blt done in HW when you move the
 >:window or by the CPU over the PCI bus ?
 >
 >    At the time the bug report was submitted, it was a matrox mystique
 >    running with the latest XFree86 (so probably some hardware blit).
 >
 >    But it doesn't matter if the blit is done in software or not, frankly,
 >    because interrupts had better be enabled at the time the X server does
 >    a software blit (since it's a user mode process), and a hardware
 >    blit in the matrox shouldn't stall a pci write to the board
 >    for more than a few microseconds anyway before the pentium can unstall
 >    and take the serial interrupt.
 
 Any description making such blatant use of the word "should" about
 PC hardware in a FreeBSD PR is reason enough to summarily close
 it.  Both Bruce and I seem to lean to bus and/or cpu contention as
 the explanation of your problem.
 
 You haven't even bothered to tell us which interrupt the serial port
 was on, which chipset you have, what graphics card, whether it had irq9
 enabled, which disk controller, whether you were swapping (well, you did
 mention Netscape, so I guess you were), but you get the drift...
 
 Until you have evidence of something else being the cause, the PR 
 remains closed.
 
 Please take a detailed look at the Intel SMP document, which spends
 a good deal of text on how interrupts work in PCs, then look at the
 databook for a modern chipset, and see how long and unpredictable
 the interrupt chain is.  Then say "should", "should", "should" again.
 
 I can tell you from personal research that busmastering on PCI is
 an extremely evil thing to do to your interrupt latency. (See
 this page for an example: http://phk.freebsd.dk/rover.html)
 
 PC hardware sucks, but it is cheap... :-/
 
 --
 Poul-Henning Kamp             FreeBSD coreteam member
 phk@FreeBSD.ORG               "Real hackers run -current on their laptop."
 "ttyv0" -- What UNIX calls a $20K state-of-the-art, 3D, hi-res color terminal

From: Matthew Dillon <dillon@backplane.com>
To: Poul-Henning Kamp <phk@critter.freebsd.dk>
Cc: John-Mark Gurney <gurney_j@efn.org>, freebsd-gnats-submit@FreeBSD.ORG
Subject: Re: i386/5398 
Date: Sat, 2 May 1998 22:34:59 -0700 (PDT)

     Look guys, I'm not going to argue... I have worse things to worry about
     than a bug report relating to a digital camera interface.  But I am
     extremely disappointed that you are taking such a cavalier attitude
     towards it and making assumptions that, in my view, make little or no 
     sense just to close out the report.  I'm not an idiot... I've been 
     writing operating systems and doing digital hardware design for over 
     15 years, so don't fraggin quote databooks at me and expect to get
     away with it.
 
     There's only one way to solve this problem and, unfortunately, I don't
     have time to do it... and that is to add debugging code to the kernel
     to do a histogram of the PC of the pushed context at the beginning of the
     core interrupt code to see which disable/enable pair is causing the
     greatest number of latency problems.  But, as I said, I don't have the
     time to do it myself.
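
The bookkeeping for such a histogram could be as small as the sketch below. It is a user-space illustration under stated assumptions, not kernel code: the structure, bucket size, and function names are all hypothetical, and the hook that would feed it the interrupted PC from the interrupt entry path is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PC_BUCKETS 1024   /* hypothetical number of buckets */
#define PC_SHIFT   4      /* hypothetical bucket size: 16 bytes of text */

struct pc_hist {
    uintptr_t base;                 /* lowest address covered */
    uint32_t  count[PC_BUCKETS];    /* hits per bucket */
    uint32_t  out_of_range;         /* samples outside the covered range */
};

/* Record one interrupted program counter value. */
void pc_hist_record(struct pc_hist *h, uintptr_t pc)
{
    assert(h != NULL);
    if (pc < h->base || ((pc - h->base) >> PC_SHIFT) >= PC_BUCKETS) {
        h->out_of_range++;
        return;
    }
    h->count[(pc - h->base) >> PC_SHIFT]++;
}

/* Index of the bucket with the most hits (lowest index wins ties). */
size_t pc_hist_hottest(const struct pc_hist *h)
{
    size_t i, best = 0;

    assert(h != NULL);
    for (i = 1; i < PC_BUCKETS; i++)
        if (h->count[i] > h->count[best])
            best = i;
    return best;
}
```

The idea is that a pending interrupt is taken as soon as interrupts are re-enabled, so buckets that collect a disproportionate share of samples would point at the code just after a long disabled region.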
 
     At the very least, don't close bug reports that I spent hours
     putting together on an 'assumption'.  It's annoying and insulting. 
     It is also annoying and insulting to start quoting missing hardware
     configurations at me... you know damn well that if every bug report 
     had the hardware configuration detail down to the last transistor they
     would be mostly useless bug reports because the FreeBSD crew would be
     spending more time trying to sift through the garbage to locate the
     meat.  This bug report was pretty straightforward.  The PCI bus may not
     be the quickest thing in the world, but 8 microseconds times 8 character
     slots is still a hellofalong time to start blaming it for interrupt 
     latencies that cause a serial FIFO to overflow.
 
 						-Matt
 
 :Poul-Henning Kamp             FreeBSD coreteam member
 :phk@FreeBSD.ORG               "Real hackers run -current on their laptop."
 :"ttyv0" -- What UNIX calls a $20K state-of-the-art, 3D, hi-res color terminal
 :
 
     Matthew Dillon   Engineering, BEST Internet Communications, Inc.
 		     <dillon@backplane.com>
     [always include a portion of the original email in any response!]
State-Changed-From-To: closed->open 
State-Changed-By: jkh 
State-Changed-When: Mon May 4 02:06:58 PDT 1998 
State-Changed-Why:  
This is still a problem which needs to be addressed (I see it also 
under the same circumstances, _with or without X running_) and thus 
this PR needs to stay OPEN. 
State-Changed-From-To: open->feedback 
State-Changed-By: phk 
State-Changed-When: Mon May 4 05:16:13 PDT 1998 
State-Changed-Why:  
We need details: 
boot -v dmesg 
kernel config 
protocol details for camera, packet sizes, flowcontrol &c &c 
Responsible-Changed-From-To: freebsd-bugs->dillon 
Responsible-Changed-By: asmodai 
Responsible-Changed-When: Sat Feb 5 05:37:00 PST 2000 
Responsible-Changed-Why:  
Let dillon take care of his own PR's. 
State-Changed-From-To: feedback->closed 
State-Changed-By: dillon 
State-Changed-When: Tue Mar 13 23:26:32 PST 2001 
State-Changed-Why:  
Closed because I don't care any more :-) 

http://www.freebsd.org/cgi/query-pr.cgi?pr=5398 
>Unformatted:
