From nobody@FreeBSD.ORG Thu Jun 24 12:39:22 1999
Return-Path: <nobody@FreeBSD.ORG>
Received: by hub.freebsd.org (Postfix, from userid 32767)
	id 9C9F2153CB; Thu, 24 Jun 1999 12:39:22 -0700 (PDT)
Message-Id: <19990624193922.9C9F2153CB@hub.freebsd.org>
Date: Thu, 24 Jun 1999 12:39:22 -0700 (PDT)
From: schuerge@cs.uni-sb.de
Sender: nobody@FreeBSD.ORG
To: freebsd-gnats-submit@freebsd.org
Subject: Bad scheduling in FreeBSD
X-Send-Pr-Version: www-1.0

>Number:         12381
>Category:       kern
>Synopsis:       Bad scheduling in FreeBSD
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    bde
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Jun 24 12:40:00 PDT 1999
>Closed-Date:    Thu Nov 15 07:44:07 PST 2001
>Last-Modified:  Thu Nov 15 07:45:23 PST 2001
>Originator:     Thomas Schürger
>Release:        FreeBSD 4.0-CURRENT
>Organization:
Universität Saarbrücken, Germany
>Environment:
FreeBSD starfire.heim-d.uni-sb.de 4.0-CURRENT FreeBSD 4.0-CURRENT #0: Tue Jun 15 13:36:03 CEST 1999     schuerge@starfire.heim-d.uni-sb.de:/usr/src/sys/compile/STARFIRE  i386

>Description:
FreeBSD does not seem to schedule processes properly. When starting
two CPU-intensive processes, one running at nice level 0 and the other
at nice level 20, the first process gets 66% and the second 33% of
the available CPU time, so a 2:1 time-slicing is done. This can
easily be verified by starting two Perl scripts containing just an
endless loop and renicing one of them to +20. "top" will display the
behaviour described above. This cannot be a bug in "top", as the
processes' accumulated CPU times increase in the same proportions.

Other Unix variants do much better time-slicing. I have run the same
test on a fast Solaris machine, and Solaris does roughly 10:1 slicing,
that is 9% for the background and 91% for the foreground process,
which is more desirable. It seems that renicing a process does not
have much effect in FreeBSD.

>How-To-Repeat:
The problem can be verified by starting two CPU-intensive processes,
renicing one to +20 and watching the "top" output.

>Fix:


>Release-Note:
>Audit-Trail:

From: Mike Pritchard <mpp@mpp.pro-ns.net>
To: schuerge@cs.uni-sb.de
Cc: freebsd-gnats-submit@FreeBSD.ORG
Subject: Re: kern/12381: Bad scheduling in FreeBSD
Date: Thu, 24 Jun 1999 15:30:14 -0500 (CDT)

 > 
 > >Number:         12381
 > >Category:       kern
 > >Synopsis:       Bad scheduling in FreeBSD
 > >Confidential:   no
 > >Severity:       serious
 > >Priority:       medium
 > >Responsible:    freebsd-bugs
 > >State:          open
 > >Quarter:        
 > >Keywords:       
 > >Date-Required:
 > >Class:          sw-bug
 > >Submitter-Id:   current-users
 > >Arrival-Date:   Thu Jun 24 12:40:00 PDT 1999
 > >Closed-Date:
 > >Last-Modified:
 > >Originator:     Thomas Schürger
 > >Release:        FreeBSD 4.0-CURRENT
 > >Organization:
 > Universität Saarbrücken, Germany
 > >Environment:
 > FreeBSD starfire.heim-d.uni-sb.de 4.0-CURRENT FreeBSD 4.0-CURRENT #0: Tue Jun 15 13:36:03 CEST 1999     schuerge@starfire.heim-d.uni-sb.de:/usr/src/sys/compile/STARFIRE  i386
 > 
 > >Description:
 >...
 > is more desirable. It seems that renicing a process does not have a lot
 > of effect in FreeBSD.
 > 
 > >How-To-Repeat:
 > The problem can be verified by starting two CPU-intensive processes,
 > renicing one to +20 and watching the "top" output.
 
 Under -current:
 I just played around with this a little, and if I have just one nice-0
 CPU hog and one or two nice-20 hogs, the nice-0 job gets about half of
 the CPU, and the nice-20 jobs get around 40-45% of the CPU.
 As I keep adding nice-0 CPU hogs, the nice-20 jobs get about half of
 the CPU they got before.  If I have three nice-0 jobs running, the
 nice-20 jobs get around 0% of the CPU, which is what I would expect.
 
 The man page for renice states that nice 20 jobs should only get 
 the cpu when nothing else wants it, which is not happening on lightly
 loaded systems.  Whether this is intentional or not, I don't know.
 One of our scheduling gurus will have to answer that.
 -- 
 Mike Pritchard
 mpp@FreeBSD.ORG or mpp@mpp.pro-ns.net
 
State-Changed-From-To: open->feedback 
State-Changed-By: sheldonh 
State-Changed-When: Fri Jun 25 04:27:12 PDT 1999 
State-Changed-Why:  
Be careful when defining a compute-bound process. You need to keep in 
mind that if a process sleeps or blocks during its time slice, you must 
expect that someone else will get the cpu -- at some point the process 
with the high nice value _is_ going to get a time slice. 

You should also keep in mind that FreeBSD (BSD UNIX in general) isn't  
optimized for managing two processes. Very few real-world scenarios 
require such optimization.  It's optimized for the management of many   
processes.  

From: Sheldon Hearn <sheldonh@uunet.co.za>
To: Thomas Schuerger <schuerge@wjpserver.CS.Uni-SB.DE>
Cc: freebsd-gnats-submit@freebsd.org
Subject: Re: kern/12381: Bad scheduling in FreeBSD 
Date: Fri, 25 Jun 1999 15:31:25 +0200

 On Fri, 25 Jun 1999 14:47:54 +0200, Thomas Schuerger wrote:
 
 > Not quite. The second PR states that FreeBSD uses a strange 2:1
 > time-slicing when having two CPU-intensive processes, one in the
 > foreground and one in the background. In the first PR, I stated that
 > any I/O is also very much affected by long-runners in the background,
 > so that even processes not requiring much CPU time but depending on
 > I/O (network, disks) are affected.
 
 So surely the "problem" in the first PR would "go away" if the "problem" in
 the second were resolved?
 
 By the way, when you did your comparison with Solaris, were you watching
 CPU time spent in system?
 
 Ciao,
 Sheldon.
 
State-Changed-From-To: feedback->open 
State-Changed-By: sheldonh 
State-Changed-When: Fri Jun 25 08:14:17 PDT 1999 
State-Changed-Why:  
Feedback acquired. 

From: "Jose M. Alcaide" <jose@we.lc.ehu.es>
To: Thomas Schuerger <schuerge@wjpserver.CS.Uni-SB.DE>
Cc: sheldonh@FreeBSD.ORG, schuerge@cs.uni-sb.de,
	freebsd-bugs@FreeBSD.ORG
Subject: Re: kern/12381: Bad scheduling in FreeBSD
Date: Fri, 25 Jun 1999 16:43:22 +0200

 Thomas Schuerger wrote:
 >
 > > State-Changed-From-To: open->feedback
 > > State-Changed-By: sheldonh
 > > State-Changed-When: Fri Jun 25 04:27:12 PDT 1999
 > > State-Changed-Why:
 > > Be careful when defining a compute-bound process. You need to keep in
 > > mind that if a process sleeps or blocks during its time slice, you must
 > > expect that someone else will get the cpu -- at some point the process
 > > with the high nice value _is_ going to get a time slice.
 > >
 > > You should also keep in mind that FreeBSD (BSD UNIX in general) isn't
 > > optimized for managing two processes. Very few real-world scenarios
 > > require such optimization.  It's optimized for the management of many
 > > processes.
 >
 > What I was saying, in general, is that FreeBSD's performance drops
 > drastically if a CPU-intensive process is running in the background.
 > I stated that it mostly affects FreeBSD's I/O performance, which is
 > a problem that other Unix variants don't have (at least not as
 > noticeably as FreeBSD). It would require taking a closer look
 > at how the scheduling is done, and maybe a complete rework of that
 > part of the OS.
 >
 > [snip]
 
 Here we are using several FreeBSD systems for running CPU-intensive
 processes (now including some "setiathome's" ;-) ). All these processes
 run with nice 20, and their impact on general system performance is
 very low. In other words, we are not experiencing that performance
 degradation. Of course, a process which is CPU-bound and also a memory
 hog has a noticeable impact on performance (due to paging and swapping).
 
 However, what I see is that the nice number has little influence on
 the priority of CPU-bound processes. I think that is due to the way
 4.4BSD computes the instantaneous scheduling priority: recent
 CPU usage causes a quick degradation of priority. Thus, two CPU-intensive
 processes, one running at nice 5 and another at nice 20, will
 have the same scheduling priority a few seconds after they start.
 This does not happen with other UNIXes; for example, two identical
 processes running at nice 9 and 19 on Solaris get 65% and 30%
 of the CPU respectively. Using FreeBSD, both processes get 50% of
 the CPU.
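Jose's point about the decay arithmetic can be illustrated with a toy model (the decay and priority formulas follow the published 4.4BSD design; the constants and the tick-level round-robin are simplifications, not the actual kernel code):

```python
# Toy model of 4.4BSD scheduling-priority arithmetic: recent CPU usage
# (estcpu) degrades priority quickly, so nice has only a modest effect.
PUSER = 50        # base user priority
HZ = 100          # clock ticks per second
LOADAV = 2.0      # load average: two runnable hogs

class Proc:
    def __init__(self, nice):
        self.nice = nice
        self.estcpu = 0.0   # recent-CPU estimate (like p_estcpu)
        self.ticks = 0      # ticks this process actually ran

    def priority(self):
        # resetpriority(): lower number means it runs sooner
        return PUSER + self.estcpu / 4.0 + 2.0 * self.nice

hogs = [Proc(5), Proc(20)]          # the nice-5 vs nice-20 pair
for second in range(60):
    for _ in range(HZ):
        runner = min(hogs, key=Proc.priority)
        runner.estcpu += 1          # charge one tick of CPU
        runner.ticks += 1
    # schedcpu(): once a second, decay estcpu and mix in nice
    decay = (2.0 * LOADAV) / (2.0 * LOADAV + 1.0)
    for p in hogs:
        p.estcpu = decay * p.estcpu + p.nice

total = sum(p.ticks for p in hogs)
for p in hogs:
    # the split is far from Solaris's reported 10:1
    print(f"nice {p.nice:2d}: {100.0 * p.ticks / total:.0f}% of ticks")
```

At the steady state the two estcpu values differ by just enough to offset the nice terms, so the nice-20 hog still gets roughly a third of the CPU instead of dropping into the background.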
 
 -- JMA
 -----------------------------------------------------------------------
 José Mª Alcaide                         | mailto:jose@we.lc.ehu.es
 Universidad del País Vasco              | mailto:jmas@es.FreeBSD.ORG
 Dpto. de Electricidad y Electrónica     | http://www.we.lc.ehu.es/~jose
 Facultad de Ciencias - Campus de Lejona | Tel.:  +34-946012479
 48940 Lejona (Vizcaya) - SPAIN          | Fax:   +34-946013071
 -----------------------------------------------------------------------
                "Go ahead... make my day." - H. Callahan
 
 

From: Thomas Schuerger <schuerge@wjpserver.CS.Uni-SB.DE>
To: Sheldon Hearn <sheldonh@uunet.co.za>
Cc: freebsd-gnats-submit@freebsd.org
Subject: Re: kern/12381: Bad scheduling in FreeBSD
Date: Fri, 25 Jun 1999 17:26:14 +0200 (MET DST)

 > > Not quite. The second PR states that FreeBSD uses a strange 2:1
 > > time-slicing when having two CPU-intensive processes, one in the
 > > foreground and one in the background. In the first PR, I stated that
 > > any I/O is also very much affected by long-runners in the background,
 > > so that even processes not requiring much CPU time but depending on
 > > I/O (network, disks) are affected.
 > 
 > So surely the "problem" in the first PR would "go away" if the "problem" in
 > the second were resolved?
 
 Well, it may or may not be the case. I don't know.
 
 > By the way, when you did your comparison with Solaris, were you watching
 > CPU time spent in system?
 
 Yes, the accumulated CPU time advances in the same way as the CPU
 percentage suggests.
 
 
 Ciao,
 Thomas.
 
 
 

From: "Jose M. Alcaide" <jose@we.lc.ehu.es>
To: Thomas Schuerger <schuerge@wjpserver.CS.Uni-SB.DE>
Cc: sheldonh@FreeBSD.ORG, schuerge@cs.uni-sb.de,
	freebsd-bugs@FreeBSD.ORG
Subject: Re: kern/12381: Bad scheduling in FreeBSD
Date: Fri, 25 Jun 1999 18:06:17 +0200

 Thomas Schuerger wrote:
 >
 > > Here we are using several FreeBSD systems for running CPU-intensive
 > > processes (now including some "setiathome's" ;-) ). All these processes
 > > run with nice 20, and their impact on general system performance is
 > > very low. In other words, we are not experiencing that performance
 > > degradation. Of course, a process which is CPU-bound and also a memory
 > > hog has a noticeable impact on performance (due to paging and swapping).
 >
 > Well, please do a test that transfers heavily over the network or
 > does a lot of disk I/O, once while setiathome is running and once
 > while it's not. Heavy disk I/O will also be slower; try updating your
 > ports or your source tree via cvsup and measure how long it takes.
 >
 
 I tried an FTP transfer (1.7 MB):
 
 With system idle:
 1755377 bytes received in 1.97 seconds (872.12 KB/s)
 
 With setiathome running with nice 20:
 1755377 bytes received in 1.84 seconds (930.99 KB/s)
 
 (The difference obviously comes from varying network load.)
 
 And also a "tar xzf ports.tar.gz":
 
 With system idle:
 real    5m31.084s
 user    0m3.413s
 sys     0m21.317s
 
 With setiathome running with nice 20:
 real    5m59.629s
 user    0m4.163s
 sys     0m24.460s
 
 I did the tests using FreeBSD 3.2-RELEASE, running on a PentiumII-350,
 with an UltraDMA disk (using 0xa0ff flags and softupdates), and an
 Intel Etherexpress Pro/100 network card.
 
 -- JMA
 -----------------------------------------------------------------------
 José Mª Alcaide                         | mailto:jose@we.lc.ehu.es
 Universidad del País Vasco              | mailto:jmas@es.FreeBSD.ORG
 Dpto. de Electricidad y Electrónica     | http://www.we.lc.ehu.es/~jose
 Facultad de Ciencias - Campus de Lejona | Tel.:  +34-946012479
 48940 Lejona (Vizcaya) - SPAIN          | Fax:   +34-946013071
 -----------------------------------------------------------------------
                "Go ahead... make my day." - H. Callahan
 
 

From: Sheldon Hearn <sheldonh@uunet.co.za>
To: Thomas Schuerger <schuerge@wjpserver.CS.Uni-SB.DE>
Cc: freebsd-gnats-submit@freebsd.org
Subject: Re: kern/12381: Bad scheduling in FreeBSD 
Date: Fri, 25 Jun 1999 19:42:05 +0200

 On Fri, 25 Jun 1999 19:40:29 +0200, Thomas Schuerger wrote:
 
 > The tests I did may not be the best ones to choose, but they ARE
 > real-life scenarios. And I DID an FTP test, demonstrating that
 > network transfer speed drops by about 25%. I would be glad if you
 > had some suggestions about what tests I could do.
 
 I just wonder if there isn't something _else_ at play that makes your
 results so different from Jose's.
 
 Ciao,
 Sheldon.
 

From: Sheldon Hearn <sheldonh@uunet.co.za>
To: Thomas Schuerger <schuerge@wjpserver.CS.Uni-SB.DE>
Cc: freebsd-gnats-submit@freebsd.org
Subject: Re: kern/12381: Bad scheduling in FreeBSD 
Date: Fri, 25 Jun 1999 20:24:32 +0200

 Having gone over this conversation a few times, I think I understand why
 we're not connecting. You're coming from the perspective of someone who
 wants processes nice'd to 20 to get out of the way of compute-bound
 processes at nice <20.
 
 I'm coming from the perspective of someone who thinks FreeBSD does a
 good job of sharing resources amongst multiple processes. The problem is
 that it's exactly this that you're complaining about. It's the fact that
 FreeBSD distributes CPU amongst processes using priority weightings and
 decaying load average that's upsetting you.
 
 Basically, you want renice 20 pid to cause the affected pid to be
 allowed as close to no CPU time as possible while there are
 compute-bound processes at nice <20 running.
 
 Is this right?
 
 Ciao,
 Sheldon.
 

From: Thomas Schuerger <schuerge@wjpserver.CS.Uni-SB.DE>
To: Sheldon Hearn <sheldonh@uunet.co.za>
Cc: freebsd-gnats-submit@freebsd.org
Subject: Re: kern/12381: Bad scheduling in FreeBSD
Date: Fri, 25 Jun 1999 20:35:29 +0200 (MET DST)

 > Having gone over this conversation a few times, I think I understand why
 > we're not connecting. You're coming from the perspective of someone who
 > wants processes nice'd to 20 to get out of the way of compute-bound
 > processes at nice <20.
 > 
 > I'm coming from the perspective of someone who thinks FreeBSD does a
 > good job of sharing resources amongst multiple processes. The problem is
 > that it's exactly this that you're complaining about. It's the fact that
 > FreeBSD distributes CPU amongst processes using priority weightings and
 > decaying load average that's upsetting you.
 > 
 > Basically, you want renice 20 pid to cause the affected pid to be
 > allowed as close to no CPU time as possible while there are
 > compute-bound processes at nice <20 running.
 > 
 > Is this right?
 
 Yes, that's what I meant by bad scheduling.
 
 I think the nice level has only a very minor effect in FreeBSD, which
 is not what I expect in a Unix environment. What it means is that there
 is no way to really run processes "in the background", i.e. in the
 "spare time" of the CPU (these processes should still become active
 every now and then while more important processes are running;
 completely cutting them off would not be what I want). Something like
 an exponential drop-off with increasing nice levels would be fine, for
 example.
 
 
 Ciao,
 Thomas.
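The exponential drop-off Thomas suggests can be made concrete with a tiny numeric sketch (purely illustrative; the per-step base is chosen only so that 20 nice steps yield the ~10:1 ratio he measured on Solaris, and comes from no FreeBSD code):

```python
# Hypothetical nice weighting: each nice step scales a process's CPU
# share by a constant factor; base**20 == 10 reproduces the ~10:1
# foreground/background split reported for Solaris.
base = 10 ** (1 / 20)          # ~1.122 per nice step

def shares(nices):
    """CPU share of each runnable process if weight = base**(-nice)."""
    weights = [base ** (-n) for n in nices]
    total = sum(weights)
    return [w / total for w in weights]

fg, bg = shares([0, 20])
print(f"nice 0: {fg:.0%}, nice 20: {bg:.0%}")   # 91% vs 9%
```

The nice-20 process still runs (about a tenth of the CPU here), which matches Thomas's wish that background jobs be throttled hard but not cut off completely.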
  
 

From: Sheldon Hearn <sheldonh@uunet.co.za>
To: freebsd-gnats-submit@freebsd.org
Cc:  
Subject: Re: kern/12381: Bad scheduling in FreeBSD
Date: Sat, 26 Jun 1999 07:58:51 +0200

 I chatted to bde, who advised me to leave this open, with the following
 reference:
 
 http://docs.freebsd.org/cgi/getmsg.cgi?fetch=568755+0+archive/1999/freebsd-bugs/19990307.freebsd-bugs
 
 Ciao,
 Sheldon.
 
State-Changed-From-To: open->analyzed 
State-Changed-By: bde 
State-Changed-When: Sat Jul 10 07:57:58 PDT 1999 
State-Changed-Why:  
This may be fixed in rev.1.23 of intr_machdep.c.  Is it? 
The reason that not everyone sees the bug in intr_machdep.c is that 
it only affects systems with shared interrupts. 
State-Changed-From-To: analyzed->feedback 
State-Changed-By: sheldonh 
State-Changed-When: Mon Jul 19 03:01:38 PDT 1999 
State-Changed-Why:  
Although the problem may have been analyzed, we're actually waiting for 
feedback from the submitter. 
State-Changed-From-To: feedback->closed 
State-Changed-By: sheldonh 
State-Changed-When: Mon Jul 19 04:56:34 PDT 1999 
State-Changed-Why:  
Fixed in rev 1.23 of sys/i386/isa/intr_machdep.c . 


Responsible-Changed-From-To: freebsd-bugs->bde 
Responsible-Changed-By: sheldonh 
Responsible-Changed-When: Mon Jul 19 04:56:34 PDT 1999 
Responsible-Changed-Why:  
Bruce's fix. 

From: Thomas Schuerger <schuerge@wjpserver.CS.Uni-SB.DE>
To: sheldonh@FreeBSD.org
Cc: schuerge@cs.uni-sb.de, freebsd-bugs@FreeBSD.org
Subject: Re: kern/12381: Bad scheduling in FreeBSD
Date: Mon, 19 Jul 1999 13:45:51 +0200 (MET DST)

 > Synopsis: Bad scheduling in FreeBSD
 > 
 > State-Changed-From-To: analyzed->feedback
 > State-Changed-By: sheldonh
 > State-Changed-When: Mon Jul 19 03:01:38 PDT 1999
 > State-Changed-Why: 
 > Although the problem may have been analyzed, we're actually waiting for
 > feedback from the submitter.
 
 I made world some days ago and the scheduling problems seem to have
 been fixed (my previous update was some weeks ago). Now long-running
 processes in the background do not interfere, and the I/O problems
 are gone, too...
 
 Good job! What has been changed?
 
 
 Ciao,
 Thomas.
 
 
 
State-Changed-From-To: closed->suspended 
State-Changed-By: sheldonh 
State-Changed-When: Mon Nov 29 03:21:58 PST 1999 
State-Changed-Why:  
Fixed as per NetBSD in: 
rev 1.91 src/sys/kern/kern_exit.c 
rev 1.83 src/sys/kern/kern_synch.c 
rev 1.95 src/sys/sys/proc.h 
Suspended until this is fixed in STABLE. 
State-Changed-From-To: suspended->closed 
State-Changed-By: asmodai 
State-Changed-When: Thu Nov 15 07:44:07 PST 2001 
State-Changed-Why:  
This has been fixed in kern_exit.c 1.91 and kern_synch.c 1.83. 

http://www.FreeBSD.org/cgi/query-pr.cgi?pr=12381 
>Unformatted:
