From nobody@FreeBSD.org  Fri Nov 26 13:18:40 2010
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id D9265106566B
	for <freebsd-gnats-submit@FreeBSD.org>; Fri, 26 Nov 2010 13:18:40 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from red.freebsd.org (unknown [IPv6:2001:4f8:fff6::22])
	by mx1.freebsd.org (Postfix) with ESMTP id 9764A8FC08
	for <freebsd-gnats-submit@FreeBSD.org>; Fri, 26 Nov 2010 13:18:40 +0000 (UTC)
Received: from red.freebsd.org (localhost [127.0.0.1])
	by red.freebsd.org (8.14.4/8.14.4) with ESMTP id oAQDIe4o040521
	for <freebsd-gnats-submit@FreeBSD.org>; Fri, 26 Nov 2010 13:18:40 GMT
	(envelope-from nobody@red.freebsd.org)
Received: (from nobody@localhost)
	by red.freebsd.org (8.14.4/8.14.4/Submit) id oAQDIeHh040520;
	Fri, 26 Nov 2010 13:18:40 GMT
	(envelope-from nobody)
Message-Id: <201011261318.oAQDIeHh040520@red.freebsd.org>
Date: Fri, 26 Nov 2010 13:18:40 GMT
From: Mykola Zubach <zuborg@gmail.com>
To: freebsd-gnats-submit@FreeBSD.org
Subject: scheduler issue - cpu overusage by 'intr' kernel thread
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         152599
>Category:       kern
>Synopsis:       [scheduler] scheduler issue - cpu overusage by 'intr' kernel thread
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Nov 26 13:20:06 UTC 2010
>Closed-Date:    Tue Nov 13 20:56:39 UTC 2012
>Last-Modified:  Tue Nov 13 20:56:39 UTC 2012
>Originator:     Mykola Zubach
>Release:        8.1-RELEASE
>Organization:
AdvancedHosters.com
>Environment:
FreeBSD DS1102 8.1-RELEASE-p1 FreeBSD 8.1-RELEASE-p1 #0: Mon Oct 18 11:31:13 UTC 2010     root@DS1124:/usr/obj/usr/src/sys/Z-AMD64  amd64
>Description:
Compare this system load

# cpuset -g -p 11
pid 11 mask: 0, 1, 2, 3
# cpuset -g -p 39283
pid 39283 mask: 0, 1, 2, 3

last pid: 46295;  load averages:  0.02,  0.03,  0.02     up 20+05:19:55  12:56:31
126 processes: 5 running, 95 sleeping, 26 waiting
CPU:  2.7% user,  0.0% nice,  6.9% system,  3.4% interrupt, 86.9% idle
Mem: 112M Active, 13G Inact, 2114M Wired, 486M Cache, 1645M Buf, 96M Free
Swap: 2048M Total, 72K Used, 2048M Free
  PID USERNAME PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
   10 root     171 ki31     0K    64K CPU2    2 438.6H 99.66% {idle: cpu2}
   10 root     171 ki31     0K    64K RUN     0 332.7H 96.48% {idle: cpu0}
   10 root     171 ki31     0K    64K RUN     3 431.1H 91.06% {idle: cpu3}
   10 root     171 ki31     0K    64K CPU1    1 444.4H 84.67% {idle: cpu1}
39283 www       53    0 78948K 72184K kqread  1   1:15 20.31% nginx
   11 root     -44    -     0K   416K CPU0    0 159.4H  5.81% {swi1: netisr 0}

and this one:

# cpuset -g -p 11
pid 11 mask: 0
# cpuset -g -p 39283
pid 39283 mask: 0

last pid: 47792;  load averages:  1.12,  0.78,  0.59     up 20+05:26:59  13:03:35
132 processes: 8 running, 100 sleeping, 24 waiting
CPU:  0.3% user,  0.0% nice,  2.1% system,  0.4% interrupt, 97.2% idle
Mem: 115M Active, 13G Inact, 2121M Wired, 649M Cache, 1644M Buf, 96M Free
Swap: 2048M Total, 80K Used, 2048M Free
  PID USERNAME PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
   10 root     171 ki31     0K    64K CPU0    0 332.7H 100.00% {idle: cpu0}
   10 root     171 ki31     0K    64K CPU1    1 444.6H 99.37% {idle: cpu1}
   10 root     171 ki31     0K    64K CPU3    3 431.3H 98.44% {idle: cpu3}
   10 root     171 ki31     0K    64K RUN     2 438.8H 98.10% {idle: cpu2}
39283 www       46    0 84068K 76412K kqread  0   2:17  3.61% nginx
    3 root      -8    -     0K    16K -       3 189:16  0.83% g_up
   11 root     -44    -     0K   416K WAIT    0 159.5H  0.49% {swi1: netisr 0}
    4 root      -8    -     0K    16K -       3 102:07  0.24% g_down
    6 root      44    -     0K    16K psleep  3  84:08  0.10% pagedaemon
   19 root      -8    -     0K    16K m:w1    2  76:08  0.05% g_mirror gm1

CPU usage is significantly lower in the second case (while the load average is higher).
The only difference is the CPU binding of processes 11 and 39283.

This looks like a CPU cache coherence issue.
I'm not sure whether scheduling can be improved in the general case, but it appears more efficient to bind at least the 'intr' kernel process to a single core and, perhaps, to apply special scheduling to processes in the 'kqread' state.

Bandwidth is 650+650=1300Mbit/s total at the moment (the server has em0 and em1, both with POLLING enabled).
Nginx uses kqueue, aio+sendfile; the kernel is built with ZERO_COPY_SOCKETS.

On some servers, binding pid 11 to cpu0 reduces CPU usage from 40-80% to a few percent.
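
The binding described above can be reproduced with cpuset(1). A sketch of the commands, assuming pid 11 is the 'intr' kernel process as in the output above (the nginx worker pid 39283 will differ on other systems):

```shell
# Show the current CPU affinity of the 'intr' kernel process
cpuset -g -p 11

# Pin the 'intr' kernel process to cpu0 only
cpuset -l 0 -p 11

# Pin the nginx worker (replace 39283 with the actual pid) to cpu0
cpuset -l 0 -p 39283

# Verify the new masks; both should now report "mask: 0"
cpuset -g -p 11
cpuset -g -p 39283
```

Note these commands must run as root on FreeBSD, and the binding does not persist across reboots.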
>How-To-Repeat:

>Fix:


>Release-Note:
>Audit-Trail:
State-Changed-From-To: open->feedback 
State-Changed-By: arundel 
State-Changed-When: Fri Jul 8 18:11:22 UTC 2011 
State-Changed-Why:  
Could you verify whether this issue exists under sched_ule, sched_4bsd, 
or both? If possible, with PREEMPTION both disabled and enabled? 

Thanks. 
Alex 

http://www.freebsd.org/cgi/query-pr.cgi?pr=152599 
State-Changed-From-To: feedback->closed 
State-Changed-By: eadler 
State-Changed-When: Tue Nov 13 20:56:38 UTC 2012 
State-Changed-Why:  
Some work was done recently in this area. If this is still a problem, 
reply and let me know. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=152599 
>Unformatted:
