From eugen@eg.sd.rdtc.ru  Sat Sep 29 17:20:12 2012
Return-Path: <eugen@eg.sd.rdtc.ru>
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 366EA1065673;
	Sat, 29 Sep 2012 17:20:12 +0000 (UTC)
	(envelope-from eugen@eg.sd.rdtc.ru)
Received: from eg.sd.rdtc.ru (eg.sd.rdtc.ru [IPv6:2a03:3100:c:13::5])
	by mx1.freebsd.org (Postfix) with ESMTP id D82AE8FC15;
	Sat, 29 Sep 2012 17:20:09 +0000 (UTC)
Received: from eg.sd.rdtc.ru (localhost [127.0.0.1])
	by eg.sd.rdtc.ru (8.14.5/8.14.5) with ESMTP id q8THK6a8031878;
	Sun, 30 Sep 2012 00:20:06 +0700 (NOVT)
	(envelope-from eugen@eg.sd.rdtc.ru)
Received: (from eugen@localhost)
	by eg.sd.rdtc.ru (8.14.5/8.14.5/Submit) id q8THK0eT031877;
	Sun, 30 Sep 2012 00:20:00 +0700 (NOVT)
	(envelope-from eugen)
Message-Id: <201209291720.q8THK0eT031877@eg.sd.rdtc.ru>
Date: Sun, 30 Sep 2012 00:20:00 +0700 (NOVT)
From: Eugene Grosbein <egrosbein@rdtc.ru>
Reply-To: Eugene Grosbein <eugen@eg.sd.rdtc.ru>
To: FreeBSD-gnats-submit@freebsd.org
Cc: jhb@freebsd.org, kib@freebsd.org
Subject: Deadlock in the networking code, possible due to a bug in the SCHED_ULE
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         172166
>Category:       kern
>Synopsis:       Deadlock in the networking code, possible due to a bug in the SCHED_ULE
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sat Sep 29 17:30:01 UTC 2012
>Closed-Date:    
>Last-Modified:  Mon Mar 25 14:40:00 UTC 2013
>Originator:     Eugene Grosbein
>Release:        FreeBSD 8.3-STABLE amd64
>Organization:
RDTC JSC
>Environment:
System: FreeBSD 8.3-STABLE/amd64, six-core Intel X5675 CPU (hyperthreading disabled).

>Description:

	I run a fairly busy mpd-5.6 PPPoE server on FreeBSD 8.3-STABLE/amd64
	that serves hundreds (and sometimes over a thousand) simultaneous connections
	with a high connect/disconnect rate. It also sends its logs
	to a remote syslog collector over the network and makes heavy use of
	ipfw tables for dummynet shaping: every newly connected client
	obtains an IP address, which mpd adds to some of the ipfw tables.
	Upon disconnection, mpd removes that IP address from the tables.

	Today my server deadlocked for the second time in two months:
	all of its network activity was blocked, even lagg's LACP frames.
	The kernel and userland were otherwise fine; I managed to log in using an IP KVM console.

	I invoked KDB, ran 'call doadump', obtained a crash dump and rebooted the box.

	I dug into it a little; it seems syslogd(8) was preempted by the scheduler
	in the middle of the ipfw_lookup_table()/rn_match() sequence while holding
	the reader lock of the "IPFW static rules" rwlock and never ran again.
	Hence all network activity stalled, as ipfw needs the writer lock of "IPFW static rules".

	Here is the backtrace of syslogd's kernel thread:

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
KDB: enter: Break to debugger
Dumping 1135 out of 4079 MB:..2%..12%..22%..32%..41%..51%..61%..71%..81%..91%

Error while mapping shared library sections:
/boot/kernel/nfsclient.ko:     .
Error while mapping shared library sections:
/boot/kernel/nfslock.ko:     .
Error while mapping shared library sections:
/boot/kernel/nfs_common.ko:     .
Error while mapping shared library sections:
/boot/kernel/krpc.ko:     .
Reading symbols from /boot/kernel/ipmi.ko...done.
Loaded symbols for /boot/kernel/ipmi.ko
Error while reading shared library symbols:
/boot/kernel/nfsclient.ko:     .
Error while reading shared library symbols:
/boot/kernel/nfslock.ko:     .
Error while reading shared library symbols:
/boot/kernel/nfs_common.ko:     .
Error while reading shared library symbols:
/boot/kernel/krpc.ko:     .
#0  doadump () at /home/src/sys/kern/kern_shutdown.c:268
268		if (textdump_pending)
(kgdb) thread 134
[Switching to thread 134 (Thread 100201)]#0  sched_switch (
    td=0xffffff0004e00470, newtd=0xffffff0001b96470, flags=Variable "flags" is not available.
)
    at /home/src/sys/kern/sched_ule.c:1892
1892			cpuid = PCPU_GET(cpuid);
(kgdb) bt
#0  sched_switch (td=0xffffff0004e00470, newtd=0xffffff0001b96470, flags=Variable "flags" is not available.
)
    at /home/src/sys/kern/sched_ule.c:1892
#1  0xffffffff80305c96 in mi_switch (flags=1538, newtd=0x0)
    at /home/src/sys/kern/kern_synch.c:466
#2  0xffffffff803048f5 in critical_exit ()
    at /home/src/sys/kern/kern_switch.c:212
#3  0xffffffff802d5733 in intr_event_handle (ie=0xffffff0001b61100, 
    frame=0xffffff81254122c0) at /home/src/sys/kern/kern_intr.c:1424
#4  0xffffffff804de4ff in intr_execute_handlers (isrc=0xffffff0001b82a00, 
    frame=0xffffff81254122c0) at /home/src/sys/amd64/amd64/intr_machdep.c:260
#5  0xffffffff804e2287 in lapic_handle_intr (vector=Variable "vector" is not available.
)
    at /home/src/sys/x86/x86/local_apic.c:771
#6  0xffffffff804db2b5 in Xapic_isr1 () at apic_vector.S:86
#7  0xffffffff803cdab3 in rn_match (v_arg=0xffffff81254123d0, head=Variable "head" is not available.
)
    at /home/src/sys/net/radix.c:352
#8  0xffffffff8040c03b in ipfw_lookup_table (ch=Variable "ch" is not available.
)
    at /home/src/sys/netinet/ipfw/ip_fw_table.c:538
#9  0xffffffff80405e1f in ipfw_chk (args=0xffffff81254125a0)
    at /home/src/sys/netinet/ipfw/ip_fw2.c:1429
#10 0xffffffff80409d7a in ipfw_check_hook (arg=Variable "arg" is not available.
)
    at /home/src/sys/netinet/ipfw/ip_fw_pfil.c:137
#11 0xffffffff803cd46c in pfil_run_hooks (ph=Variable "ph" is not available.
) at /home/src/sys/net/pfil.c:82
#12 0xffffffff804127ca in ip_output (m=0xffffff00bdb5b000, opt=Variable "opt" is not available.
)
    at /home/src/sys/netinet/ip_output.c:511
#13 0xffffffff8042c115 in udp_send (so=Variable "so" is not available.
)
    at /home/src/sys/netinet/udp_usrreq.c:1249
#14 0xffffffff803674cb in sosend_dgram (so=0xffffff0004e79550, 
    addr=0xffffff010c529060, uio=0xffffff8125412a00, top=0xffffff00bdb5b000, 
    control=0x0, flags=0, td=0xffffff0004e00470)
    at /home/src/sys/kern/uipc_socket.c:1107
#15 0xffffffff8036b7e2 in kern_sendit (td=0xffffff0004e00470, s=7, 
    mp=0xffffff8125412ad0, flags=0, control=0x0, segflg=UIO_USERSPACE)
    at /home/src/sys/kern/uipc_syscalls.c:785
#16 0xffffffff8036ba6c in sendit (td=0xffffff0004e00470, s=7, 
    mp=0xffffff8125412ad0, flags=0) at /home/src/sys/kern/uipc_syscalls.c:717
#17 0xffffffff8036bb5d in sendto (td=Variable "td" is not available.
)
    at /home/src/sys/kern/uipc_syscalls.c:837
#18 0xffffffff804f3554 in amd64_syscall (td=0xffffff0004e00470, traced=0)
    at subr_syscall.c:114
#19 0xffffffff804daefc in Xfast_syscall ()
    at /home/src/sys/amd64/amd64/exception.S:387
#20 0x000000080082be3c in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb)

	Backtraces of all kernel threads are available here:
	http://www.grosbein.net/crash/20120929/misc.tar.xz

	Kernel crashdump is also available at http://www.grosbein.net/crash/20120929/

	Here is the kernel config file:

cpu		HAMMER
ident		PPPOE

# To statically compile in device wiring instead of /boot/device.hints
#hints		"GENERIC.hints"		# Default places to look for devices.

# Use the following to compile in values accessible to the kernel
# through getenv() (or kenv(1) in userland). The format of the file
# is 'variable=value', see kenv(1)
#
# env		"GENERIC.env"

makeoptions	DEBUG=-g		# Build kernel with gdb(1) debug symbols

options 	SCHED_ULE		# ULE scheduler
options 	PREEMPTION		# Enable kernel thread preemption
options 	INET			# InterNETworking
options 	INET6			# IPv6 communications protocols
#options 	SCTP			# Stream Control Transmission Protocol
options 	FFS			# Berkeley Fast Filesystem
options 	SOFTUPDATES		# Enable FFS soft updates support
options 	UFS_ACL			# Support for access control lists
options 	UFS_DIRHASH		# Improve performance on big directories
options 	UFS_GJOURNAL		# Enable gjournal-based UFS journaling
options 	MD_ROOT			# MD is a potential root device
#options 	NFSCLIENT		# Network Filesystem Client
#options 	NFSSERVER		# Network Filesystem Server
#options 	NFSLOCKD		# Network Lock Manager
options 	NFS_ROOT		# NFS usable as /, requires NFSCLIENT
#options 	MSDOSFS			# MSDOS Filesystem
#options 	CD9660			# ISO 9660 Filesystem
#options 	PROCFS			# Process filesystem (requires PSEUDOFS)
options 	PSEUDOFS		# Pseudo-filesystem framework
options 	GEOM_PART_GPT		# GUID Partition Tables.
options 	GEOM_LABEL		# Provides labelization
#options 	GEOM_JOURNAL
options 	COMPAT_43TTY		# BSD 4.3 TTY compat (sgtty)
options 	COMPAT_FREEBSD32	# Compatible with i386 binaries
#options 	COMPAT_FREEBSD4		# Compatible with FreeBSD4
#options 	COMPAT_FREEBSD5		# Compatible with FreeBSD5
#options 	COMPAT_FREEBSD6		# Compatible with FreeBSD6
#options 	COMPAT_FREEBSD7		# Compatible with FreeBSD7
#options 	SCSI_DELAY=5000		# Delay (in ms) before probing SCSI
options 	KTRACE			# ktrace(1) support
options 	STACK			# stack(9) support
options 	SYSVSHM			# SYSV-style shared memory
options 	SYSVMSG			# SYSV-style message queues
options 	SYSVSEM			# SYSV-style semaphores
options 	P1003_1B_SEMAPHORES	# POSIX-style semaphores
options 	_KPOSIX_PRIORITY_SCHEDULING # POSIX P1003_1B real-time extensions
options 	PRINTF_BUFR_SIZE=512	# Prevent printf output being interspersed.
options 	KBD_INSTALL_CDEV	# install a CDEV entry in /dev
options 	HWPMC_HOOKS		# Necessary kernel hooks for hwpmc(4)
options 	AUDIT			# Security event auditing
options 	MAC			# TrustedBSD MAC Framework
#options	FLOWTABLE		# per-cpu routing cache
#options 	KDTRACE_FRAME		# Ensure frames are compiled in
#options 	KDTRACE_HOOKS		# Kernel DTrace hooks
options 	INCLUDE_CONFIG_FILE     # Include this file in kernel

# Make an SMP-capable kernel by default
options 	SMP			# Symmetric MultiProcessor Kernel

# CPU frequency control
device		cpufreq

# Bus support.
device		acpi
device		pci

# Floppy drives
#device		fdc

# ATA and ATAPI devices
device		ata
device		atadisk		# ATA disk drives
device		atapicd		# ATAPI CDROM drives

# SCSI peripherals
device		scbus		# SCSI bus (required for SCSI)
device		da		# Direct Access (disks)
device		cd		# CD
device		pass		# Passthrough device (direct SCSI access)

# atkbdc0 controls both the keyboard and the PS/2 mouse
device		atkbdc		# AT keyboard controller
device		atkbd		# AT keyboard
device		psm		# PS/2 mouse
device		kbdmux		# keyboard multiplexer
device		vga		# VGA video card driver

# syscons is the default console driver, resembling an SCO console
device		sc

# Serial (COM) ports
device		uart		# Generic UART driver

# PCI Ethernet NICs.
device		em		# Intel PRO/1000 Gigabit Ethernet Family
device		igb

# Pseudo devices.
device		loop		# Network loopback
device		random		# Entropy device
device		ether		# Ethernet support
device		vlan		# 802.1Q VLAN support
device		pty		# BSD-style compatibility pseudo ttys
device		md		# Memory "disks"
device		gif		# IPv6 and IPv4 tunneling
device		faith		# IPv6-to-IPv4 relaying (translation)
device		firmware	# firmware assist module
device		snp
device		bpf		# Berkeley packet filter

# USB support
#options 	USB_DEBUG	# enable debug msgs
#options	USB_VERBOSE
device		uhci		# UHCI PCI->USB interface
device		ehci		# EHCI PCI->USB interface (USB 2.0)
device		usb		# USB Bus (required)
device		ukbd		# Keyboard
device		umass		# Disks/Mass storage - Requires scbus and da
device		ums		# Mouse

device		ucom
# USB support for Prolific PL-2303 serial adapters
device		uplcom
# USB support for Silicon Laboratories CP2101/CP2102 based USB serial adapters
device		uslcom

#options		IPSEC
#device		crypto

options		NETGRAPH
options		NETGRAPH_ETHER
options		NETGRAPH_IFACE
options         NETGRAPH_MPPC_ENCRYPTION
options		NETGRAPH_PPP
options		NETGRAPH_PPPOE
options		NETGRAPH_SOCKET
options		NETGRAPH_TCPMSS
options		NETGRAPH_TEE
options		NETGRAPH_VJC

options		IPFIREWALL
options		IPFIREWALL_FORWARD
options		DUMMYNET

options		VFS_AIO

device		smbus
device		smb
device		ichsmb
device		iicbus
device		iicbb
device		ic
device		iic
device		iicsmb
device		coretemp
device		ichwd
device		nvram

device		lagg

options         KDB
options         KDB_TRACE
options         KDB_UNATTENDED
options		DDB
options		DDB_NUMSYM
#options		NETGRAPH_DEBUG
#options		INVARIANT_SUPPORT
#options		INVARIANTS
#options         DEBUG_MEMGUARD
#options         BREAK_TO_DEBUGGER
options         ALT_BREAK_TO_DEBUGGER

device		bridge

>How-To-Repeat:
	
	Unknown. The problem occurs very rarely, but today was the second time.

>Fix:

	Unknown to me.
>Release-Note:
>Audit-Trail:

From: Andriy Gapon <avg@FreeBSD.org>
To: bug-followup@FreeBSD.org, eugen@eg.sd.rdtc.ru
Cc:  
Subject: Re: kern/172166: Deadlock in the networking code, possible due to
 a bug in the SCHED_ULE
Date: Sun, 30 Sep 2012 14:54:02 +0300

 It looks like CPUs 0 - 4 are idle, but CPU 5 has a load of three.
 One of those threads is the syslogd thread that holds the lock, but the
 currently running thread is the 'ipmi0: kcs' thread with tid 100118.
 It would be interesting to examine what it is doing.
 
 -- 
 Andriy Gapon

From: Andriy Gapon <avg@FreeBSD.org>
To: bug-followup@FreeBSD.org, eugen@eg.sd.rdtc.ru
Cc:  
Subject: Re: kern/172166: Deadlock in the networking code, possible due to
 a bug in the SCHED_ULE
Date: Sun, 30 Sep 2012 16:42:53 +0300

 on 30/09/2012 14:54 Andriy Gapon said the following:
 > 
 > It looks like CPUs 0 - 4 are idle, but CPU 5 has load of three.
 > One of those threads is the syslogd thread that holds the lock, but the
 > currently running thread is 'ipmi0: kcs' thread with tid 100118.
 > It would interesting to examine what it is doing.
 > 
 
 Looks like the kcs thread busy-loops here: kcs_loop -> kcs_read_byte ->
 kcs_wait_for_obf.
 Since this is a 6-CPU machine, the steal threshold is set to 3, so other CPUs
 don't try to take any work from CPU 5. Not sure if this is smart actually.  Maybe
 it would make sense to have a lower threshold or to allow stealing of real-time
 threads at a lower threshold.
 
 Since the kcs thread is a kernel thread with real-time priority (68), it doesn't
 allow any lower-priority thread to run while it's not sleeping.
 
 Also, it looks like rwlock does not take care to propagate waiters' priorities
 in all cases.  Maybe priority propagation could have helped here, but not sure...
 
 -- 
 Andriy Gapon

From: Andriy Gapon <avg@FreeBSD.org>
To: bug-followup@FreeBSD.org, eugen@eg.sd.rdtc.ru
Cc:  
Subject: Re: kern/172166: Deadlock in the networking code, possible due to
 a bug in the SCHED_ULE
Date: Sun, 30 Sep 2012 16:44:09 +0300

 on 30/09/2012 16:42 Andriy Gapon said the following:
 > on 30/09/2012 14:54 Andriy Gapon said the following:
 >>
 >> It looks like CPUs 0 - 4 are idle, but CPU 5 has load of three.
 >> One of those threads is the syslogd thread that holds the lock, but the
 >> currently running thread is 'ipmi0: kcs' thread with tid 100118.
 >> It would interesting to examine what it is doing.
 >>
 > 
 > Looks like the kcs busy loops in here: kcs_loop -> kcs_read_byte ->
 > kcs_wait_for_obf.
 > Since this is a 6-CPU machine, steal threshold is set to 3 so other CPUs don't
 > try to take any work from CPU5. Not sure if this is smart actually.  Maybe it
 > would make sense to have a lower threshold or to allow stealing of real-time
 > threads at a lower threshold.
 > 
 > Since the kcs thread is a kernel thread with real-time priority (68) it doesn't
 > allow any other lower priority thread to run while it's not sleeping.
 > 
 > Also, it looks like rwlock does not take care to propagate waiters' priorities
 > in all cases.  Maybe priority propagation could have helped here, but not sure...
 > 
 
 In any case, the original trigger for this problem seems to be something in IPMI
 that keeps that thread running.
 
 -- 
 Andriy Gapon

From: Alexander Motin <mav@FreeBSD.org>
To: bug-followup@FreeBSD.org, eugen@eg.sd.rdtc.ru
Cc: Andriy Gapon <avg@FreeBSD.org>
Subject: Re: kern/172166: Deadlock in the networking code, possible due to
 a bug in the SCHED_ULE
Date: Tue, 02 Oct 2012 09:58:00 +0300

 About rw_lock priority propagation, locking(9) says:
 The rw_lock locks have priority propagation like mutexes, but priority 
 can be propagated only to an exclusive holder.  This limitation comes 
 from the fact that shared owners are anonymous.
 
 As for the idle stealing threshold, it was fixed in HEAD at r239194, 
 but hasn't been merged yet. It should be trivial to merge it.
 
 -- 
 Alexander Motin
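
The locking(9) passage quoted above can be illustrated with a small
user-space sketch in C. It is a toy model under stated assumptions, not the
kernel's rwlock implementation: the names toy_owner, toy_rwlock and
toy_propagate_priority are hypothetical. It only shows why a waiter can lend
its priority to an exclusive holder, whose identity the lock records, but not
to shared holders, which the model tracks as a bare count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a thread; lower prio value = more important. */
struct toy_owner {
	int prio;
};

/* Toy lock: a writer is recorded by identity, readers only by count. */
struct toy_rwlock {
	struct toy_owner *writer;	/* NULL when not write-locked */
	unsigned readers;		/* number of anonymous shared holders */
};

/*
 * Try to lend the waiter's priority to the lock holder.
 * Returns true if a holder could be identified, false otherwise.
 */
static bool
toy_propagate_priority(struct toy_rwlock *rw, const struct toy_owner *waiter)
{
	assert(rw != NULL && waiter != NULL);
	if (rw->writer != NULL) {
		/* Exclusive holder is known: boost it if the waiter is more important. */
		if (waiter->prio < rw->writer->prio)
			rw->writer->prio = waiter->prio;
		return true;
	}
	/* Read-locked: the lock does not know which threads hold it. */
	return false;
}
```

In this model a write-locked lock boosts its writer to the waiter's
priority, while a lock with readers = 1 and no writer returns false and
boosts nobody. The second case corresponds to the situation in this PR,
where syslogd held the lock as a reader.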

From: Eugene Grosbein <egrosbein@rdtc.ru>
To: Alexander Motin <mav@FreeBSD.org>
Cc: bug-followup@FreeBSD.org, eugen@eg.sd.rdtc.ru,
        Andriy Gapon <avg@FreeBSD.org>
Subject: Re: kern/172166: Deadlock in the networking code, possible due to
 a bug in the SCHED_ULE
Date: Tue, 02 Oct 2012 14:48:04 +0700

 02.10.2012 13:58, Alexander Motin :
 > About rw_lock priority propagation locking(9) tells:
 > The rw_lock locks have priority propagation like mutexes, but priority 
 > can be propagated only to an exclusive holder.  This limitation comes 
 > from the fact that shared owners are anonymous.
 > 
 > What's about idle stealing threshold, it was fixed in HEAD at r239194, 
 > but wasn't merged yet. It should be trivial to merge it.
 
 Would it fix my problem with a 6-CPU box?
 Your commit log talks about "8 or more cores".
 
 Eugene Grosbein

From: Alexander Motin <mav@FreeBSD.org>
To: Eugene Grosbein <egrosbein@rdtc.ru>
Cc: bug-followup@FreeBSD.org, eugen@eg.sd.rdtc.ru, 
 Andriy Gapon <avg@FreeBSD.org>
Subject: Re: kern/172166: Deadlock in the networking code, possible due to
 a bug in the SCHED_ULE
Date: Tue, 02 Oct 2012 10:53:49 +0300

 On 02.10.2012 10:48, Eugene Grosbein wrote:
 > 02.10.2012 13:58, Alexander Motin :
 >> About rw_lock priority propagation locking(9) tells:
 >> The rw_lock locks have priority propagation like mutexes, but priority
 >> can be propagated only to an exclusive holder.  This limitation comes
 >> from the fact that shared owners are anonymous.
 >>
 >> What's about idle stealing threshold, it was fixed in HEAD at r239194,
 >> but wasn't merged yet. It should be trivial to merge it.
 >
 > Would it fix my problem with 6-CPU box?
 > Your commit log talks about "8 or more cores".
 
 Hmm. Then I see no reason why the threads were not stolen, unless they are 
 bound to a specific CPU. Check the `sysctl kern.sched.steal_thresh` output 
 to be sure.
 
 -- 
 Alexander Motin
 
 

From: Eugene Grosbein <egrosbein@rdtc.ru>
To: Alexander Motin <mav@FreeBSD.org>
Cc: bug-followup@FreeBSD.org, eugen@eg.sd.rdtc.ru,
        Andriy Gapon <avg@FreeBSD.org>
Subject: Re: kern/172166: Deadlock in the networking code, possible due to
 a bug in the SCHED_ULE
Date: Tue, 02 Oct 2012 14:59:31 +0700

 02.10.2012 14:53, Alexander Motin :
 > On 02.10.2012 10:48, Eugene Grosbein wrote:
 >> 02.10.2012 13:58, Alexander Motin :
 >>> About rw_lock priority propagation locking(9) tells:
 >>> The rw_lock locks have priority propagation like mutexes, but priority
 >>> can be propagated only to an exclusive holder.  This limitation comes
 >>> from the fact that shared owners are anonymous.
 >>>
 >>> What's about idle stealing threshold, it was fixed in HEAD at r239194,
 >>> but wasn't merged yet. It should be trivial to merge it.
 >>
 >> Would it fix my problem with 6-CPU box?
 >> Your commit log talks about "8 or more cores".
 > 
 > Hmm. Then I see no reason why threads were not stolen, unless they are 
 > bound to specific CPU. Check `sysctl kern.sched.steal_thresh` output to 
 > be sure.
 
 All NIC threads and dummynet are bound on my boxes.
 igb(4) in RELENG_8 binds its threads by default in a very wrong way,
 so I rebound them. dummynet(4) in RELENG_8 goes wild under severe load
 unless bound to one or two cores.
 
 kern.sched.steal_thresh: 2

From: Alexander Motin <mav@FreeBSD.org>
To: Eugene Grosbein <egrosbein@rdtc.ru>
Cc: bug-followup@FreeBSD.org, eugen@eg.sd.rdtc.ru, 
 Andriy Gapon <avg@FreeBSD.org>
Subject: Re: kern/172166: Deadlock in the networking code, possible due to
 a bug in the SCHED_ULE
Date: Tue, 02 Oct 2012 11:45:23 +0300

 On 02.10.2012 10:59, Eugene Grosbein wrote:
 > 02.10.2012 14:53, Alexander Motin :
 >> On 02.10.2012 10:48, Eugene Grosbein wrote:
 >>> 02.10.2012 13:58, Alexander Motin :
 >>>> About rw_lock priority propagation locking(9) tells:
 >>>> The rw_lock locks have priority propagation like mutexes, but priority
 >>>> can be propagated only to an exclusive holder.  This limitation comes
 >>>> from the fact that shared owners are anonymous.
 >>>>
 >>>> What's about idle stealing threshold, it was fixed in HEAD at r239194,
 >>>> but wasn't merged yet. It should be trivial to merge it.
 >>>
 >>> Would it fix my problem with 6-CPU box?
 >>> Your commit log talks about "8 or more cores".
 >>
 >> Hmm. Then I see no reason why threads were not stolen, unless they are
 >> bound to specific CPU. Check `sysctl kern.sched.steal_thresh` output to
 >> be sure.
 >
 > All NIC's threads and dummynet are bound in my boxes.
 > igb(4) in RELENG_8 bounds its threads by default in very wrong way,
 > so I rebound them. dummynet(8) in RELENG_8 goes wild under severe load
 > unless bound to single or two cores.
 
 That may be the answer. An active thread can never be stolen, and if it 
 has a high absolute priority and never sleeps voluntarily, it will run 
 there forever. If all other threads are bound to that CPU, they cannot 
 be stolen either and will wait forever.
 
 > kern.sched.steal_thresh: 2
 
 This should not prevent stealing.
 
 PS: I've just noticed that for some reason I haven't merged my scheduler 
 improvements to the 8-STABLE branch. So behavior may differ from that in 
 HEAD or 9-STABLE. I will recheck the commit history to recall what stopped 
 me from merging them. But I don't remember enough details to predict 
 whether it may affect your problem somehow.
 
 -- 
 Alexander Motin
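
The reasoning above can be sketched as a small C model. This is a toy under
stated assumptions, not code from sched_ule.c: toy_thread, toy_can_steal,
toy_count_stealable and toy_pick_next are hypothetical names, and the rules
are reduced to what the thread states: a CPU's load must reach the steal
threshold, a running thread is never stolen, and a pinned or bound thread
is never stolen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; NOT the real sched_ule.c structures. */
struct toy_thread {
	const char *name;
	int prio;	/* lower value = higher priority */
	bool running;	/* currently on the CPU */
	bool pinned;	/* pinned or bound to this CPU */
};

/* An idle CPU may take a thread only if the victim CPU is loaded enough
 * and the thread is neither running nor tied to that CPU. */
static bool
toy_can_steal(const struct toy_thread *td, int cpu_load, int steal_thresh)
{
	if (cpu_load < steal_thresh)
		return false;
	return !td->running && !td->pinned;
}

/* Count the threads on one CPU that an idle CPU could migrate away.
 * The CPU's load is taken to be the number of threads in the array. */
static int
toy_count_stealable(const struct toy_thread *tds, size_t n, int steal_thresh)
{
	int count = 0;

	assert(tds != NULL);
	for (size_t i = 0; i < n; i++)
		if (toy_can_steal(&tds[i], (int)n, steal_thresh))
			count++;
	return count;
}

/* Fixed-priority pick: the thread with the lowest prio value wins. */
static const struct toy_thread *
toy_pick_next(const struct toy_thread *tds, size_t n)
{
	const struct toy_thread *best = NULL;

	for (size_t i = 0; i < n; i++)
		if (best == NULL || tds[i].prio < best->prio)
			best = &tds[i];
	return best;
}
```

Feeding the model the CPU 5 situation described in this PR (the kcs thread
running at priority 68, syslogd pinned, a third thread bound; the priorities
of the latter two are not given here and would be placeholders) yields no
stealable threads at a threshold of 2, and toy_pick_next keeps choosing the
kcs thread for as long as it stays runnable.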

From: Andriy Gapon <avg@FreeBSD.org>
To: bug-followup@FreeBSD.org
Cc: Alexander Motin <mav@FreeBSD.org>, eugen@eg.sd.rdtc.ru
Subject: Re: kern/172166: Deadlock in the networking code, possible due to
 a bug in the SCHED_ULE
Date: Wed, 03 Oct 2012 17:56:39 +0300

 on 02/10/2012 09:58 Alexander Motin said the following:
 > About rw_lock priority propagation locking(9) tells:
 > The rw_lock locks have priority propagation like mutexes, but priority can be
 > propagated only to an exclusive holder.  This limitation comes from the fact that
 > shared owners are anonymous.
 
 Yeah... and as we see it has a potential to result in priority inversion.
 
 > What's about idle stealing threshold, it was fixed in HEAD at r239194, but wasn't
 > merged yet. It should be trivial to merge it.
 
 And I also misread the code, confusing the 6-CPU case with the 8-CPU case.
 
 
 -- 
 Andriy Gapon

From: Eugene Grosbein <egrosbein@rdtc.ru>
To: Andriy Gapon <avg@FreeBSD.org>
Cc: bug-followup@FreeBSD.org, Alexander Motin <mav@FreeBSD.org>
Subject: Re: kern/172166: Deadlock in the networking code, possible due to
 a bug in the SCHED_ULE
Date: Thu, 04 Oct 2012 13:12:22 +0700

 03.10.2012 21:56, Andriy Gapon :
 > on 02/10/2012 09:58 Alexander Motin said the following:
 >> About rw_lock priority propagation locking(9) tells:
 >> The rw_lock locks have priority propagation like mutexes, but priority can be
 >> propagated only to an exclusive holder.  This limitation comes from the fact that
 >> shared owners are anonymous.
 > 
 > Yeah... and as we see it has a potential to result in priority inversion.
 > 
 >> What's about idle stealing threshold, it was fixed in HEAD at r239194, but wasn't
 >> merged yet. It should be trivial to merge it.
 > 
 > And I've also misread the code, confused 6 CPUs case with 8 CPUs case.
 > 
 > 
 
 Can you offer any advice, workaround or bugfix on how to reconfigure my routers
 to prevent them from locking up this way?

From: Andriy Gapon <avg@FreeBSD.org>
To: Eugene Grosbein <egrosbein@rdtc.ru>
Cc: bug-followup@FreeBSD.org, Alexander Motin <mav@FreeBSD.org>
Subject: Re: kern/172166: Deadlock in the networking code, possible due to
 a bug in the SCHED_ULE
Date: Thu, 04 Oct 2012 13:23:55 +0300

 on 04/10/2012 09:12 Eugene Grosbein said the following:
 > 03.10.2012 21:56, Andriy Gapon :
 >> on 02/10/2012 09:58 Alexander Motin said the following:
 >>> About rw_lock priority propagation locking(9) tells:
 >>> The rw_lock locks have priority propagation like mutexes, but priority can be
 >>> propagated only to an exclusive holder.  This limitation comes from the fact that
 >>> shared owners are anonymous.
 >>
 >> Yeah... and as we see it has a potential to result in priority inversion.
 >>
 >>> What's about idle stealing threshold, it was fixed in HEAD at r239194, but wasn't
 >>> merged yet. It should be trivial to merge it.
 >>
 >> And I've also misread the code, confused 6 CPUs case with 8 CPUs case.
 >>
 
 BTW, I've just noticed that the syslogd thread had td_pinned == 1 and I can't
 explain why...  But that probably explains why it was not stolen.
 
 > 
 > Can I have any advice/workaround/bugfix on how to reconfigure my routers
 > to prevent them from locking this way?
 
 As I said, the primary problem here is the ipmi thread going insane.
 You can try to remove the ipmi driver, if you can afford that.
 Or you can try to hack on it, so that
 (1) it voluntarily yields even when it thinks that it always has work to do
 (2) there are some diagnostics on what keeps it running
 
 You may also try to set the thread's priority to PUSER (using sched_prio), but I
 am not sure what bad side effects may happen because of that.
 
 No magic bullet here, sorry.
 
 -- 
 Andriy Gapon
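
Suggestion (1) above can be illustrated with a user-space sketch in C. It
is not a patch for ipmi(4): toy_wait_ready, toy_wait_stats and
toy_countdown_ready are hypothetical names, and sched_yield(2) stands in for
whatever a kernel thread would really use. The sketch polls a readiness
callback, gives up the CPU after a fixed budget of failed polls, and
bounds the total number of polls so the wait cannot spin forever. The
counters double as the kind of diagnostic mentioned in (2).

```c
#include <assert.h>
#include <sched.h>
#include <stdbool.h>

/* Hypothetical stand-in for a device status poll; not the ipmi(4) API. */
typedef bool (*toy_ready_fn)(void *ctx);

struct toy_wait_stats {
	unsigned long polls;	/* how many times ready() was called */
	unsigned long yields;	/* how many times the CPU was given up */
};

/*
 * Poll until ready() returns true or max_polls calls have been made.
 * After every spin_budget consecutive failed polls, yield the CPU.
 * Returns true on success, false if the poll limit was reached.
 */
static bool
toy_wait_ready(toy_ready_fn ready, void *ctx, unsigned spin_budget,
    unsigned long max_polls, struct toy_wait_stats *st)
{
	unsigned spins = 0;

	assert(ready != NULL && st != NULL && spin_budget > 0);
	st->polls = st->yields = 0;
	while (st->polls < max_polls) {
		st->polls++;
		if (ready(ctx))
			return true;
		if (++spins == spin_budget) {
			sched_yield();	/* let other runnable threads in */
			st->yields++;
			spins = 0;
		}
	}
	return false;
}

/* Example callback: reports ready once a countdown has reached zero. */
static bool
toy_countdown_ready(void *ctx)
{
	int *remaining = ctx;

	if (*remaining == 0)
		return true;
	(*remaining)--;
	return false;
}
```

A kernel thread at a high fixed priority would probably need to sleep or
lower its priority rather than merely yield, since yielding may hand the CPU
only to threads of comparable priority; which primitive fits is not settled
in this PR.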

From: Eugene Grosbein <egrosbein@rdtc.ru>
To: Andriy Gapon <avg@freebsd.org>
Cc: bug-followup@freebsd.org, Alexander Motin <mav@freebsd.org>
Subject: Re: kern/172166: Deadlock in the networking code, possible due to
 a bug in the SCHED_ULE
Date: Fri, 05 Oct 2012 15:39:58 +0700

 04.10.2012 17:23, Andriy Gapon :
 
 >> Can I have any advice/workaround/bugfix on how to reconfigure my routers
 >> to prevent them from locking this way?
 > 
 > As I said, the primary problem here is the ipmi thread going insane.
 > You can try to remove ipmi driver, if you can afford that.
 > Or you can try to hack on it, so that
 > (1) it voluntary yields even when it thinks that it always has work to do
 > (2) there is some diagnostic on what keeps it running
 > 
 > You may also try to set the thread's priority to PUSER (using sched_prio), but I
 > am not sure what bad side-effects may happen because of that.
 > 
 > No magic bullet here, sorry.
 
 Thank you. As a workaround, I've unloaded ipmi.ko
 and edited my scripts to access IPMI sensors over IP instead of the local interface.
 
 Eugene Grosbein

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/172166: commit references a PR
Date: Mon, 25 Mar 2013 14:30:42 +0000 (UTC)

 Author: melifaro
 Date: Mon Mar 25 14:30:34 2013
 New Revision: 248705
 URL: http://svnweb.freebsd.org/changeset/base/248705
 
 Log:
   Unlock IPMI sc while performing requests via KCS and SMIC interfaces.
   It is already done in SSIF interface code.
   This reduces contention/spinning reported by many users.
   
   PR:		kern/172166
   Submitted by:	Eric van Gyzen <eric at vangyzen.net>
   MFC after:	2 weeks
 
 Modified:
   head/sys/dev/ipmi/ipmi_kcs.c
   head/sys/dev/ipmi/ipmi_smic.c
 
 Modified: head/sys/dev/ipmi/ipmi_kcs.c
 ==============================================================================
 --- head/sys/dev/ipmi/ipmi_kcs.c	Mon Mar 25 13:58:17 2013	(r248704)
 +++ head/sys/dev/ipmi/ipmi_kcs.c	Mon Mar 25 14:30:34 2013	(r248705)
 @@ -456,6 +456,7 @@ kcs_loop(void *arg)
  
  	IPMI_LOCK(sc);
  	while ((req = ipmi_dequeue_request(sc)) != NULL) {
 +		IPMI_UNLOCK(sc);
  		ok = 0;
  		for (i = 0; i < 3 && !ok; i++)
  			ok = kcs_polled_request(sc, req);
 @@ -463,6 +464,7 @@ kcs_loop(void *arg)
  			req->ir_error = 0;
  		else
  			req->ir_error = EIO;
 +		IPMI_LOCK(sc);
  		ipmi_complete_request(sc, req);
  	}
  	IPMI_UNLOCK(sc);
 
 Modified: head/sys/dev/ipmi/ipmi_smic.c
 ==============================================================================
 --- head/sys/dev/ipmi/ipmi_smic.c	Mon Mar 25 13:58:17 2013	(r248704)
 +++ head/sys/dev/ipmi/ipmi_smic.c	Mon Mar 25 14:30:34 2013	(r248705)
 @@ -362,6 +362,7 @@ smic_loop(void *arg)
  
  	IPMI_LOCK(sc);
  	while ((req = ipmi_dequeue_request(sc)) != NULL) {
 +		IPMI_UNLOCK(sc);
  		ok = 0;
  		for (i = 0; i < 3 && !ok; i++)
  			ok = smic_polled_request(sc, req);
 @@ -369,6 +370,7 @@ smic_loop(void *arg)
  			req->ir_error = 0;
  		else
  			req->ir_error = EIO;
 +		IPMI_LOCK(sc);
  		ipmi_complete_request(sc, req);
  	}
  	IPMI_UNLOCK(sc);
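
The pattern applied by the diff above can be modelled in user space with a
short C sketch. Everything prefixed toy_ is hypothetical and only mimics the
shape of kcs_loop()/smic_loop(); the real IPMI_LOCK and IPMI_UNLOCK wrap a
kernel lock, while here the lock is a plain flag. The sketch counts how many
slow polled requests are issued while the lock is held, with and without
dropping the lock around the request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the driver softc; the lock is just a flag here. */
struct toy_softc {
	bool locked;
	int queue[4];			/* pending request ids */
	size_t head, tail;
	int polled_while_locked;	/* slow requests issued with the lock held */
	int completed;
};

static void
toy_lock(struct toy_softc *sc)
{
	assert(!sc->locked);
	sc->locked = true;
}

static void
toy_unlock(struct toy_softc *sc)
{
	assert(sc->locked);
	sc->locked = false;
}

/* Queue manipulation requires the lock. */
static bool
toy_dequeue(struct toy_softc *sc, int *req)
{
	assert(sc->locked);
	if (sc->head == sc->tail)
		return false;
	*req = sc->queue[sc->head++];
	return true;
}

/* Completion also requires the lock. */
static void
toy_complete(struct toy_softc *sc, int req)
{
	(void)req;
	assert(sc->locked);
	sc->completed++;
}

/* Slow, polling hardware access; records whether the lock was held. */
static void
toy_polled_request(struct toy_softc *sc, int req)
{
	(void)req;
	if (sc->locked)
		sc->polled_while_locked++;
}

/* Request loop; drop_lock selects the behaviour after the change. */
static void
toy_loop(struct toy_softc *sc, bool drop_lock)
{
	int req;

	toy_lock(sc);
	while (toy_dequeue(sc, &req)) {
		if (drop_lock)
			toy_unlock(sc);
		toy_polled_request(sc, req);
		if (drop_lock)
			toy_lock(sc);
		toy_complete(sc, req);
	}
	toy_unlock(sc);
}
```

With drop_lock set to false every queued request is polled under the lock;
with drop_lock set to true none is, while dequeueing and completion still
happen with the lock held.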
 
>Unformatted:
