From nobody@FreeBSD.org  Sun Jan 26 02:12:52 2014
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
	(using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by hub.freebsd.org (Postfix) with ESMTPS id 5C718AF0
	for <freebsd-gnats-submit@FreeBSD.org>; Sun, 26 Jan 2014 02:12:52 +0000 (UTC)
Received: from oldred.freebsd.org (oldred.freebsd.org [IPv6:2001:1900:2254:206a::50:4])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mx1.freebsd.org (Postfix) with ESMTPS id 3C9711D17
	for <freebsd-gnats-submit@FreeBSD.org>; Sun, 26 Jan 2014 02:12:52 +0000 (UTC)
Received: from oldred.freebsd.org ([127.0.1.6])
	by oldred.freebsd.org (8.14.5/8.14.7) with ESMTP id s0Q2CpdG012685
	for <freebsd-gnats-submit@FreeBSD.org>; Sun, 26 Jan 2014 02:12:51 GMT
	(envelope-from nobody@oldred.freebsd.org)
Received: (from nobody@localhost)
	by oldred.freebsd.org (8.14.5/8.14.5/Submit) id s0Q2Cp60012677;
	Sun, 26 Jan 2014 02:12:51 GMT
	(envelope-from nobody)
Message-Id: <201401260212.s0Q2Cp60012677@oldred.freebsd.org>
Date: Sun, 26 Jan 2014 02:12:51 GMT
From: Yury <hawk256@yandex.ru>
To: freebsd-gnats-submit@FreeBSD.org
Subject: MPD5.7 umtxn
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         186114
>Category:       amd64
>Synopsis:       MPD5.7 umtxn
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-amd64
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          update
>Submitter-Id:   current-users
>Arrival-Date:   Sun Jan 26 02:20:00 UTC 2014
>Closed-Date:    
>Last-Modified:  Wed Feb  5 12:50:01 UTC 2014
>Originator:     Yury
>Release:        FreeBSD 10.0
>Organization:
GreenLine
>Environment:
FreeBSD gw01.comteks.biz 10.0-STABLE FreeBSD 10.0-STABLE #0 r261173: Sun Jan 26 03:58:09 MSK 2014     hawk@gw01.comteks.biz:/usr/obj/usr/src/sys/Hawk  amd64

>Description:
I run a BRAS on FreeBSD. It was running 9.2-STABLE. I updated it to 10.0-RELEASE and later to 10.0-STABLE. The problem is the same on both.

For about 5 minutes after start it works normally. But once 100-150 users have connected through PPPoE (MPD 5.7), the mpd5 process stops in state umtxn.

Of course, no one can connect after that, but users who are already connected keep working.

last pid: 17712;  load averages:  1.16,  0.65,  0.27          up 0+00:01:51  05:28:23
50 processes:  1 running, 49 sleeping
CPU:  0.0% user,  0.0% nice,  1.0% system,  0.9% interrupt, 98.1% idle
Mem: 1162M Active, 56M Inact, 400M Wired, 145M Buf, 2274M Free
Swap: 4096M Total, 4096M Free

  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
 2535 root          1  20    0   201M   184M select  3   1:14  10.69% zebra
 2476 _pflogd       1  20    0 14600K  2200K bpf     0   0:12   0.00% pflogd
 2541 root          1  20    0   224M   206M select  2   0:07   0.00% bgpd
 9803 root          1  20    0 78624K 44092K select  2   0:02   0.00% bsnmpd
 3462 root          3  20    0 56736K  9164K umtxn   0   0:01   0.00% mpd5
 7243 mysql        17  32    0  6958M   636M uwait   1   0:01   0.00% mysqld
 6095 bind          7  20    0   129M 76864K kqread  1   0:01   0.00% named
 3872 root          1  20    0 61124K  6808K select  1   0:00   0.00% nmbd
 8644 root          3  20    0 47332K  6216K select  1   0:00   0.00% utm5_rfw


procstat -k 3462
  PID    TID COMM             TDNAME           KSTACK
 3462 100113 mpd5             -                mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep do_lock_umutex __umtx_op_wait_umutex amd64_syscall Xfast_syscall
 3462 100115 mpd5             -                mi_switch sleepq_catch_signals sleepq_wait_sig _cv_wait_sig seltdwait sys_poll amd64_syscall Xfast_syscall
 3462 100512 mpd5             -                mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep do_lock_umutex __umtx_op_wait_umutex amd64_syscall Xfast_syscall
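
A minimal sketch, in Python, of how the stacks above can be sorted into threads sleeping on a userland mutex and all other threads. The names classify_threads and UMTX_FRAMES are made up for illustration; the frame names are copied from the procstat -k output above, and the parsing assumes that TDNAME contains no spaces.

```python
# Kernel frames that show a thread sleeping on a userland mutex.
# Assumption: taken from the procstat -k stacks in this report.
UMTX_FRAMES = {"umtxq_sleep", "do_lock_umutex"}


def classify_threads(procstat_output):
    """Map TID -> "umtx" or "other" from `procstat -k` text output."""
    result = {}
    for line in procstat_output.splitlines():
        fields = line.split()
        # Data rows look like: PID TID COMM TDNAME frame frame ...
        # The header row is skipped because "PID" is not a number.
        if len(fields) < 5 or not fields[0].isdigit() or not fields[1].isdigit():
            continue
        tid = int(fields[1])
        frames = set(fields[4:])
        result[tid] = "umtx" if frames & UMTX_FRAMES else "other"
    return result


# The three stacks from this report, copied verbatim.
SAMPLE = """\
  PID    TID COMM             TDNAME           KSTACK
 3462 100113 mpd5             -                mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep do_lock_umutex __umtx_op_wait_umutex amd64_syscall Xfast_syscall
 3462 100115 mpd5             -                mi_switch sleepq_catch_signals sleepq_wait_sig _cv_wait_sig seltdwait sys_poll amd64_syscall Xfast_syscall
 3462 100512 mpd5             -                mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep do_lock_umutex __umtx_op_wait_umutex amd64_syscall Xfast_syscall
"""

print(classify_threads(SAMPLE))
```

For the output above, two of the three mpd5 threads (100113 and 100512) are waiting on a mutex and one (100115) is in poll.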



/var/log/mpd.log
Jan 26 05:28:13 gw01 mpd: [B_ppp-46] IPCP: Up event
Jan 26 05:28:13 gw01 mpd: [B_ppp-46] IPCP: state change Starting --> Req-Sent
Jan 26 05:28:13 gw01 mpd: [B_ppp-46] IPCP: SendConfigReq #1
Jan 26 05:28:13 gw01 mpd: [B_ppp-46]   IPADDR 10.10.0.1
Jan 26 05:28:13 gw01 mpd: [B_ppp-46] IPV6CP: Up event
Jan 26 05:28:13 gw01 mpd: [B_ppp-46] IPV6CP: state change Starting --> Req-Sent
Jan 26 05:28:13 gw01 mpd: [B_ppp-46] IPV6CP: SendConfigReq #1
Jan 26 05:28:13 gw01 mpd: [vlan6-107] LCP: rec'd Terminate Request #240 (Opened)
Jan 26 05:28:13 gw01 mpd: [vlan6-107] LCP: state change Opened --> Stopping
Jan 26 05:28:13 gw01 mpd: [vlan6-107] Link: Leave bundle "B_ppp-46"

The log always stops with the same last 3 lines.



Jan 26 05:52:38 gw01 kernel: sonewconn: pcb 0xfffff80007757c40: Listen queue overflow:
 4 already in queue awaiting acceptance
Jan 26 05:53:09 gw01 last message repeated 60 times
Jan 26 05:53:34 gw01 last message repeated 51 times
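
To get a rough total of overflow events from lines like these, the syslog "last message repeated N times" markers have to be added to the one explicit message. A small Python sketch follows. It assumes that each repeat marker refers to the overflow message directly before it (syslogd does not say which message was repeated, so this is an approximation). count_overflows is a hypothetical helper name.

```python
import re

# syslogd's marker for suppressed duplicates of the previous message.
REPEAT_RE = re.compile(r"last message repeated (\d+) times")


def count_overflows(lines):
    """Count "Listen queue overflow" events, expanding repeat markers."""
    total = 0
    last_was_overflow = False
    for line in lines:
        match = REPEAT_RE.search(line)
        if match:
            # Only count repeats that follow an overflow message.
            if last_was_overflow:
                total += int(match.group(1))
            continue
        if "Listen queue overflow" in line:
            total += 1
            last_was_overflow = True
        elif line.strip() and not line.startswith(" "):
            # A different, non-continuation message ends the run.
            last_was_overflow = False
    return total


# The kernel messages from this report, copied verbatim.
SAMPLE_LOG = """\
Jan 26 05:52:38 gw01 kernel: sonewconn: pcb 0xfffff80007757c40: Listen queue overflow:
 4 already in queue awaiting acceptance
Jan 26 05:53:09 gw01 last message repeated 60 times
Jan 26 05:53:34 gw01 last message repeated 51 times
"""

print(count_overflows(SAMPLE_LOG.splitlines()))
```

For the messages above this gives 1 + 60 + 51 = 112 overflow events in under a minute.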


Kernel conf:
GENERIC + 
device          ipmi
device          coretemp
device          smbus

device          lagg
device          netmap

options         IPI_PREEMPTION

options         IPFIREWALL
options         IPFIREWALL_VERBOSE
options         IPDIVERT
options         DUMMYNET
options         IPFIREWALL_NAT
options         LIBALIAS

device          pf
device          pflog
device          pfsync

options         ALTQ
options         ALTQ_CBQ        # Class Based Queuing (CBQ)
options         ALTQ_RED        # Random Early Detection (RED)
options         ALTQ_RIO        # RED In/Out
options         ALTQ_HFSC       # Hierarchical Packet Scheduler (HFSC)
options         ALTQ_PRIQ       # Priority Queuing (PRIQ)
options         ALTQ_NOPCC      # Required for SMP build

options         NETGRAPH
options         NETGRAPH_BPF
options         NETGRAPH_CAR
options         NETGRAPH_ETHER
options         NETGRAPH_IPFW
options         NETGRAPH_IFACE
options         NETGRAPH_KSOCKET
options         NETGRAPH_PPP
options         NETGRAPH_PPTPGRE
options         NETGRAPH_PPPOE
options         NETGRAPH_SOCKET
options         NETGRAPH_TCPMSS
options         NETGRAPH_TEE
options         NETGRAPH_VJC
options         NETGRAPH_MPPC_ENCRYPTION
options         NETGRAPH_NETFLOW



CPU: Intel(R) Xeon(R) CPU           X3470  @ 2.93GHz (2933.36-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0x106e5  Family = 0x6  Model = 0x1e  Stepping = 5
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x98e3fd<SSE3,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,POPCNT>
  AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
  AMD Features2=0x1<LAHF>
  TSC: P-state invariant, performance statistics
real memory  = 4294967296 (4096 MB)
avail memory = 4052344832 (3864 MB)
Event timer "LAPIC" quality 400
ACPI APIC Table: <INTEL  S3420GPC>
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 1 package(s) x 4 core(s)
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  2
 cpu2 (AP): APIC ID:  4
 cpu3 (AP): APIC ID:  6


I tried to get a kernel trace dump with ktrdump, but it failed:
ktrdump: kvm_nlist: No such file or directory


I think something is wrong in the netgraph subsystem.
>How-To-Repeat:
Update to FreeBSD 10.0 and let 100-150 users connect through PPPoE.
>Fix:


>Release-Note:
>Audit-Trail:

From: Hawk256@yandex.ru
To: bug-followup@FreeBSD.org, hawk256@yandex.ru
Cc:  
Subject: Re: amd64/186114: MPD5.7 umtxn
Date: Tue, 4 Feb 2014 11:33:37 +1100

 In addition:
 
 - This problem occurs only with PPPoE. With PPTP, neither I nor other
 people on the forum have any problems.
 
 - Here is the gdb output from the frozen mpd5:
 
 gdb mpd5 3645
 GNU gdb 6.1.1 [FreeBSD]
 Copyright 2004 Free Software Foundation, Inc.
 GDB is free software, covered by the GNU General Public License, and you are
 welcome to change it and/or distribute copies of it under certain conditions.
 Type "show copying" to see the conditions.
 There is absolutely no warranty for GDB.  Type "show warranty" for details.
 This GDB was configured as "amd64-marcel-freebsd"...(no debugging symbols found)...
 Attaching to program: /usr/local/sbin/mpd5, process 3645
 Reading symbols from /usr/lib/libwrap.so.6...(no debugging symbols found)...done.
 Loaded symbols for /usr/lib/libwrap.so.6
 Reading symbols from /usr/lib/libpam.so.5...(no debugging symbols found)...done.
 Loaded symbols for /usr/lib/libpam.so.5
 Reading symbols from /lib/libcrypt.so.5...(no debugging symbols found)...done.
 Loaded symbols for /lib/libcrypt.so.5
 Reading symbols from /usr/lib/libnetgraph.so.4...(no debugging symbols found)...done.
 Loaded symbols for /usr/lib/libnetgraph.so.4
 Reading symbols from /lib/libutil.so.9...(no debugging symbols found)...done.
 Loaded symbols for /lib/libutil.so.9
 Reading symbols from /usr/lib/libradius.so.4...(no debugging symbols found)...done.
 Loaded symbols for /usr/lib/libradius.so.4
 Reading symbols from /usr/lib/libssl.so.7...(no debugging symbols found)...done.
 Loaded symbols for /usr/lib/libssl.so.7
 Reading symbols from /lib/libpcap.so.8...(no debugging symbols found)...done.
 Loaded symbols for /lib/libpcap.so.8
 Reading symbols from /usr/lib/libfetch.so.6...(no debugging symbols found)...done.
 Loaded symbols for /usr/lib/libfetch.so.6
 Reading symbols from /lib/libcrypto.so.7...(no debugging symbols found)...done.
 Loaded symbols for /lib/libcrypto.so.7
 Reading symbols from /lib/libthr.so.3...(no debugging symbols found)...done.
 [New Thread 80340ec00 (LWP 100390/mpd5)]
 [New Thread 803020400 (LWP 100120/mpd5)]
 [New Thread 802c06800 (LWP 100119/mpd5)]
 Loaded symbols for /lib/libthr.so.3
 Reading symbols from /lib/libc.so.7...(no debugging symbols found)...done.
 Loaded symbols for /lib/libc.so.7
 Reading symbols from /libexec/ld-elf.so.1...(no debugging symbols found)...done.
 Loaded symbols for /libexec/ld-elf.so.1
 [Switching to Thread 80340ec00 (LWP 100390/mpd5)]
 0x0000000801fc089a in __error () from /lib/libthr.so.3
 (gdb) bt
 #0  0x0000000801fc089a in __error () from /lib/libthr.so.3
 #1  0x0000000801fbb79d in pthread_mutex_destroy () from /lib/libthr.so.3
 #2  0x00000008022dea9a in vsyslog () from /lib/libc.so.7
 #3  0x00000000004487c7 in ?? ()
 #4  0x00000000004486e2 in ?? ()
 #5  0x000000000045123c in ?? ()
 #6  0x0000000000426534 in ?? ()
 #7  0x000000000045d1b6 in ?? ()
 #8  0x0000000801fb54a4 in pthread_create () from /lib/libthr.so.3
 #9  0x00007fffff5fc000 in ?? ()
 Error accessing memory address 0x7fffff7fc000: Bad address.
 (gdb) quit
 

From: Hawk256@yandex.ru
To: bug-followup@FreeBSD.org, hawk256@yandex.ru
Cc:  
Subject: Re: amd64/186114: MPD5.7 umtxn
Date: Wed, 5 Feb 2014 23:40:50 +1100

 One more interesting thing.
 
 The problem usually appears when MPD5 is started at system boot.
 If I comment out mpd_enable in rc.conf and start MPD5 manually 10-20
 minutes later, it works normally.
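
The delayed manual start can be scripted as a stopgap. This is only a sketch of the workaround, not a fix. It assumes that the port's rc.d script is installed as mpd5, so that "service mpd5 onestart" starts the daemon even with mpd_enable commented out. The function name delayed_start and its parameters are made up for illustration.

```python
import subprocess
import time


def delayed_start(delay_seconds,
                  command=("service", "mpd5", "onestart"),
                  sleep=time.sleep,
                  run=subprocess.call):
    """Wait delay_seconds, then run the start command.

    Returns the exit status of the command. The sleep and run
    arguments can be replaced to try the function without
    waiting or touching the service.
    """
    sleep(delay_seconds)
    return run(list(command))
```

Calling delayed_start(600) as root from a script launched at boot would wait 10 minutes and then start mpd5. The delay value is a guess based on the 10-20 minutes mentioned above.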
 
>Unformatted:
