From nobody@FreeBSD.org  Sun Apr 26 00:39:18 2009
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id BDFA9106566C
	for <freebsd-gnats-submit@FreeBSD.org>; Sun, 26 Apr 2009 00:39:18 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [IPv6:2001:4f8:fff6::21])
	by mx1.freebsd.org (Postfix) with ESMTP id A075D8FC0C
	for <freebsd-gnats-submit@FreeBSD.org>; Sun, 26 Apr 2009 00:39:18 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.14.3/8.14.3) with ESMTP id n3Q0dIEo012187
	for <freebsd-gnats-submit@FreeBSD.org>; Sun, 26 Apr 2009 00:39:18 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.14.3/8.14.3/Submit) id n3Q0dIpd012186;
	Sun, 26 Apr 2009 00:39:18 GMT
	(envelope-from nobody)
Message-Id: <200904260039.n3Q0dIpd012186@www.freebsd.org>
Date: Sun, 26 Apr 2009 00:39:18 GMT
From: Randy Bush <randy@psg.com>
To: freebsd-gnats-submit@FreeBSD.org
Subject: swap_pager_getswapspace(4): failed
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         134011
>Category:       kern
>Synopsis:       [hang] swap_pager_getswapspace(4): failed
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sun Apr 26 00:40:01 UTC 2009
>Closed-Date:    
>Last-Modified:  Sun May 31 01:50:00 UTC 2009
>Originator:     Randy Bush
>Release:        8-current Apr 25 12:52
>Organization:
Internet Initiative Japan
>Environment:
FreeBSD work0.psg.com 8.0-CURRENT FreeBSD 8.0-CURRENT #1: Sat Apr 25 13:14:13 UTC 2009     root@work0.psg.com:/usr/obj/usr/src/sys/WORK0  amd64

>Description:
when the midnight (gmt) maintenance runs on a number of 8-current systems, the console shows the following.  this has been going on for weeks.

swap_pager_getswapspace(16): failed
swap_pager_getswapspace(16): failed
swap_pager_getswapspace(16): failed
swap_pager_getswapspace(2): failed
swap_pager_getswapspace(2): failed
swap_pager_getswapspace(2): failed
swap_pager_getswapspace(2): failed
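A minimal sketch for tallying these messages from a saved console or syslog capture, e.g. swapfail_summary < /var/log/messages. It assumes that N in swap_pager_getswapspace(N) is the number of pages the swap pager tried to reserve in that request. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count "swap_pager_getswapspace(N): failed" lines read
# on stdin and sum the N values (assumed to be pages per failed request).
swapfail_summary() {
	awk '
	/swap_pager_getswapspace\([0-9]+\): failed/ {
		# Keep only the number between the parentheses.
		s = $0
		sub(/.*swap_pager_getswapspace\(/, "", s)
		sub(/\).*/, "", s)
		n++
		pages += s
	}
	END { printf "failures=%d pages=%d\n", n, pages }
	'
}
```

Fed the seven lines above, it would print failures=7 pages=56. Lines that do not match the pattern (for example the "was killed: out of swap space" messages) are ignored.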

many of these, and then the system either recovers or goes dead to the serial console.  the command prompt remains in open ssh sessions, but typing a command locks the shell.  the system is pingable but otherwise non-responsive.

the output below was captured during normal operation, on an occasion when the system had recovered.

work0.psg.com:/root# swapinfo
Device          1024-blocks     Used    Avail Capacity
/dev/mirror/bootb     8382708   307416  8075292     4%

work0.psg.com:/root# df -m
Filesystem                                   1M-blocks   Used  Avail Capacity  Mounted on
/dev/mirror/boota                                 7931    603   6693     8%    /
devfs                                                0      0      0   100%    /dev
procfs                                               0      0      0   100%    /proc
tank/data                                       663002      0 663002     0%    /data
tank/data/nfsen                                 828448 165446 663002    20%    /data/nfsen
tank/data/rpki                                  663179    177 663002     0%    /data/rpki
tank                                            663002      0 663002     0%    /tank
tank/usr                                        668072   5069 663002     1%    /usr
tank/usr/home                                   669874   6871 663002     1%    /usr/home
tank/usr/usr                                    668320   5317 663002     1%    /usr/usr
tank/var                                        663109    106 663002     0%    /var
tank/var/log                                    663105    103 663002     0%    /var/log
tank/var/spool                                  663171    168 663002     0%    /var/spool
tank/var/spool/var                              663002      0 663002     0%    /var/spool/var
tank/var/spool/var/spool                        663016     13 663002     0%    /var/spool/var/spool
/dev/md0                                           247      0    227     0%    /tmp
devfs                                                0      0      0   100%    /data/rpki/rcynic/dev
tank/usr@backup                                 668580   5578 663002     1%    /usr/.zfs/snapshot/backup
tank/usr/usr@2009-03-21-00-49-06                668341   5338 663002     1%    /usr/usr/.zfs/snapshot/2009-03-21-00-49-06
tank/var/spool/var/spool@2009-03-21-02-06-08    663016     13 663002     0%    /var/spool/var/spool/.zfs/snapshot/2009-03-21-02-06-08

work0.psg.com:/root# cat /etc/fstab
# Device        Mountpoint              FStype  Options         Dump    Pass#
#
/dev/mirror/boota       /                       ufs     rw              1       1
#
/dev/mirror/bootb       none                    swap    sw              0       0
#
/dev/acd0       /cdrom                  cd9660  ro,noauto       0       0
#
proc            /proc                   procfs  rw              0       0
#linprocfs      /compat/linux/proc      linprocfs rw            0       0
#
# end

work0.psg.com:/root# top -bores
last pid: 29419;  load averages:  1.03,  0.46,  0.17  up 6+06:12:06    12:46:21
147 processes: 3 running, 144 sleeping

Mem: 202M Active, 39M Inact, 1173M Wired, 2756K Cache, 39M Buf, 2540M Free
Swap: 8186M Total, 300M Used, 7886M Free, 3% Inuse


  PID USERNAME   THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
 1034 ejabberd     2  44    0   124M 51244K ucond   1   7:52  0.00% beam
  803 unbound      1  44    0 36420K 24988K select  0   6:05  0.00% unbound
 1370 root         1  76    0   225M 22056K nanslp  1 135:43  0.00% perl5.8.9
 1229 mysql        9  44    0 66984K 21428K sigwai  1   4:05  0.00% mysqld
29159 www          1  47    0    98M 12824K lockf   0   0:01  0.00% httpd
28447 www          1  44    0 99328K 11968K lockf   0   0:01  0.00% httpd
29081 www          1  44    0 99328K 11836K lockf   0   0:01  0.00% httpd
29217 www          1  45    0 99328K 11600K lockf   1   0:00  1.27% httpd
27072 www          1  44    0 98304K 11064K lockf   0   0:01  0.00% httpd
29018 www          1  44    0 98304K 10912K kqread  1   0:00  0.00% httpd
29141 www          1  44    0 98304K 10840K lockf   0   0:00  0.00% httpd
37090 root        19  44    0   170M  8492K select  0  11:33  0.00% asterisk
28098 root         1  48    0 10528K  6592K select  0   0:17  7.08% cvsup
29222 www          1  44    0 96256K  5700K lockf   0   0:00  0.00% httpd
29306 www          1  44    0 96256K  5624K lockf   0   0:00  0.00% httpd
29337 www          1  44    0 96256K  4536K lockf   0   0:00  0.00% httpd
28072 root         1  44    0 17764K  3316K select  1   0:00  0.00% sshd
 1302 root         1  44    0 96256K  2616K select  0   0:15  0.00% httpd


work0.psg.com:/root# vmstat -h 
 procs      memory      page                    disks     faults         cpu
 r b w     avm    fre   flt  re  pi  po    fr  sr ad4 ad6   in   sy   cs us sy id
 0 0 10   4503M  2628M   219   0   0   2   470 423   0   0  205 2341 4388  2  1 97


>How-To-Repeat:
wait for midnight

work0.psg.com:/usr/src# cat /etc/daily.local
#!/bin/sh
#

#
# reset ipfw filter counts
#
ipfw -f resetlog
ipfw -f zero

#
# mail and ntp
#
( /usr/bin/ntpq -c peers ; echo ; echo ; \
  echo irrd mirror requestors ; \
      /usr/bin/gunzip -c /var/log/irrd/irrd.log.0.gz \
      | for i in `grep 'rror request' | awk '{print $10}' | sort | uniq`; do \
          host $i | awk '{print "   " $5}'; done \
      ; echo ; echo ; \
  /usr/local/sbin/eximstats /var/spool/exim/log/main) \
  | Mail -s "`hostname` ntp/mail log report" postmaster
/usr/local/sbin/exicyclog
#/usr/bin/gzip -9 main.01

#
# system log
#
/usr/bin/gunzip -c /var/log/messages.0.gz | \
    /usr/bin/egrep -iv '(last message repeated|logfile turned over|PAM: authentication error|reverse map|sshd.*(Did not receive identification string|Disabling protocol|does not map back|(ftp|uucp) not allowed|Invalid user|Failed password|accepted|connection closed|received disconnect|SSH: Server)|sshguard)' | \
  /usr/bin/Mail -s "`hostname` System Log" root

/usr/bin/gunzip -c /var/log/messages.0.gz | \
  /usr/bin/egrep 'sshd.*((ftp|uucp) not allowed|Invalid user|Failed password)|sshguard' | \
  /usr/bin/Mail -s "`hostname` attack Log" root

#
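Rather than waiting for midnight, the run can presumably be started by hand while its memory use is recorded. A minimal sketch, assuming the midnight job is periodic daily (which runs /etc/daily.local through its 999.local script): start the workload in the background and print a timestamped probe every few seconds until it exits. sample_during is a hypothetical name; the interval and the probe commands are illustrative choices.

```shell
#!/bin/sh
# Hypothetical helper: run a workload in the background and, while it is
# still alive, print a timestamp plus the output of a probe command every
# INTERVAL seconds. Returns the workload's exit status.
sample_during() {
	interval=$1
	probe=$2
	shift 2
	"$@" &                  # start the workload, e.g. periodic daily
	pid=$!
	while kill -0 "$pid" 2>/dev/null; do
		date '+%Y-%m-%d %H:%M:%S'
		sh -c "$probe"      # e.g. 'swapinfo -k; vmstat'
		sleep "$interval"
	done
	wait "$pid"             # reap the workload and return its status
}
```

For example: sample_during 5 'swapinfo -k; vmstat' periodic daily >> /var/tmp/swaptrace.log 2>&1 (the log path is illustrative). If the machine hangs hard, samples still in the buffer cache may never reach the local disk, so piping the output to another host may preserve more of the run.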

>Fix:


>Release-Note:
>Audit-Trail:

From: Randy Bush <randy@psg.com>
To: bug-followup@FreeBSD.org,randy@psg.com
Cc:  
Subject: Re: amd64/134011: swap_pager_getswapspace(4): failed
Date: Mon, 27 Apr 2009 17:31:41 +0900

 the bug is not restricted to amd64; here it is on an i386 system:
 
 rip1.psg.com:/root# uname -a
 FreeBSD rip1.psg.com 8.0-CURRENT FreeBSD 8.0-CURRENT #9: Sun Apr  5 09:32:56 GMT 2009     root@rip1.psg.com:/usr/obj/usr/src/sys/RIP1  i386
 
 Apr 25 00:07:46 rip1 kernel: swap_pager_getswapspace(16): failed
 Apr 25 00:07:47 rip1 kernel: swap_pager_getswapspace(12): failed
 Apr 25 00:07:47 rip1 kernel: swap_pager_getswapspace(6): failed
 Apr 25 00:07:47 rip1 kernel: swap_pager_getswapspace(16): failed
 Apr 25 00:07:47 rip1 kernel: swap_pager_getswapspace(12): failed
 Apr 25 00:07:47 rip1 kernel: swap_pager_getswapspace(9): failed
 Apr 25 00:07:47 rip1 kernel: swap_pager_getswapspace(5): failed
 Apr 25 00:07:47 rip1 kernel: swap_pager_getswapspace(16): failed
 Apr 25 00:07:47 rip1 kernel: pid 40337 (exim_tidydb), uid 0, was killed: out of swap space
Responsible-Changed-From-To: freebsd-amd64->freebsd-bugs 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Mon May 18 02:04:18 UTC 2009 
Responsible-Changed-Why:  
Not specific to amd64. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=134011 

From: Randy Bush <randy@rmac.local>
To: bug-followup@FreeBSD.org, randy@psg.com
Cc:  
Subject: Re: kern/134011: [hang] swap_pager_getswapspace(4): failed
Date: Fri, 29 May 2009 09:37:19 +0900

 this time it actually said something interesting on the console!
 
 for some values of 'interesting' :)
 
 randy
 
 ---
 
 swap_pager_getswapspace(3): failed
 swap_pager_getswapspace(3): failed
 swap_pager_getswapspace(3): failed
 swap_pager_getswapspace(3): failed
 swap_pager_getswapspace(16): failed
 <the panic below was interleaved on the console with further swap_pager_getswapspace(N): failed messages>
 
 Fatal trap 12: page fault while in kernel mode
 cpuid = 1; apic id = 01
 fault virtual address	= 0x0
 fault code		= supervisor write data, page not present
 instruction pointer	= 0x20:0xffffffff8047c256
 stack pointer	        = 0x28:0xffffff8079fd1680
 frame pointer	        = 0x28:0xffffff8079fd16e0
 code segment		= base 0x0, limit 0xfffff, type 0x1b
 			= DPL 0, pres 1, long 1, def32 0, gran 1
 processor eflags	= interrupt enabled, resume, IOPL = 0
 current process		= 789 (syslogd)
 trap number		= 12
 panic: page fault
 cpuid = 1
 Uptime: 9h50m49s
 Physical memory: 4083 MB
 Dumping 1958 MB:
 
 <end. required power cycle>

From: Randy Bush <randy@psg.com>
To: bug-followup@FreeBSD.org,randy@psg.com
Cc:  
Subject: Re: kern/134011: [hang] swap_pager_getswapspace(4): failed
Date: Sun, 31 May 2009 10:48:05 +0900

 a bit more readable in last night's syslog, possibly during the backup:
 
 May 30 00:40:14 work0 kernel: lock order reversal:
 May 30 00:40:14 work0 kernel: 1st 0xffffff0057d019d0 ufs (ufs) @ /usr/src/sys/ufs/ffs/ffs_snapshot.c:423
 May 30 00:40:14 work0 kernel: 2nd 0xffffff8052c01aa0 bufwait (bufwait) @ /usr/src/sys/kern/vfs_bio.c:2556
 May 30 00:40:14 work0 kernel: 3rd 0xffffff0004b8d098 ufs (ufs) @ /usr/src/sys/ufs/ffs/ffs_snapshot.c:544
 May 30 00:40:16 work0 kernel: lock order reversal:
 May 30 00:40:16 work0 kernel: 1st 0xffffff8052c01aa0 bufwait (bufwait) @ /usr/src/sys/kern/vfs_bio.c:2556
 May 30 00:40:16 work0 kernel: 2nd 0xffffff00d35c7d30 snaplk (snaplk) @ /usr/src/sys/ufs/ffs/ffs_snapshot.c:793
 May 30 00:50:14 work0 kernel: lock order reversal:
 May 30 00:50:14 work0 kernel: 1st 0xffffff00d35c7d30 snaplk (snaplk) @ /usr/src/sys/kern/vfs_vnops.c:297
 May 30 00:50:14 work0 kernel: 2nd 0xffffff0057d019d0 ufs (ufs) @ /usr/src/sys/ufs/ffs/ffs_snapshot.c:1587
 May 30 01:45:21 work0 kernel: 
 May 30 01:45:21 work0 kernel: 
 May 30 01:45:21 work0 kernel: Fatal trap 12: page fault while in kernel mode
 May 30 01:45:21 work0 kernel: cpuid = 0; apic id = 00
 May 30 01:45:21 work0 kernel: fault virtual address	= 0x0
 May 30 01:45:21 work0 kernel: fault code		= supervisor write data, page not present
 May 30 01:45:21 work0 kernel: instruction pointer	= 0x20:0xffffffff8047c256
 May 30 01:45:21 work0 kernel: stack pointer	        = 0x28:0xffffff807a057680
 May 30 01:45:21 work0 kernel: frame pointer	        = 0x28:0xffffff807a0576e0
 May 30 01:45:21 work0 kernel: code segment		= base 0x0, limit 0xfffff, type 0x1b
 May 30 01:45:21 work0 kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
 May 30 01:45:21 work0 kernel: processor eflags	= interrupt enabled, resume, IOPL = 0
 May 30 01:45:21 work0 kernel: current process		= 9181 (nfcapd)
 <required power cycle>
 May 30 02:10:04 work0 syslogd: kernel boot file is /boot/kernel/kernel
 
>Unformatted:
