From chris@borderware.com  Fri Jul  1 01:51:24 2005
Return-Path: <chris@borderware.com>
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 9C2E316A41C
	for <FreeBSD-gnats-submit@freebsd.org>; Fri,  1 Jul 2005 01:51:24 +0000 (GMT)
	(envelope-from chris@borderware.com)
Received: from mail.borderware.com (mail.borderware.com [207.236.65.231])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 0390043D49
	for <FreeBSD-gnats-submit@freebsd.org>; Fri,  1 Jul 2005 01:51:23 +0000 (GMT)
	(envelope-from chris@borderware.com)
Message-Id: <20050701015122.D587BA9B6@santana.borderware.com>
Date: Thu, 30 Jun 2005 21:51:22 -0400 (EDT)
From: Chris Gabe <chris@borderware.com>
Reply-To: Chris Gabe <chris@borderware.com>
To: FreeBSD-gnats-submit@freebsd.org
Cc:
Subject: Kernel crash in 5.4 with SMP,PAE
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         82846
>Category:       kern
>Synopsis:       Kernel crash in 5.4 with SMP,PAE
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Jul 01 02:00:34 GMT 2005
>Closed-Date:    Wed Oct 12 15:09:23 UTC 2011
>Last-Modified:  Wed Oct 12 15:09:23 UTC 2011
>Originator:     Chris Gabe
>Release:        FreeBSD 5.4 i386
>Organization:
Borderware
>Environment:
System: FreeBSD santana.borderware.com 4.7-RELEASE-p20 FreeBSD 4.7-RELEASE-p20 #1: Fri Sep 26 13:30:29 EDT 2003 root@santana.borderware.com:/usr/obj/usr/src/sys/SANTANA i386


	
>Description:
Hello,

I've got a kernel crash on a quad-CPU Sun V40Z with 8GB of RAM, running FreeBSD 5.4 with SMP, PAE and kernel debugging enabled.  It happens every few hours.  The system is not using much memory at the time, but the crash usually follows accessing more than 4GB of files, read in separate chunks into only a few MB of user memory in total.

I've hand-transcribed the kernel trace below.  I don't have the 5.4 dmesg right now, but a kernel log file from a 4.10 build we previously ran on the same hardware shows the basic configuration: an LSI RAID controller with mirrored/striped SCSI hard drives.

We're wondering which direction to take with this.  Any advice?  Should we add more debugging, get a full crash dump, submit it to something/someone, change a kernel config option, or sync to a driver that has a fix for this (that would be a good one)?

Hand-transcribed kernel trace:
kdb_enter
panic
lockmgr(ca71ce14,6,ca71cd68,0,f0147a1c) + 0x421
vop_stdunlock(<5 addresses>) + 1f
vop_defaultop(<4 addresses>,1000) + 13
spec_vnoperate(didn't transcribe any more) + 13
spec_write 64
spec_vnoperate 13
vnode_pager_generic_putpages 224
vop_stdputpages 1a
vop_defaultop 13
spec_vnoperate 13
vnode_pager_putpages 8a
vm_pageout_flush cb
vm_pageout_clean 2a1
vm_pageout_scan 706
vm_pageout 312
fork_exit 75
fork_trampoline 8
trap 0x1 eip=0, esp = 0xf0147d7c, ebp = 0


The kernel boot log from 4.10 (sorry, I can get the 5.4 dmesg, but not until the end of next week).
The amr and mpt devices are perhaps of extra relevance(?):

Jun 22 13:00:00 fifty newsyslog[11157]: logfile turned over due to size>1K
Jun 22 13:09:53 fifty /kernel: Copyright 1998-2004 BorderWare Technologies Inc.  All rights reserved.
Jun 22 13:09:53 fifty /kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
Jun 22 13:09:53 fifty /kernel: The Regents of the University of California. All rights reserved.
Jun 22 13:09:53 fifty /kernel: S-CORE 8.00 #14: Mon Jun 13 09:27:22 EDT 2005
Jun 22 13:09:53 fifty /kernel: support@borderware.com:/sys/compile/S-CORE_SMP
Jun 22 13:09:53 fifty /kernel: Timecounter "i8254"  frequency 1193182 Hz
Jun 22 13:09:53 fifty /kernel: CPU: AMD Opteron(tm) Processor 850 (2391.27-MHz 686-class CPU)
Jun 22 13:09:53 fifty /kernel: Origin = "AuthenticAMD"  Id = 0xf5a  Stepping = 10
Jun 22 13:09:53 fifty /kernel: Features=0x78bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2>
Jun 22 13:09:53 fifty /kernel: AMD Features=0xe0500000<<b20>,AMIE,<b29>,DSP,3DNow!>
Jun 22 13:09:53 fifty /kernel: real memory  = 3824615424 (3734976K bytes)
Jun 22 13:09:53 fifty /kernel: avail memory = 3724136448 (3636852K bytes)
Jun 22 13:09:53 fifty /kernel: Programming 24 pins in IOAPIC #0
Jun 22 13:09:53 fifty /kernel: IOAPIC #0 intpin 2 -> irq 0
Jun 22 13:09:53 fifty /kernel: Programming 4 pins in IOAPIC #1
Jun 22 13:09:53 fifty /kernel: Programming 4 pins in IOAPIC #2
Jun 22 13:09:53 fifty /kernel: Programming 4 pins in IOAPIC #3
Jun 22 13:09:53 fifty /kernel: Programming 4 pins in IOAPIC #4
Jun 22 13:09:53 fifty /kernel: Programming 4 pins in IOAPIC #5
Jun 22 13:09:53 fifty /kernel: Programming 4 pins in IOAPIC #6
Jun 22 13:09:53 fifty /kernel: FreeBSD/SMP: Multiprocessor motherboard: 4 CPUs
Jun 22 13:09:53 fifty /kernel: cpu0 (BSP): apic id:  0, version: 0x00040010, at 0xfee00000
Jun 22 13:09:53 fifty /kernel: cpu1 (AP):  apic id:  1, version: 0x00040010, at 0xfee00000
Jun 22 13:09:53 fifty /kernel: cpu2 (AP):  apic id:  2, version: 0x00040010, at 0xfee00000
Jun 22 13:09:53 fifty /kernel: cpu3 (AP):  apic id:  3, version: 0x00040010, at 0xfee00000
Jun 22 13:09:53 fifty /kernel: io0 (APIC): apic id:  4, version: 0x00170011, at 0xfec00000
Jun 22 13:09:53 fifty /kernel: io1 (APIC): apic id:  5, version: 0x00030011, at 0xe4000000
Jun 22 13:09:53 fifty /kernel: io2 (APIC): apic id:  6, version: 0x00030011, at 0xe4001000
Jun 22 13:09:53 fifty /kernel: io3 (APIC): apic id:  7, version: 0x00030011, at 0xe5d01000
Jun 22 13:09:53 fifty /kernel: io4 (APIC): apic id:  8, version: 0x00030011, at 0xe5d03000
Jun 22 13:09:53 fifty /kernel: io5 (APIC): apic id:  9, version: 0x00030011, at 0xe5d05000
Jun 22 13:09:53 fifty /kernel: io6 (APIC): apic id: 10, version: 0x00030011, at 0xe5d07000
Jun 22 13:09:53 fifty /kernel: Preloaded elf kernel "kernel" at 0xc0455000.
Jun 22 13:09:53 fifty /kernel: Preloaded elf module "splash_bmp.ko" at 0xc045509c.
Jun 22 13:09:53 fifty /kernel: Preloaded splash_image_data "/boot/splash.bmp" at 0xc0455140.
Jun 22 13:09:53 fifty /kernel: Pentium Pro MTRR support enabled
Jun 22 13:09:53 fifty /kernel: md0: Malloc disk
Jun 22 13:09:53 fifty /kernel: Using $PIR table, 24 entries at 0xc00fde40
Jun 22 13:09:53 fifty /kernel: npx0: <math processor> on motherboard
Jun 22 13:09:53 fifty /kernel: npx0: INT 16 interface
Jun 22 13:09:53 fifty /kernel: pcib0: <Host to PCI bridge> on motherboard
Jun 22 13:09:53 fifty /kernel: pci0: <PCI bus> on pcib0
Jun 22 13:09:53 fifty /kernel: pcib16: <PCI to PCI bridge (vendor=1022 device=7460)> at device 6.0 on pci0
Jun 22 13:09:53 fifty /kernel: IOAPIC #0 intpin 19 -> irq 2
Jun 22 13:09:53 fifty /kernel: IOAPIC #0 intpin 17 -> irq 16
Jun 22 13:09:53 fifty /kernel: pci1: <PCI bus> on pcib16
Jun 22 13:09:53 fifty /kernel: pci1: <OHCI USB controller> at 0.0 irq 2
Jun 22 13:09:53 fifty /kernel: pci1: <OHCI USB controller> at 0.1 irq 2
Jun 22 13:09:53 fifty /kernel: pci1: <Trident model 9880 VGA-compatible display device> at 5.0 irq 16
Jun 22 13:09:53 fifty /kernel: isab0: <PCI to ISA bridge (vendor=1022 device=7468)> at device 7.0 on pci0
Jun 22 13:09:53 fifty /kernel: isa0: <ISA bus> on isab0
Jun 22 13:09:54 fifty /kernel: atapci0: <AMD 8111 ATA133 controller> port 0x1000-0x100f at device 7.1 on pci0
Jun 22 13:09:54 fifty /kernel: ata0: at 0x1f0 irq 14 on atapci0
Jun 22 13:09:54 fifty /kernel: ata1: at 0x170 irq 15 on atapci0
Jun 22 13:09:54 fifty /kernel: chip0: <PCI to Other bridge (vendor=1022 device=746b)> at device 7.3 on pci0
Jun 22 13:09:54 fifty /kernel: pcib17: <PCI to PCI bridge (vendor=1022 device=7450)> at device 10.0 on pci0
Jun 22 13:09:54 fifty /kernel: IOAPIC #1 intpin 1 -> irq 17
Jun 22 13:09:54 fifty /kernel: IOAPIC #1 intpin 2 -> irq 18
Jun 22 13:09:54 fifty /kernel: IOAPIC #1 intpin 3 -> irq 19
Jun 22 13:09:54 fifty /kernel: pci2: <PCI bus> on pcib17
Jun 22 13:09:54 fifty /kernel: bge0: <Broadcom BCM5703 Gigabit Ethernet, ASIC rev. 0x1002> mem 0xe5800000-0xe580ffff irq 17 at device 2.0 on pci2
Jun 22 13:09:54 fifty /kernel: bge0: Ethernet address: 00:09:3d:00:d4:e1
Jun 22 13:09:54 fifty /kernel: miibus0: <MII bus> on bge0
Jun 22 13:09:54 fifty /kernel: brgphy0: <BCM5703 10/100/1000baseTX PHY> on miibus0
Jun 22 13:09:54 fifty /kernel: brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
Jun 22 13:09:54 fifty /kernel: bge1: <Broadcom BCM5703 Gigabit Ethernet, ASIC rev. 0x1002> mem 0xe5810000-0xe581ffff irq 18 at device 3.0 on pci2
Jun 22 13:09:54 fifty /kernel: bge1: Ethernet address: 00:09:3d:00:d4:e2
Jun 22 13:09:54 fifty /kernel: miibus1: <MII bus> on bge1
Jun 22 13:09:54 fifty /kernel: brgphy1: <BCM5703 10/100/1000baseTX PHY> on miibus1
Jun 22 13:09:54 fifty /kernel: brgphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
Jun 22 13:09:54 fifty /kernel: mpt0: <LSILogic 1030 Ultra4 Adapter> port 0x2000-0x20ff mem 0xe5820000-0xe582ffff,0xe5830000-0xe583ffff irq 19 at device 4.0 on pci2
Jun 22 13:09:54 fifty /kernel: pcib18: <PCI to PCI bridge (vendor=1014 device=01a7)> at device 5.0 on pci2
Jun 22 13:09:54 fifty /kernel: IOAPIC #1 intpin 0 -> irq 20
Jun 22 13:09:54 fifty /kernel: pci3: <PCI bus> on pcib18
Jun 22 13:09:54 fifty /kernel: amr0: <LSILogic MegaRAID> mem 0xe5900000-0xe597ffff,0xe5c00000-0xe5c0ffff irq 20 at device 0.0 on pci3
Jun 22 13:09:54 fifty /kernel: amr0: <LSILogic MegaRAID SCSI 320-2X> Firmware 413G, BIOS H414, 128MB RAM
Jun 22 13:09:54 fifty /kernel: pci0: <unknown card> (vendor=0x1022, dev=0x7451) at 10.1
Jun 22 13:09:54 fifty /kernel: pcib19: <PCI to PCI bridge (vendor=1022 device=7450)> at device 11.0 on pci0
Jun 22 13:09:54 fifty /kernel: pci4: <PCI bus> on pcib19
Jun 22 13:09:54 fifty /kernel: pci0: <unknown card> (vendor=0x1022, dev=0x7451) at 11.1
Jun 22 13:09:54 fifty /kernel: pcib1: <Host to PCI bridge> on motherboard
Jun 22 13:09:54 fifty /kernel: pci5: <PCI bus> on pcib1
Jun 22 13:09:54 fifty /kernel: pcib2: <Host to PCI bridge> on motherboard
Jun 22 13:09:54 fifty /kernel: pci6: <PCI bus> on pcib2
Jun 22 13:09:54 fifty /kernel: pcib3: <Host to PCI bridge> on motherboard
Jun 22 13:09:54 fifty /kernel: pci7: <PCI bus> on pcib3
Jun 22 13:09:54 fifty /kernel: pcib4: <Host to PCI bridge> on motherboard
Jun 22 13:09:54 fifty /kernel: pci8: <PCI bus> on pcib4
Jun 22 13:09:54 fifty /kernel: pcib5: <Host to PCI bridge> on motherboard
Jun 22 13:09:54 fifty /kernel: pci9: <PCI bus> on pcib5
Jun 22 13:09:54 fifty /kernel: pcib6: <Host to PCI bridge> on motherboard
Jun 22 13:09:54 fifty /kernel: pci10: <PCI bus> on pcib6
Jun 22 13:09:54 fifty /kernel: pcib7: <Host to PCI bridge> on motherboard
Jun 22 13:09:54 fifty /kernel: pci11: <PCI bus> on pcib7
Jun 22 13:09:54 fifty /kernel: pcib8: <Host to PCI bridge> on motherboard
Jun 22 13:09:54 fifty /kernel: pci12: <PCI bus> on pcib8
Jun 22 13:09:54 fifty /kernel: pcib9: <Host to PCI bridge> on motherboard
Jun 22 13:09:54 fifty /kernel: pci13: <PCI bus> on pcib9
Jun 22 13:09:54 fifty /kernel: pcib10: <Host to PCI bridge> on motherboard
Jun 22 13:09:54 fifty /kernel: pci14: <PCI bus> on pcib10
Jun 22 13:09:54 fifty /kernel: pcib11: <Host to PCI bridge> on motherboard
Jun 22 13:09:54 fifty /kernel: pci15: <PCI bus> on pcib11
Jun 22 13:09:54 fifty /kernel: pcib12: <Host to PCI bridge> on motherboard
Jun 22 13:09:54 fifty /kernel: pci16: <PCI bus> on pcib12
Jun 22 13:09:54 fifty /kernel: pcib13: <Host to PCI bridge> on motherboard
Jun 22 13:09:54 fifty /kernel: pci17: <PCI bus> on pcib13
Jun 22 13:09:54 fifty /kernel: pcib14: <Host to PCI bridge> on motherboard
Jun 22 13:09:54 fifty /kernel: pci18: <PCI bus> on pcib14
Jun 22 13:09:54 fifty /kernel: pcib15: <Host to PCI bridge> on motherboard
Jun 22 13:09:54 fifty /kernel: pci19: <PCI bus> on pcib15
Jun 22 13:09:54 fifty /kernel: orm0: <Option ROMs> at iomem 0xc0000-0xc7fff,0xc8000-0xc97ff,0xc9800-0xcafff,0xcb000-0xcb7ff on isa0
Jun 22 13:09:54 fifty /kernel: pmtimer0 on isa0
Jun 22 13:09:54 fifty /kernel: atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
Jun 22 13:09:54 fifty /kernel: atkbd0: <AT Keyboard> flags 0x1 irq 1 on atkbdc0
Jun 22 13:09:54 fifty /kernel: kbd0 at atkbd0
Jun 22 13:09:54 fifty /kernel: vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Jun 22 13:09:54 fifty /kernel: sc0: <System console> at flags 0x100 on isa0
Jun 22 13:09:54 fifty /kernel: sc0: VGA <16 virtual consoles, flags=0x300>
Jun 22 13:09:54 fifty /kernel: sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
Jun 22 13:09:54 fifty /kernel: sio0: type 16550A
Jun 22 13:09:54 fifty /kernel: sio1 at port 0x2f8-0x2ff irq 3 on isa0
Jun 22 13:09:54 fifty /kernel: sio1: type 16550A
Jun 22 13:09:54 fifty /kernel: ppc0: parallel port not found.
Jun 22 13:09:54 fifty /kernel: APIC_IO: Testing 8254 interrupt delivery
Jun 22 13:09:54 fifty /kernel: APIC_IO: routing 8254 via IOAPIC #0 intpin 2
Jun 22 13:09:54 fifty /kernel: ipfw2 initialized, divert disabled, rule-based forwarding enabled, default to deny, logging unlimited
Jun 22 13:09:54 fifty /kernel: SMP: AP CPU #1 Launched!
Jun 22 13:09:54 fifty /kernel: SMP: AP CPU #3 Launched!
Jun 22 13:09:54 fifty /kernel: SMP: AP CPU #2 Launched!
Jun 22 13:09:54 fifty /kernel: acd0: DVD-ROM <DV-28E-C> at ata1-master PIO4
Jun 22 13:09:54 fifty /kernel: Waiting 15 seconds for SCSI devices to settle
Jun 22 13:09:54 fifty /kernel: amrd0: <LSILogic MegaRAID logical drive> on amr0
Jun 22 13:09:54 fifty /kernel: amrd0: 140006MB (286732288 sectors) RAID 1 (optimal)
Jun 22 13:09:54 fifty /kernel: pass0 at amr0 bus 0 target 6 lun 0
Jun 22 13:09:54 fifty /kernel: pass0: <SDR GEM318P 1> Fixed Processor SCSI-2 device
Jun 22 13:09:54 fifty /kernel: Mounting root from ufs:/dev/amrd0s2a
Jun 22 13:09:54 fifty /kernel: WARNING: / was not properly dismounted
Jun 22 13:11:17 fifty /kernel: bge0: gigabit link up
Jun 22 14:00:01 fifty newsyslog[62771]: logfile turned over due to size>1K
Jun 22 14:00:12 fifty mxsyslog[60040] logfile enrolled as /server/ftp/log/kernel.4


>How-To-Repeat:
	Complex; see above.
>Fix:

	


>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->phk 
Responsible-Changed-By: rwatson 
Responsible-Changed-When: Thu Jul 21 15:42:29 GMT 2005 
Responsible-Changed-Why:  
Chown to phk, who will have some questions about configuration. 


http://www.freebsd.org/cgi/query-pr.cgi?pr=82846 

From: "Poul-Henning Kamp" <phk@phk.freebsd.dk>
To: Robert Watson <rwatson@FreeBSD.org>
Cc: freebsd-bugs@FreeBSD.org
Subject: Re: kern/82846: Kernel crash in 5.4 with SMP,PAE 
Date: Thu, 21 Jul 2005 17:55:19 +0200

 In message <200507211543.j6LFhCCK041297@freefall.freebsd.org>, Robert Watson writes:
 
 >Chown to phk, who will have some questions about configuration.
 
 Yeah, well, maybe :-)
 
 It does look like a locking problem to me but that is not much of a clue.
 
 Do you have any idea which process/which file/which filesystem this
 happened to ?
 
 -- 
 Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
 phk@FreeBSD.ORG         | TCP/IP since RFC 956
 FreeBSD committer       | BSD since 4.3-tahoe    
 Never attribute to malice what can adequately be explained by incompetence.

From: Robert Watson <rwatson@FreeBSD.org>
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/82846: Kernel crash in 5.4 with SMP,PAE
Date: Thu, 21 Jul 2005 18:14:57 +0100 (BST)

 Further correspondence.
 
 Robert N M Watson
 
 ---------- Forwarded message ----------
 Date: Thu, 21 Jul 2005 12:48:41 -0400
 From: Chris Gabe <chris@borderware.com>
 To: Robert Watson <rwatson@FreeBSD.org>
 Cc: scottl@samsco.org, jhb@freebsd.org, phk@FreeBSD.org, jeff@FreeBSD.org
 Subject: Re: Advice needed on how to find kernel crash
 
 Robert Watson wrote:
 
 > Robert N M Watson
 > 
 > On Thu, 21 Jul 2005, Chris Gabe wrote:
 > 
 >> A couple of weeks ago I submitted FreeBSD PR 82846 
 >> (http://www.freebsd.org/cgi/query-pr.cgi?pr=82846) This affects a fair 
 >> number of like systems we plan to ship to some big customers.
 >> 
 >> It's probably quite hard to get to a formula where others can reproduce this 
 >> crash, and I doubt anyone is going to answer the pr.  I'm looking for 
 >> someone who is expert in the domain to make some concrete suggestion.  I 
 >> would be most grateful if an expert in the area could have a quick look at 
 >> the stack trace I transcribed and maybe make a suggestion or two.  "Run this 
 >> debug option, disable that driver, sync this set of drivers,..."; whatever 
 >> might speed the search.
 >> 
 >> It's happened with or without PAE, and I believe also with or without SMP, 
 >> though I only have the stack trace for the latest crash, with both enabled, 
 >> so I can't confirm it's the same thing. It takes a day or two to get it to trigger.
 > 
 > 
 > Chris,
 > 
 > Thanks for your e-mail.  I've changed the PR owner to Poul-Henning, who will 
 > probably ask you to follow up to the PR with some additional configuration 
 > details.  I've also CC'd Jeff Roberson, who's done a lot of the recent VFS 
 > architecture work.
 > 
 > The stack trace suggests the involvement of a good three different important 
 > subsystems: VFS, VM/swap, and storage devices.  So we'll probably need to 
 > narrow it down a bit more.
 > 
 > Do you have a machine you can dedicate for a few days to working with someone 
 > on the problem?  If so, you'll probably be asked to configure the box with 
 > INVARIANTS.  I'm not sure which version of FreeBSD started supporting "large" 
 > memory crash dumps, but we might also ask you to slide forward to the HEAD of 
 > 5-STABLE as the debugging tools may be slightly better there.
 > 
 > Also, could you say a little more about the workload that is involved on that 
 > system?
 > 
 > Thanks,
 > 
 > Robert N M Watson
 
 Thanks for your prompt reply. 
 Yes we have a machine that can be dedicated to this.
 Moving to the HEAD of 5-STABLE is no problem, and we'll include options 
 INVARIANTS.  I'll get that in motion.
 
 Workload... the enclosed scripts did trigger it a few times on their own.  They 
 just create and read several GB of data from disk, in parallel.  I.e. 32 
 programs each read 256MB, in parallel, from different files, each in 1MB chunks 
 to their own 1MB buffer, to a total of 32MB memory used on a file footprint of 
 8GB.  Re-creating the files regularly (running grow.sh repeatedly) may have 
 helped speed up the crash, but I think yield.sh is the real trigger.  We also 
 ran other programs (much more complex, but not big memory users) at the same 
 time to try to trigger it; that is what we were doing when we captured the 
 stack trace.
 
 The whole point of this was to determine whether PAE lets us rely on more 
 than 4GB of files remaining in memory even when they are not all in active 
 process memory.  This is essential for our database application, which makes 
 repeated accesses to a set of large files totaling more than 4GB.  It does do 
 what we want, but sometimes crashes.
 These programs should work with or without PAE and/or SMP; I think we saw the 
 crash in both cases, and certainly with PAE and SMP.
 
 big.c, used to create a file of a specified size in MB (anything will do here...):
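 #include <stdio.h>
 #include <stdlib.h>
 #include <unistd.h>
 
 int
 main(int argc, char **argv)
 {
     int i, BUFSIZE;
     char *buf;
 
     if (argc < 2) {
         fprintf(stderr, "usage: %s size_MB > file\n", argv[0]);
         exit(1);
     }
     BUFSIZE = atoi(argv[1]) * 1024 * 1024;
     buf = malloc(BUFSIZE);
     if (!buf) {
         fprintf(stderr, "no %d\n", BUFSIZE);
         exit(1);
     }
     /* Status goes to stderr so it does not end up in the output file. */
     fprintf(stderr, "yes %d %s\n", BUFSIZE, argv[1]);
 
     for (i = 0; i < BUFSIZE; i++)
         buf[i] = i;
 
     /* The file contents go to stdout, which is meant to be redirected. */
     if (write(1, buf, BUFSIZE) != (ssize_t)BUFSIZE) {
         perror("write");
         exit(2);
     }
     return 0;
 }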
 
 grow.sh, which calls big to create 8GB worth of quarter-GB (256MB) files, named 
 b11, b12, ... b14, b21, ... b84 respectively:
 
 #!/bin/sh
 
 # Create 32 files of 256MB each (8GB total) in parallel.
 for mb in 1 2 3 4 5 6 7 8 ; do
     for quart in 1 2 3 4 ; do
         ./big 256 >b$mb$quart &
     done
 done
 wait
 
 getem.c, to read the large files in chunks into one smaller buffer:
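 #include <stdio.h>
 #include <stdlib.h>
 #include <unistd.h>
 
 int
 main(int argc, char **argv)
 {
     int i, BUFSIZE, CHUNK;
     char *buf;
 
     if (argc < 3) {
         fprintf(stderr, "usage: %s total_MB chunk_MB < file\n", argv[0]);
         exit(1);
     }
     BUFSIZE = atoi(argv[1]) * 1024 * 1024;
     CHUNK = atoi(argv[2]) * 1024 * 1024;
     if (CHUNK <= 0) {
         fprintf(stderr, "chunk size must be positive\n");
         exit(1);
     }
     buf = malloc(CHUNK);
     if (!buf) {
         printf("no %d\n", CHUNK);
         exit(1);
     }
     printf("yes %d %s %s\n", CHUNK, argv[1], argv[2]);
 
     /* Read BUFSIZE bytes from stdin, CHUNK at a time, into the same buffer. */
     for (i = 0; i < BUFSIZE / CHUNK; i++) {
         if (read(0, buf, CHUNK) != (ssize_t)CHUNK) {
             printf("read failed %d %s\n", CHUNK, argv[1]);
             exit(2);
         }
     }
     printf("done %d %d\n", BUFSIZE, CHUNK);
     return 0;
 }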
 
 yield.sh, to read them back in parallel:
 #!/bin/sh
 
 # Read all 32 files in parallel, 256MB each in 1MB chunks.
 for mb in 1 2 3 4 5 6 7 8 ; do
     for quart in 1 2 3 4 ; do
         ./getem 256 1 <b$mb$quart &
     done
 done
 wait
 
 We find that running grow.sh, then yield.sh, waiting for completion, and 
 repeating eventually crashes the machine.
State-Changed-From-To: open->feedback 
State-Changed-By: eadler 
State-Changed-When: Sat Sep 24 04:19:22 UTC 2011 
State-Changed-Why:  
Is this still an issue on recent versions of FreeBSD? 

http://www.freebsd.org/cgi/query-pr.cgi?pr=82846 
Responsible-Changed-From-To: phk->freebsd-bugs 
Responsible-Changed-By: eadler 
Responsible-Changed-When: Sat Sep 24 17:45:27 UTC 2011 
Responsible-Changed-Why:  
return to the pool (approved by phk) 

http://www.freebsd.org/cgi/query-pr.cgi?pr=82846 
State-Changed-From-To: feedback->closed 
State-Changed-By: remko 
State-Changed-When: Wed Oct 12 15:09:22 UTC 2011 
State-Changed-Why:  
Feedback timeout 

http://www.freebsd.org/cgi/query-pr.cgi?pr=82846 
>Unformatted:
