From bob@luke.pmr.com Tue Mar 30 09:11:59 1999
Return-Path: <bob@luke.pmr.com>
Received: from luke.pmr.com (luke.pmr.com [207.170.114.132])
	by hub.freebsd.org (Postfix) with ESMTP id 6E00314EA6
	for <FreeBSD-gnats-submit@freebsd.org>; Tue, 30 Mar 1999 09:10:03 -0800 (PST)
	(envelope-from bob@luke.pmr.com)
Received: (from bob@localhost)
	by luke.pmr.com (8.9.2/8.9.2) id LAA33066;
	Tue, 30 Mar 1999 11:09:42 -0600 (CST)
	(envelope-from bob)
Message-Id: <199903301709.LAA33066@luke.pmr.com>
Date: Tue, 30 Mar 1999 11:09:42 -0600 (CST)
From: bob@pmr.com
Sender: bob@luke.pmr.com
Reply-To: bob@pmr.com
To: FreeBSD-gnats-submit@freebsd.org
Subject: Panic in soreceive() in 3.1-stable running amanda
X-Send-Pr-Version: 3.2

>Number:         10872
>Category:       kern
>Synopsis:       Panic in soreceive() due to NULL mbuf pointer
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    freebsd-bugs
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue Mar 30 09:20:00 PST 1999
>Closed-Date:    Mon Dec 3 09:41:39 PST 2001
>Last-Modified:  Mon Dec 03 09:42:29 PST 2001
>Originator:     Bob Willcox
>Release:        FreeBSD 3.1-STABLE i386
>Organization:
Power Micro Research
>Environment:

    FreeBSD deathstar.pmr.com 3.1-STABLE FreeBSD 3.1-STABLE #4: Tue Mar 30 08:59:32 CST 1999     bob@deathstar.pmr.com:/usr/src/sys/compile/DEATHSTAR  i386

>Description:

    A panic occurs on this system during my nightly amanda backups (this is
    my amanda backup server).  The panic is the result of the sb_mb pointer
    being NULL in soreceive when loaded into m at line 642 in uipc_socket.c.

    At the time of the panic amanda is loading the system pretty well with
    5 dumps running (from 5 different systems on the network) and writing to
    the Mammoth tape drive.

    Note that this problem suddenly started happening (last Friday morning).
    Prior to that I had not changed this system (deathstar) for several
    weeks, though the client systems had changed (I don't have a precise
    record of those changes).  I have since changed deathstar (upgraded to
    a more recent 3.1-stable and modified the kernel configuration) in a (so
    far) futile attempt to work around the problem.


    Some (hopefully helpful) info from the crash dump:

    #0  boot (howto=260) at ../../kern/kern_shutdown.c:285
    285                     dumppcb.pcb_cr3 = rcr3();
    (kgdb) where
    #0  boot (howto=260) at ../../kern/kern_shutdown.c:285
    #1  0xf014e705 in panic (fmt=0xf0233f4c "from debugger")
        at ../../kern/kern_shutdown.c:446
    #2  0xf012aab1 in db_panic (addr=-266261713, have_addr=0, count=-1, 
        modif=0xf4224d5c "") at ../../ddb/db_command.c:432
    #3  0xf012aa51 in db_command (last_cmdp=0xf0251e64, cmd_table=0xf0251cc4, 
        aux_cmd_tablep=0xf0267acc) at ../../ddb/db_command.c:332
    #4  0xf012ab16 in db_command_loop () at ../../ddb/db_command.c:454
    #5  0xf012ce67 in db_trap (type=3, code=0) at ../../ddb/db_trap.c:71
    #6  0xf021290a in kdb_trap (type=3, code=0, regs=0xf4224e4c)
        at ../../i386/i386/db_interface.c:157
    #7  0xf021c0b4 in trap (frame={tf_es = 16, tf_ds = 16, tf_edi = -202329632, 
          tf_esi = 256, tf_ebp = -199078256, tf_isp = -199078284, 
          tf_ebx = -266105266, tf_edx = -266043248, tf_ecx = -267680032, 
          tf_eax = 18, tf_trapno = 3, tf_err = 0, tf_eip = -266261713, tf_cs = 8, 
          tf_eflags = 598, tf_esp = -266043264, tf_ss = -266111117})
        at ../../i386/i386/trap.c:548
    #8  0xf0212b2f in Debugger (msg=0xf0237773 "panic")
        at ../../i386/i386/db_interface.c:317
    #9  0xf014e6fc in panic (fmt=0xf0238e4e "receive 1")
        at ../../kern/kern_shutdown.c:444
    #10 0xf01667d3 in soreceive (so=0xf3f0b1e0, psa=0x0, uio=0xf4224f40, mp0=0x0, 
        controlp=0x0, flagsp=0x0) at ../../kern/uipc_socket.c:659
    #11 0xf015c6d4 in soo_read (fp=0xf1026540, uio=0xf4224f40, cred=0xf0f2a180)
        at ../../kern/sys_socket.c:69
    #12 0xf01591ed in read (p=0xf418f3c0, uap=0xf4224f94)
        at ../../kern/sys_generic.c:121
    #13 0xf021c8c3 in syscall (frame={tf_es = -272695257, tf_ds = -272695257, 
          tf_edi = -272638492, tf_esi = 64, tf_ebp = -272638364, 
          tf_isp = -199077916, tf_ebx = 0, tf_edx = 82768, tf_ecx = 6, tf_eax = 3, 
          tf_trapno = 7, tf_err = 7, tf_eip = 537674705, tf_cs = 31, 
          tf_eflags = 514, tf_esp = -272638820, tf_ss = 39})
        at ../../i386/i386/trap.c:1100
    #14 0x200c43d1 in ?? ()
    #15 0x1f64 in ?? ()
    #16 0x1099 in ?? ()
    (kgdb) up 10
    #10 0xf01667d3 in soreceive (so=0xf3f0b1e0, psa=0x0, uio=0xf4224f40, mp0=0x0, 
        controlp=0x0, flagsp=0x0) at ../../kern/uipc_socket.c:659
    Source file is more recent than executable.
    659                     KASSERT(m != 0 || !so->so_rcv.sb_cc, ("receive 1"));
    (kgdb) list
    654             if (m == 0 || (((flags & MSG_DONTWAIT) == 0 &&
    655                 so->so_rcv.sb_cc < uio->uio_resid) &&
    656                 (so->so_rcv.sb_cc < so->so_rcv.sb_lowat ||
    657                 ((flags & MSG_WAITALL) && uio->uio_resid <= so->so_rcv.sb_hiwat)) &&
    658                 m->m_nextpkt == 0 && (pr->pr_flags & PR_ATOMIC) == 0)) {
    659                     KASSERT(m != 0 || !so->so_rcv.sb_cc, ("receive 1"));
    660                     if (so->so_error) {
    661                             if (m)
    662                                     goto dontblock;
    663                             error = so->so_error;
    (kgdb) print *so
    $1 = {so_zone = 0xf0f0ef00, so_type = 1, so_options = 0, so_linger = 0, 
      so_state = 2, so_pcb = 0xf400bea0 "", so_proto = 0xf0259294, so_head = 0x0, 
      so_incomp = {tqh_first = 0x0, tqh_last = 0xf3f0b1f8}, so_comp = {
        tqh_first = 0x0, tqh_last = 0xf3f0b200}, so_list = {tqe_next = 0x0, 
        tqe_prev = 0x0}, so_qlen = 0, so_incqlen = 0, so_qlimit = 0, so_timeo = 0, 
      so_error = 0, so_sigio = 0x0, so_oobmark = 0, so_rcv = {sb_cc = 4380, 
        sb_hiwat = 17520, sb_mbcnt = 6528, sb_mbmax = 140160, sb_lowat = 1, 
        sb_mb = 0x0, sb_sel = {si_pid = 0, si_flags = 0}, sb_flags = 1, 
        sb_timeo = 0}, so_snd = {sb_cc = 0, sb_hiwat = 17520, sb_mbcnt = 0, 
        sb_mbmax = 140160, sb_lowat = 2048, sb_mb = 0x0, sb_sel = {si_pid = 0, 
          si_flags = 0}, sb_flags = 0, sb_timeo = 0}, so_upcall = 0, 
      so_upcallarg = 0x0, so_uid = 90, so_gencnt = 3716}
    (kgdb) print m   
    $2 = (struct mbuf *) 0x0
    (kgdb) print *uio
    $3 = {uio_iov = 0xf4224f38, uio_iovcnt = 1, uio_offset = 0xffffffffffffffff, 
      uio_resid = 820, uio_segflg = UIO_USERSPACE, uio_rw = UIO_READ, 
      uio_procp = 0xf418f3c0}
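
    The dump above shows so_rcv claiming sb_cc = 4380 bytes while sb_mb is
    NULL.  A minimal sketch of that consistency rule follows, using
    simplified stand-in structures rather than the kernel's real struct mbuf
    and struct sockbuf.  The names mbuf_model, sockbuf_model, sb_chain_bytes
    and sb_consistent are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins (assumed layout): only the fields discussed in
 * this PR are modelled, not the kernel's real structures. */
struct mbuf_model {
    struct mbuf_model *m_next;     /* next mbuf in the same record */
    struct mbuf_model *m_nextpkt;  /* first mbuf of the next record */
    unsigned long m_len;           /* bytes of data held by this mbuf */
};

struct sockbuf_model {
    unsigned long sb_cc;           /* byte count the buffer claims to hold */
    struct mbuf_model *sb_mb;      /* head of the mbuf chain */
};

/* Sum the data actually reachable from sb_mb, record by record. */
static unsigned long
sb_chain_bytes(const struct sockbuf_model *sb)
{
    unsigned long total = 0;

    assert(sb != NULL);
    for (const struct mbuf_model *rec = sb->sb_mb; rec != NULL;
        rec = rec->m_nextpkt)
        for (const struct mbuf_model *m = rec; m != NULL; m = m->m_next)
            total += m->m_len;
    return total;
}

/* The buffer is consistent when the claimed count matches the chain. */
static bool
sb_consistent(const struct sockbuf_model *sb)
{
    return sb_chain_bytes(sb) == sb->sb_cc;
}
```

    With the values from the dump (sb_cc = 4380, sb_mb = NULL) the check
    fails, which is the state the "receive 1" KASSERT rejects.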


    Dmesg output:

    Copyright (c) 1992-1999 FreeBSD Inc.
    Copyright (c) 1982, 1986, 1989, 1991, 1993
            The Regents of the University of California. All rights reserved.
    FreeBSD 3.1-STABLE #4: Tue Mar 30 08:59:32 CST 1999
        bob@deathstar.pmr.com:/usr/src/sys/compile/DEATHSTAR
    Timecounter "i8254"  frequency 1193182 Hz
    Timecounter "TSC"  frequency 199309847 Hz
    CPU: Pentium Pro (199.31-MHz 686-class CPU)
      Origin = "GenuineIntel"  Id = 0x616  Stepping=6
      Features=0xf9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV>
    real memory  = 33554432 (32768K bytes)
    avail memory = 29958144 (29256K bytes)
    Preloaded elf kernel "kernel" at 0xf02cd000.
    Probing for devices on PCI bus 0:
    chip0: <Intel 82440FX (Natoma) PCI and memory controller> rev 0x02 on pci0.0.0
    chip1: <Intel 82371SB PCI to ISA bridge> rev 0x01 on pci0.1.0
    ahc0: <Adaptec 2940 SCSI adapter> rev 0x00 int a irq 12 on pci0.10.0
    ahc0: aic7870 Single Channel A, SCSI Id=7, 16/255 SCBs
    fxp0: <Intel EtherExpress Pro 10/100B Ethernet> rev 0x01 int a irq 10 on pci0.11.0
    fxp0: Ethernet address 00:a0:c9:31:e6:21
    ncr0: <ncr 53c810 fast10 scsi> rev 0x01 int a irq 11 on pci0.12.0
    ncr1: <ncr 53c875 fast20 wide scsi> rev 0x03 int a irq 9 on pci0.13.0
    Probing for devices on the ISA bus:
    sc0 on isa
    sc0: VGA color <16 virtual consoles, flags=0x0>
    atkbdc0 at 0x60-0x6f on motherboard
    atkbd0 irq 1 on isa
    psm0 not found
    sio0 at 0x3f8-0x3ff irq 4 flags 0x10 on isa
    sio0: type 16550A
    sio1 at 0x2f8-0x2ff irq 3 on isa
    sio1: type 16550A
    fdc0 at 0x3f0-0x3f7 irq 6 drq 2 on isa
    fdc0: FIFO enabled, 8 bytes threshold
    fd0: 1.44MB 3.5in
    ppc0 at 0x378 irq 7 on isa
    ppc0: W83877F chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
    ppc0: FIFO with 16/16/16 bytes threshold
    nlpt0: <generic printer> on ppbus 0
    nlpt0: Interrupt-driven port
    ppi0: <generic parallel i/o> on ppbus 0
    plip0: <PLIP network interface> on ppbus 0
    vga0 at 0x3b0-0x3df maddr 0xa0000 msize 131072 on isa
    npx0 on motherboard
    npx0: INT 16 interface
    Waiting 10 seconds for SCSI devices to settle
    sa0 at ahc0 bus 0 target 1 lun 0
    sa0: <EXABYTE EXB-89008E000204 V38b> Removable Sequential Access SCSI-2 device 
    sa0: 10.000MB/s transfers (10.000MHz, offset 15)
    sa1 at ncr0 bus 0 target 5 lun 0
    sa1: <WANGTEK 51000  SCSI 75F2> Removable Sequential Access SCSI-2 device 
    sa1: 4.807MB/s transfers (4.807MHz, offset 8)
    changing root device to da0s1a
    cd0 at ncr0 bus 0 target 4 lun 0
    cd0: <TOSHIBA CD-ROM XM-3401TA 0283> Removable CD-ROM SCSI-2 device 
    cd0: 4.237MB/s transfers (4.237MHz, offset 8)
    cd0: Attempt to query device size failed: NOT READY, Medium not present
    da1 at ncr1 bus 0 target 1 lun 0
    da1: <IBM DCAS-34330W S65A> Fixed Direct Access SCSI-2 device 
    da1: 40.000MB/s transfers (20.000MHz, offset 15, 16bit)
    da1: 4134MB (8467200 512 byte sectors: 255H 63S/T 527C)
    da2 at ncr1 bus 0 target 2 lun 0
    da2: <IBM DDRS-39130D DC1B> Fixed Direct Access SCSI-2 device 
    da2: 40.000MB/s transfers (20.000MHz, offset 15, 16bit), Tagged Queueing Enabled
    da2: 8715MB (17850000 512 byte sectors: 255H 63S/T 1111C)
    da0 at ncr1 bus 0 target 0 lun 0
    da0: < DFRSS2W 4B4B> Fixed Direct Access SCSI-2 device 
    da0: 20.000MB/s transfers (10.000MHz, offset 15, 16bit), Tagged Queueing Enabled
    da0: 2150MB (4404489 512 byte sectors: 255H 63S/T 274C)
    ch0 at ahc0 bus 0 target 0 lun 0
    ch0: <EXABYTE EXB-210 5.00> Removable Changer SCSI-2 device 
    ch0: 3.300MB/s transfers
    ch0: 11 slots, 1 drive, 1 picker, 0 portals
    WARNING: / was not properly dismounted
    ffs_mountfs: superblock updated for soft updates
    ffs_mountfs: superblock updated for soft updates
    ffs_mountfs: superblock updated for soft updates
    ffs_mountfs: superblock updated for soft updates
    ffs_mountfs: superblock updated for soft updates
    ffs_mountfs: superblock updated for soft updates
    ffs_mountfs: superblock updated for soft updates
    link_elf: symbol splash_register undefined


    Kernel config file:

    #
    # DEATHSTAR -- Configure file of the DEATHSTAR system
    #
    # For more information read the handbook part System Administration -> 
    # Configuring the FreeBSD Kernel -> The Configuration File. 
    # The handbook is available in /usr/share/doc/handbook or online as
    # latest version from the FreeBSD World Wide Web server 
    # <URL:http://www.FreeBSD.ORG/>
    #
    # An exhaustive list of options and more detailed explanations of the 
    # device lines is present in the ./LINT configuration file. If you are 
    # in doubt as to the purpose or necessity of a line, check first in LINT.
    #
    #   $Id$

    machine             "i386"
    cpu         "I686_CPU"
    ident               DEATHSTAR
    maxusers    40

    options             INET                    #InterNETworking
    options             FFS                     #Berkeley Fast Filesystem
    options             FFS_ROOT                #FFS usable as root device [keep this!]
    options             MFS                     #Memory Filesystem
    options             NFS                     #Network Filesystem
    options             MSDOSFS                 #MSDOS Filesystem
    options             "CD9660"                #ISO 9660 Filesystem
    options             "CD9660_ROOT"           #CD-ROM usable as root. "CD9660" req'ed
    options             PROCFS                  #Process filesystem
    options             "COMPAT_43"             #Compatible with BSD 4.3 [KEEP THIS!]
    options             SCSI_DELAY=10000        #Be pessimistic about Joe SCSI device
    options             UCONSOLE                #Allow users to grab the console
    options             FAILSAFE                #Be conservative
    options             USERCONFIG              #boot -c editor
    options             VISUAL_USERCONFIG       #visual boot -c editor
    options             SOFTUPDATES             #enable soft updates support
    #options            "NMBCLUSTERS=4096"

    config              kernel  root on da0

    controller  isa0
    controller  pci0

    controller  fdc0    at isa? port "IO_FD1" bio irq 6 drq 2
    disk                fd0     at fdc0 drive 0

    # A single entry for any of these controllers (ncr, ahb, ahc) is
    # sufficient for any number of installed devices.
    controller  ncr0
    controller  ahc0

    controller  scbus0

    device              da0
    device              sa0
    device              pass0
    device              cd0
    device              ch0

    # atkbdc0 controls both the keyboard and the PS/2 mouse
    controller  atkbdc0 at isa? port IO_KBD tty
    device              atkbd0  at isa? tty irq 1
    device              psm0    at isa? tty irq 12

    device              vga0    at isa? port ? conflicts

    # splash screen/screen saver
    #pseudo-device      splash

    # syscons is the default console driver, resembling an SCO console
    device              sc0     at isa? tty

    device              npx0    at isa? port IO_NPX irq 13

    # Serial ports
    device              sio0    at isa? port "IO_COM1" flags 0x10 tty irq 4
    device              sio1    at isa? port "IO_COM2" tty irq 3

    # Parallel port
    device              ppc0    at isa? port? net irq 7
    controller  ppbus0
    device              nlpt0   at ppbus?
    device              plip0   at ppbus?
    device              ppi0    at ppbus?
    #controller vpo0    at ppbus?

    # Order is important here due to intrusive probes, do *not* alphabetize
    # this list of network interfaces until the probes have been fixed.
    # Right now it appears that the ie0 must be probed before ep0. See
    # revision 1.20 of this file.
    device de0
    device fxp0

    pseudo-device       loop
    pseudo-device       ether
    pseudo-device       sl      2
    pseudo-device       ppp     2
    pseudo-device       tun     2
    pseudo-device       pty     64
    pseudo-device       gzip            # Exec gzipped a.out's

    #
    # Enable debug support
    #
    options             KTRACE          #kernel tracing
    options             DDB             #kernel debugger
    options             INVARIANTS      #extra sanity checks
    options             INVARIANT_SUPPORT #needed for INVARIANTS

    #
    # These three options provide support for System V Interface
    # Definition-style interprocess communication, in the form of shared
    # memory, semaphores, and message queues, respectively.
    #
    options             SYSVSHM
    options             SYSVSEM
    options             SYSVMSG

    #  The `bpfilter' pseudo-device enables the Berkeley Packet Filter.  Be
    #  aware of the legal and administrative consequences of enabling this
    #  option.  The number of devices determines the maximum number of
    #  simultaneous BPF clients programs runnable.
    pseudo-device       bpfilter 4      #Berkeley packet filter



>How-To-Repeat:

    All I have to do is run amanda and wait for about an hour and a half (that's
    how long it takes to fail).

>Fix:
        
    Wish I had one to offer.


>Release-Note:
>Audit-Trail:

From: Bob Willcox <bob@luke.pmr.com>
To: freebsd-gnats-submit@freebsd.org
Cc:  
Subject: Re: kern/10872: Panic in sorecieve() due to NULL mbuf pointer
Date: Sun, 4 Apr 1999 15:15:05 -0500

 --/9DWx/yDrRhgMJTb
 Content-Type: text/plain; charset=us-ascii
 
 I have been able to further isolate this problem to probable interaction
 between network and SCSI activity.
 
 Attached is a small test script with which I have been able to easily
 recreate the panic (w/o involving amanda).  Key to recreating the panic
 is to run the test script so that it is writing to two different files
 on the same filesystem of the target system simultaneously (I could not
 make it fail when writing to different filesystems or /dev/null).
 
 You can run the two invocations from the same or different source
 systems (seems to cause the panic more quickly if run from different
 systems).
 
 Note that I have pretty much ruled out hardware as I completely replaced
 the hardware of my backup server system (and my test system for this
 panic) and still get the same panic.
 
 I have been running this script as follows:
 
 ./panic_test 5 65536 deathstar /stuff2/tmp/junk1 &
 ./panic_test 5 65536 deathstar /stuff2/tmp/junk2 &
 
 from either the same or different source systems.  This will attempt to
 write two 2GB files (junk1 and junk2) on the /stuff2 filesystem (one of
 the holding disk filesystems) on deathstar.
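
 The 2GB figure follows from the arguments: blkcnt = 65536 blocks of
 bs = 32k each.  A small sketch of that arithmetic is given here; the
 helper names dd_bytes and run_bytes are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes moved by one `dd count=blkcnt bs=bs` run. */
static uint64_t
dd_bytes(uint64_t blkcnt, uint64_t bs)
{
    /* Guard against 64-bit overflow of the product. */
    assert(bs == 0 || blkcnt <= UINT64_MAX / bs);
    return blkcnt * bs;
}

/* Bytes sent over the network by one script invocation: the loop
 * repeats the same dd pipeline lpcnt times against the same path. */
static uint64_t
run_bytes(uint64_t lpcnt, uint64_t blkcnt, uint64_t bs)
{
    return lpcnt * dd_bytes(blkcnt, bs);
}
```

 For the invocation shown, dd_bytes(65536, 32 * 1024) is 2147483648
 bytes (2 GiB) per pass, and run_bytes(5, 65536, 32 * 1024) is five
 times that per invocation.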
 
 
 
 
 For the record, here is the dmesg output for deathstar:
 
 Copyright (c) 1992-1999 FreeBSD Inc.
 Copyright (c) 1982, 1986, 1989, 1991, 1993
 	The Regents of the University of California. All rights reserved.
 FreeBSD 3.1-STABLE #6: Sun Apr  4 11:56:38 CDT 1999
     root@deathstar.pmr.com:/usr/src/sys/compile/DEATHSTAR
 Timecounter "i8254"  frequency 1193182 Hz
 Timecounter "TSC"  frequency 400909762 Hz
 CPU: AMD-K6(tm) 3D processor (400.91-MHz 586-class CPU)
   Origin = "AuthenticAMD"  Id = 0x58c  Stepping=12
   Features=0x8021bf<FPU,VME,DE,PSE,TSC,MSR,MCE,CX8,PGE,MMX>
 real memory  = 67043328 (65472K bytes)
 avail memory = 62144512 (60688K bytes)
 Preloaded elf kernel "kernel" at 0xf02d4000.
 Probing for devices on PCI bus 0:
 chip0: <AcerLabs M1541 (Aladdin-V) PCI host bridge> rev 0x04 on pci0.0.0
 chip1: <AcerLabs M5243 PCI-PCI bridge> rev 0x04 on pci0.1.0
 chip2: <PCI to 0x80 bridge (vendor=10b9 device=7101)> rev 0x00 on pci0.3.0
 chip3: <AcerLabs M1533 portable PCI-ISA bridge> rev 0xc3 on pci0.7.0
 ncr0: <ncr 53c875 fast20 wide scsi> rev 0x03 int a irq 5 on pci0.10.0
 ncr1: <ncr 53c810a fast10 scsi> rev 0x12 int a irq 10 on pci0.11.0
 fxp0: <Intel EtherExpress Pro 10/100B Ethernet> rev 0x01 int a irq 11 on pci0.12.0
 fxp0: Ethernet address 00:a0:c9:00:53:29
 ide_pci0: <Acer Aladdin IV/V (M5229) Bus-master IDE controller> rev 0xc1 int a irq 0 on pci0.15.0
 Probing for devices on PCI bus 1:
 vga0: <Matrox model 0521 graphics accelerator> rev 0x01 int a irq 11 on pci1.0.0
 Probing for devices on the ISA bus:
 sc0 on isa
 sc0: VGA color <16 virtual consoles, flags=0x0>
 atkbdc0 at 0x60-0x6f on motherboard
 atkbd0 irq 1 on isa
 psm0 not found
 sio0 at 0x3f8-0x3ff irq 4 flags 0x10 on isa
 sio0: type 16550A
 sio1 at 0x2f8-0x2ff irq 3 on isa
 sio1: type 16550A
 fdc0 at 0x3f0-0x3f7 irq 6 drq 2 on isa
 fdc0: FIFO enabled, 8 bytes threshold
 fd0: 1.44MB 3.5in
 wdc0 at 0x1f0-0x1f7 irq 14 flags 0xb0ffb0ff on isa
 wdc0: unit 0 (wd0): <ST34310A>, LBA, DMA, 32-bit, multi-block-16
 wd0: 4111MB (8420832 sectors), 524 cyls, 255 heads, 63 S/T, 512 B/S
 ppc0 at 0x378 irq 7 on isa
 ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
 ppc0: FIFO with 16/16/7 bytes threshold
 nlpt0: <generic printer> on ppbus 0
 nlpt0: Interrupt-driven port
 ppi0: <generic parallel i/o> on ppbus 0
 plip0: <PLIP network interface> on ppbus 0
 vga0 at 0x3b0-0x3df maddr 0xa0000 msize 131072 on isa
 npx0 on motherboard
 npx0: INT 16 interface
 Waiting 10 seconds for SCSI devices to settle
 sa0 at ncr1 bus 0 target 1 lun 0
 sa0: <EXABYTE EXB-89008E000204 V38b> Removable Sequential Access SCSI-2 device 
 sa0: 10.000MB/s transfers (10.000MHz, offset 8)
 changing root device to wd0s1a
 WARNING: / was not properly dismounted
 ch0 at ncr1 bus 0 target 0 lun 0
 ch0: <EXABYTE EXB-210 5.00> Removable Changer SCSI-2 device 
 ch0: 3.300MB/s transfers
 ch0: 11 slots, 1 drive, 1 picker, 0 portals
 da1 at ncr0 bus 0 target 8 lun 0
 da1: <IBM DDRS-39130D DC1B> Fixed Direct Access SCSI-2 device 
 da1: 40.000MB/s transfers (20.000MHz, offset 15, 16bit), Tagged Queueing Enabled
 da1: 8715MB (17850000 512 byte sectors: 255H 63S/T 1111C)
 da0 at ncr0 bus 0 target 0 lun 0
 da0: <IBM DDRS-39130D DC1B> Fixed Direct Access SCSI-2 device 
 da0: 40.000MB/s transfers (20.000MHz, offset 15, 16bit), Tagged Queueing Enabled
 da0: 8715MB (17850000 512 byte sectors: 255H 63S/T 1111C)
 ffs_mountfs: superblock updated for soft updates
 ffs_mountfs: superblock updated for soft updates
 ffs_mountfs: superblock updated for soft updates
 ffs_mountfs: superblock updated for soft updates
 ffs_mountfs: superblock updated for soft updates
 fxp0: promiscuous mode enabled
 
 -- 
 Bob Willcox             The man who follows the crowd will usually get no
 bob@luke.pmr.com        further than the crowd.  The man who walks alone is
 Austin, TX              likely to find himself in places no one has ever
                         been.            -- Alan Ashley-Pitt
 
 --/9DWx/yDrRhgMJTb
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: attachment; filename=panic_test
 
 #!/bin/sh
 
 if [ $# -ne 4 ]; then
     echo "Usage: panic_test loopcnt blkcnt host path"
     exit 1
 fi
 
 lpcnt=$1
 blkcnt=$2
 host=$3
 path=$4
 
 i=1
 while [ $i -le $lpcnt ]
 do
     cmd="dd count=$blkcnt bs=32k if=/dev/zero|rsh $host \"dd bs=32k of=$path\""
     echo "$i: $cmd"
     eval $cmd
     i=`expr $i + 1`
 done
 
 --/9DWx/yDrRhgMJTb--
 

From: Bob Willcox <bob@pmr.com>
To: freebsd-gnats-submit@freebsd.org, bob@pmr.com
Cc:  
Subject: Re: kern/10872: Panic in sorecieve() due to NULL mbuf pointer
Date: Wed, 14 Apr 1999 17:00:15 -0500

 As a test, I installed the 4/8 4.0 snap on one of my systems here,
 upgraded it to 4.0-current (as of this morning, 4/14/99), and re-ran
 my tests.  It still panics, at a slightly different place but for the
 same reason: the mbuf chain pointer is unexpectedly NULL.
 
 Here is the backtrace from the crash dump:
 
 
 IdlePTD 2977792
 initial pcb at 268420
 panicstr: sbdrop
 panic messages:
 ---
 panic: sbdrop
 
 syncing disks... 24 24 14 done
 
 dumping to dev 20401, offset 241664
 dump 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 
 ---
 #0  boot (howto=256) at ../../kern/kern_shutdown.c:287
 287                     dumppcb.pcb_cr3 = rcr3();
 (kgdb) where
 #0  boot (howto=256) at ../../kern/kern_shutdown.c:287
 #1  0xc0149084 in at_shutdown (
     function=0xc0232d26 <__set_sysctl_set_sym_sysctl___kern_ipc_somaxconn+226>,
     arg=0xc38c0b84, queue=-1014232256) at ../../kern/kern_shutdown.c:448
 #2  0xc0162b5c in sbdrop (sb=0xc38c0b84, len=2920)
     at ../../kern/uipc_socket2.c:739
 #3  0xc0162ae5 in sbflush (sb=0xc38c0b84)
     at ../../kern/uipc_socket2.c:719
 #4  0xc0199597 in tcp_disconnect (tp=0xc39523c0)
     at ../../netinet/tcp_usrreq.c:742
 #5  0xc0198d6e in tcp_usr_disconnect (so=0xc38c0b40)
     at ../../netinet/tcp_usrreq.c:268
 #6  0xc0160aac in sodisconnect (so=0xc38c0b40)
     at ../../kern/uipc_socket.c:371
 #7  0xc01608c6 in soclose (so=0xc38c0b40)
     at ../../kern/uipc_socket.c:251
 #8  0xc0157407 in soo_close (fp=0xc093c340, p=0xc3a1c940)
     at ../../kern/sys_socket.c:175
 #9  0xc0141dd4 in closef (fp=0xc093c340, p=0xc3a1c940)
     at ../../kern/kern_descrip.c:1065
 #10 0xc0141bb8 in fdfree (p=0xc3a1c940) at ../../kern/kern_descrip.c:977
 #11 0xc0143096 in exit1 (p=0xc3a1c940, rv=0)
     at ../../kern/kern_exit.c:200
 #12 0xc0142efc in exit1 (p=0xc3a1c940, rv=-1077945712)
     at ../../kern/kern_exit.c:105
 #13 0xc0214086 in syscall (frame={tf_es = 47, tf_ds = -1078001617, 
       tf_edi = 134815272, tf_esi = 0, tf_ebp = -1077945428, 
       tf_isp = -1009856540, tf_ebx = -1, tf_edx = 1, tf_ecx = 0, tf_eax = 1, 
       tf_trapno = 8, tf_err = 2, tf_eip = 134755660, tf_cs = 31, 
       tf_eflags = 582, tf_esp = -1077945448, tf_ss = 47})
     at ../../i386/i386/trap.c:1101
 #14 0xc020a5ec in Xint0x80_syscall ()
 #15 0x805b423 in ?? ()
 #16 0x805b1db in ?? ()
 #17 0x805aefb in ?? ()
 #18 0x80480e9 in ?? ()
 (kgdb) up 2
 #2  0xc0162b5c in sbdrop (sb=0xc38c0b84, len=2920)
     at ../../kern/uipc_socket2.c:739
 739                                     panic("sbdrop");
 (kgdb) list
 734
 735             next = (m = sb->sb_mb) ? m->m_nextpkt : 0;
 736             while (len > 0) {
 737                     if (m == 0) {
 738                             if (next == 0)
 739                                     panic("sbdrop");
 740                             m = next;
 741                             next = m->m_nextpkt;
 742                             continue;
 743                     }
 (kgdb) print *sb
 $1 = {sb_cc = 2920, sb_hiwat = 17520, sb_mbcnt = 4352, sb_mbmax = 140160, 
   sb_lowat = 1, sb_mb = 0x0, sb_sel = {si_pid = 0, si_flags = 0}, 
   sb_flags = 0, sb_timeo = 0}
 (kgdb) 
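
 The listing shows why this state is fatal: sbdrop() is asked to drop
 len = 2920 bytes (the value of sb_cc), but sb_mb is NULL, so both m and
 next are NULL on the first pass through the loop.  Here is a minimal
 user-space sketch of that walk.  The structures are simplified
 stand-ins, sbdrop_model is a hypothetical name, the panic is replaced
 by a -1 return, and everything after the part shown in the listing is
 a simplified assumption, not the kernel's code.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for struct mbuf and struct sockbuf (assumed). */
struct mbuf_model {
    struct mbuf_model *m_next;     /* next mbuf in the same record */
    struct mbuf_model *m_nextpkt;  /* first mbuf of the next record */
    int m_len;                     /* bytes of data held by this mbuf */
};

struct sockbuf_model {
    int sb_cc;                     /* byte count the buffer claims to hold */
    struct mbuf_model *sb_mb;      /* head of the mbuf chain */
};

/* Drop len bytes from the front of sb.  Returns 0 on success and -1
 * where the kernel would call panic("sbdrop").  Mbufs are unlinked
 * but not freed in this model. */
static int
sbdrop_model(struct sockbuf_model *sb, int len)
{
    struct mbuf_model *m, *next;

    assert(sb != NULL);
    next = (m = sb->sb_mb) ? m->m_nextpkt : NULL;
    while (len > 0) {
        if (m == NULL) {
            if (next == NULL)
                return -1;         /* kernel: panic("sbdrop") */
            m = next;
            next = m->m_nextpkt;
            continue;
        }
        if (m->m_len > len) {      /* partial drop within this mbuf */
            m->m_len -= len;
            sb->sb_cc -= len;
            break;
        }
        len -= m->m_len;           /* whole mbuf consumed */
        sb->sb_cc -= m->m_len;
        m = m->m_next;
    }
    if (m != NULL) {
        sb->sb_mb = m;
        m->m_nextpkt = next;
    } else {
        sb->sb_mb = next;
    }
    return 0;
}
```

 Called with sb_cc = 2920 and sb_mb = NULL, as in the dump, the model
 returns -1 immediately; with a chain that really holds 2920 bytes it
 drains the buffer and leaves sb_cc at 0.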
 
 -- 
 Bob Willcox             The man who follows the crowd will usually get no
 bob@luke.pmr.com        further than the crowd.  The man who walks alone is
 Austin, TX              likely to find himself in places no one has ever
                         been.            -- Alan Ashley-Pitt
 

From: Pierre Beyssac <beyssac@enst.fr>
To: freebsd-bugs@freebsd.org, bob@pmr.com
Cc: FreeBSD-gnats-submit@freebsd.org
Subject: Re: kern/10872: Panic in sorecieve()
Date: Tue, 11 May 1999 18:59:56 +0200

 I was looking into PR kern/10872, hoping to find an easily fixable
 occurrence of a NULL mbuf pointer, but it doesn't seem to be one.
 
 It's labelled "Panic in soreceive() due to NULL mbuf pointer", but
 from the debug data filed with the PR it seems the actual problem
 is with so_rcv.sb_cc being 0, which triggers a KASSERT in uipc_socket.c:
 
         if (m == 0 || (((flags & MSG_DONTWAIT) == 0 &&
             so->so_rcv.sb_cc < uio->uio_resid) &&
             (so->so_rcv.sb_cc < so->so_rcv.sb_lowat ||
             ((flags & MSG_WAITALL) && uio->uio_resid <= so->so_rcv.sb_hiwat)) &&
             m->m_nextpkt == 0 && (pr->pr_flags & PR_ATOMIC) == 0)) {
                 KASSERT(m != 0 || !so->so_rcv.sb_cc, ("receive 1"));
 
 (more data can be found in the PR)
 
 I can't seem to be able to reproduce the problem on -current with
 the script provided by Bob, and I don't have a -stable box to try
 it on either.
 
 Plus, I don't have (yet) much of a clue regarding the semantics of
 sb_cc. I continue investigating this stuff, but if anyone has more
 clue than I have, he's welcome to send me some directions to look
 into :-)
 -- 
 Pierre Beyssac		pb@enst.fr
 

From: Peter Wemm <peter@netplex.com.au>
To: Pierre Beyssac <beyssac@enst.fr>
Cc: freebsd-bugs@freebsd.org, bob@pmr.com,
	FreeBSD-gnats-submit@freebsd.org
Subject: kern/10872: Panic in sorecieve() 
Date: Wed, 12 May 1999 01:28:42 +0800

 Pierre Beyssac wrote:
 > I was looking into PR kern/10872, hoping to find an easily fixable
 > occurrence of a NULL mbuf pointer, but it doesn't seem to be one.
 
 I just looked at the PR.  He's running:
 
     ahc0: <Adaptec 2940 SCSI adapter> rev 0x00 int a irq 12 on pci0.10.0
     ahc0: aic7870 Single Channel A, SCSI Id=7, 16/255 SCBs
     ncr0: <ncr 53c810 fast10 scsi> rev 0x01 int a irq 11 on pci0.12.0
     ncr1: <ncr 53c875 fast20 wide scsi> rev 0x03 int a irq 9 on pci0.13.0
 
 It should be noted that freefall had a severe case of problems like this
 that all but disappeared when the ncr cards were swapped for an ahc2940U2W.
 
 Quite how this should make such a dramatic difference is a bit of a
 mystery.  We were seeing really strange things like a bit of the kernel
 stack being partly trashed and messing up some local variables..
 
 Cheers,
 -Peter
 
 

From: Bob Willcox <bob@luke.pmr.com>
To: Pierre Beyssac <beyssac@enst.fr>
Cc: freebsd-bugs@freebsd.org, bob@pmr.com,
	FreeBSD-gnats-submit@freebsd.org
Subject: Re: kern/10872: Panic in sorecieve()
Date: Tue, 11 May 1999 12:41:17 -0500

 On Tue, May 11, 1999 at 06:59:56PM +0200, Pierre Beyssac wrote:
 > I was looking into PR kern/10872, hoping to find an easily fixable
 > occurrence of a NULL mbuf pointer, but it doesn't seem to be one.
 > 
 > It's labelled "Panic in soreceive() due to NULL mbuf pointer", but
 > from the debug data filed with the PR it seems the actual problem
 > is with so_rcv.sb_cc being 0, which triggers a KASSERT in uipc_socket.c:
 > 
 >         if (m == 0 || (((flags & MSG_DONTWAIT) == 0 &&
 >             so->so_rcv.sb_cc < uio->uio_resid) &&
 >             (so->so_rcv.sb_cc < so->so_rcv.sb_lowat ||
 >             ((flags & MSG_WAITALL) && uio->uio_resid <= so->so_rcv.sb_hiwat)) &&
 >             m->m_nextpkt == 0 && (pr->pr_flags & PR_ATOMIC) == 0)) {
 >                 KASSERT(m != 0 || !so->so_rcv.sb_cc, ("receive 1"));
 > 
 > (more data can be found in the PR)
 
 Hmm, I haven't looked at this in a few weeks (I downgraded my amanda
 backup server to 2.2.8 to work around the problem till I could find a
 fix).  The problem as I have seen it is that the mbuf chain pointer (m)
 is NULL and so_rcv.sb_cc is not zero.  It's as though somewhere either
 the mbuf chain pointer gets zapped with NULL or something fails to
 properly update so_rcv.sb_cc as mbufs are processed.
 
 I believe one can expand the KASSERT macro and rewrite the line:
 
     KASSERT(m != 0 || !so->so_rcv.sb_cc, ("receive 1"));
 
 as
 
     do {
 	if (!(m != 0 || !so->so_rcv.sb_cc))
 	    panic("receive 1");
     } while (0);
 
 which can be simplified into:
 
     do {
 	if (m == 0 && so->so_rcv.sb_cc != 0)
 	    panic("receive 1");
     } while (0);
 
 by pushing the leading ! inward (De Morgan's law), which turns the ||
 into && and inverts each operand.
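
 The equivalence of the two conditions can be checked mechanically.
 This is a small user-space sketch, not kernel code; the function names
 kassert_would_panic and simplified_would_panic are hypothetical, and m
 is modelled as a plain pointer with sb_cc as an unsigned count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Negation of the expression exactly as written inside KASSERT():
 * the assertion fires when this returns true. */
static bool
kassert_would_panic(const void *m, unsigned long sb_cc)
{
    return !(m != 0 || !sb_cc);
}

/* The simplified form: no mbuf chain, yet a nonzero byte count. */
static bool
simplified_would_panic(const void *m, unsigned long sb_cc)
{
    return m == 0 && sb_cc != 0;
}
```

 Both functions agree for every combination of NULL/non-NULL m and
 zero/nonzero sb_cc, and both report a panic for the values in the
 dump (m = NULL, sb_cc = 4380).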
 
 > 
 > I can't seem to be able to reproduce the problem on -current with
 > the script provided by Bob, and I don't have a -stable box to try
 > it on either.
 
 I have been able to reproduce it on both -stable and -current (but not
 2.2.8).  I have a full-duplex 100Mb ethernet switch that my systems
 are on.  On slower networks it may not fail.  It seems to be timing
 dependent.
 
 > 
 > Plus, I don't have (yet) much of a clue regarding the semantics of
 > sb_cc. I continue investigating this stuff, but if anyone has more
 > clue than I have, he's welcome to send me some directions to look
 > into :-)
 > -- 
 > Pierre Beyssac		pb@enst.fr
 
 -- 
 Bob Willcox             The man who follows the crowd will usually get no
 bob@luke.pmr.com        further than the crowd.  The man who walks alone is
 Austin, TX              likely to find himself in places no one has ever
                         been.            -- Alan Ashley-Pitt
 

From: Bob Willcox <bob@luke.pmr.com>
To: Peter Wemm <peter@netplex.com.au>
Cc: freebsd-bugs@freebsd.org, bob@pmr.com,
	FreeBSD-gnats-submit@freebsd.org
Subject: Re: kern/10872: Panic in sorecieve()
Date: Tue, 11 May 1999 12:47:07 -0500

 On Wed, May 12, 1999 at 01:28:42AM +0800, Peter Wemm wrote:
 > Pierre Beyssac wrote:
 > > I was looking into PR kern/10872, hoping to find an easily fixable
 > > occurrence of a NULL mbuf pointer, but it doesn't seem to be one.
 > 
 > I just looked at the PR.  He's running:
 > 
 >     ahc0: <Adaptec 2940 SCSI adapter> rev 0x00 int a irq 12 on pci0.10.0
 >     ahc0: aic7870 Single Channel A, SCSI Id=7, 16/255 SCBs
 >     ncr0: <ncr 53c810 fast10 scsi> rev 0x01 int a irq 11 on pci0.12.0
 >     ncr1: <ncr 53c875 fast20 wide scsi> rev 0x03 int a irq 9 on pci0.13.0
 > 
 > It should be noted that freefall had a severe case of problems like this
 > that all but disappeared when the ncr cards were swapped for an ahc2940U2W.
 
 I wish that were so.  I wound up replacing the entire system (everything
 except the tape library and drive) with one that has 2 ncr adapters in
 it (one for the holding disks and the other for the tape library/drive)
 and still got the panic.
 
 > 
 > Quite how this should make such a dramatic difference is a bit of a
 > mystery.  We were seeing really strange things like a bit of the kernel
 > stack being partly trashed and messing up some local variables..
 
 This problem seems to be very timing sensitive.  I will try to recreate
 it again here on my -current test system.
 
 Bob
 
 > 
 > Cheers,
 > -Peter
 
 -- 
 Bob Willcox             The man who follows the crowd will usually get no
 bob@luke.pmr.com        further than the crowd.  The man who walks alone is
 Austin, TX              likely to find himself in places no one has ever
                         been.            -- Alan Ashley-Pitt
 

From: Pierre Beyssac <beyssac@enst.fr>
To: Bob Willcox <bob@pmr.com>
Cc: freebsd-bugs@freebsd.org, FreeBSD-gnats-submit@freebsd.org
Subject: Re: kern/10872: Panic in sorecieve()
Date: Tue, 11 May 1999 19:53:11 +0200

 On Tue, May 11, 1999 at 12:41:17PM -0500, Bob Willcox wrote:
 > fix).  The problem as I have seen it is that the mbuf chain pointer (m)
 > is NULL and so_rcv.sb_cc is not zero.  It's as though somewhere either
 > the mbuf chain pointer gets zapped with NULL or something fails to
 
 This can happen when the system is out of mbufs. Sadly there are
 many places in the kernel where the condition is not trapped at
 all.
 
 How many mbufs does netstat -m report on your system? Maybe I
 couldn't reproduce it because my kernel is configured with maxusers
 128, which yields more mbufs. You can try that as a temporary fix.
 
 > properly update so_rcv.sb_cc as mbufs are processed.
 > 
 > I believe one can expand the KASSERT macro and rewrite the line:
 > 	if (m == 0 && so->so_rcv.sb_cc != 0)
 
 Oops, you're right. I stupidly looked at so_snd.sb_cc in the debug
 output, which is 0.
 
 I prefer that, it'll probably be easier to fix.
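 
 The condition under discussion (a non-zero sb_cc with no mbuf chain
 behind it) can be made concrete with a small userland sketch.  It
 assumes simplified stand-in structures: sk_mbuf, sk_sockbuf and the
 sk_ helpers are hypothetical names, not the kernel's definitions, and
 record boundaries (m_nextpkt) are ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical stand-ins for the kernel structures. */
struct sk_mbuf {
    struct sk_mbuf *m_next;   /* next mbuf in the chain */
    size_t          m_len;    /* bytes of data held by this mbuf */
};

struct sk_sockbuf {
    size_t          sb_cc;    /* byte count the buffer claims to hold */
    struct sk_mbuf *sb_mb;    /* head of the mbuf chain */
};

/* Sum the data lengths along the chain; an empty chain yields 0. */
size_t sk_chain_bytes(const struct sk_mbuf *m)
{
    size_t total = 0;

    for (; m != NULL; m = m->m_next)
        total += m->m_len;
    return total;
}

/* Return 1 when sb_cc agrees with the chain, 0 otherwise. */
int sk_sockbuf_consistent(const struct sk_sockbuf *sb)
{
    assert(sb != NULL);
    return sk_chain_bytes(sb->sb_mb) == sb->sb_cc;
}
```

 A buffer with sb_cc != 0 and sb_mb == NULL, which is the state seen
 in the panic, fails this check; an empty buffer with both fields zero
 passes it.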
 -- 
 Pierre Beyssac		pb@enst.fr
 

From: Bob Willcox <bob@luke.pmr.com>
To: Pierre Beyssac <beyssac@enst.fr>
Cc: Bob Willcox <bob@pmr.com>, freebsd-bugs@freebsd.org,
	FreeBSD-gnats-submit@freebsd.org
Subject: Re: kern/10872: Panic in sorecieve()
Date: Tue, 11 May 1999 13:00:04 -0500

 On Tue, May 11, 1999 at 07:53:11PM +0200, Pierre Beyssac wrote:
 > On Tue, May 11, 1999 at 12:41:17PM -0500, Bob Willcox wrote:
 > > fix).  The problem as I have seen it is that the mbuf chain pointer (m)
 > > is NULL and so_rcv.sb_cc is not zero.  It's as though somewhere either
 > > the mbuf chain pointer gets zapped with NULL or something fails to
 > 
 > This can happen when the system is out of mbufs. Sadly there are
 > many places in the kernel where the condition is not trapped at
 > all.
 > 
 > How many mbufs does netstat -m report on your system? Maybe I
 > couldn't reproduce it because my kernel is configured with maxusers
 > 128, which yields more mbufs. You can try that as a temporary fix.
 
 I have just updated my -current test system (cvsuped as of this morning)
 and will see if I can still reproduce it there.  If so I will try
 changing the maxusers to see if that has any effect.
 
 > 
 > > properly update so_rcv.sb_cc as mbufs are processed.
 > > 
 > > I believe one can expand the KASSERT macro and rewrite the line:
 > > 	if (m == 0 && so->so_rcv.sb_cc != 0)
 > 
 > Oops, you're right. I stupidly looked at so_snd.sb_cc in the debug
 > output, which is 0.
 > 
 > I prefer that, it'll probably be easier to fix.
 
 Good.  :-)
 
 > -- 
 > Pierre Beyssac		pb@enst.fr
 
 -- 
 Bob Willcox             The man who follows the crowd will usually get no
 bob@luke.pmr.com        further than the crowd.  The man who walks alone is
 Austin, TX              likely to find himself in places no one has ever
                         been.            -- Alan Ashley-Pitt
 

From: Bob Willcox <bob@luke.pmr.com>
To: Pierre Beyssac <beyssac@enst.fr>
Cc: Bob Willcox <bob@pmr.com>, freebsd-bugs@freebsd.org,
	FreeBSD-gnats-submit@freebsd.org
Subject: Re: kern/10872: Panic in sorecieve()
Date: Tue, 11 May 1999 17:30:19 -0500

 Well, I can easily recreate the panic with -current as of this morning.
 I tried the "maxusers 128" change and that did not help.  I have
 attached a slightly modified test shell script that I have been using.
 
 I run this shell script on three other systems simultaneously, all
 writing to the same SCSI disk on the test system (this sort of simulates
 amanda activity with multiple systems all dumping to the holding disk).
 As I mentioned in an earlier note, these systems are all connected
 together via a 100mbps full-duplex switching hub.  Two of them are
 running 3.1-stable and the other is running 2.2.8-release.
 
 I run the tests simultaneously on the three systems as follows:
 
 On obiwan:
 ./panic_test 5 10000 lando /stuff/tmp/obiwan
 
 On deathstar:
 ./panic_test 5 10000 lando /stuff/tmp/deathstar
 
 On luke:
 ./panic_test 5 10000 lando /stuff/tmp/luke
 
 (I've got kind of a Star Wars theme going here)
 
 Usually within about 5 minutes lando panics.  Note that I have built
 lando's kernel with the options INVARIANTS and INVARIANT_SUPPORT.  If
 you don't, you'll still get a panic (sbdrop) but it will occur later on
 during the close of the socket instead of the "receive 1" panic due to
 the KASSERT() that we've been talking about.
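 
 Why the panic moves depending on the kernel options can be sketched
 in userland as follows.  This is only an illustration under stated
 assumptions: SK_INVARIANTS, SK_KASSERT and sk_receive_check are
 hypothetical names, the macro records its message instead of calling
 panic(), and the exact form of the kernel macro is not reproduced
 here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for building with "options INVARIANTS". */
#define SK_INVARIANTS 1

/* The sketch records the message here instead of panicking. */
static const char *sk_last_panic = NULL;

#ifdef SK_INVARIANTS
/* Check enabled: a false expression reports the message. */
#define SK_KASSERT(exp, msg) \
    do { if (!(exp)) sk_last_panic = (msg); } while (0)
#else
/* Check compiled out: the inconsistency goes unnoticed at this point. */
#define SK_KASSERT(exp, msg) do { } while (0)
#endif

/*
 * Report "receive 1" when data is counted (sb_cc != 0) but there is
 * no mbuf to read it from (m == NULL); return NULL otherwise.
 */
const char *sk_receive_check(const void *m, unsigned long sb_cc)
{
    sk_last_panic = NULL;
    SK_KASSERT(m != NULL || sb_cc == 0, "receive 1");
    return sk_last_panic;
}
```

 With the SK_INVARIANTS line removed the function always returns NULL,
 which corresponds to the case where the bad state is only caught
 later, at socket close.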
 
 One more thing...I never got low on mbufs prior to the panic.
 
 Thanks,
 Bob
 
 On Tue, May 11, 1999 at 07:53:11PM +0200, Pierre Beyssac wrote:
 > On Tue, May 11, 1999 at 12:41:17PM -0500, Bob Willcox wrote:
 > > fix).  The problem as I have seen it is that the mbuf chain pointer (m)
 > > is NULL and so_rcv.sb_cc is not zero.  It's as though somewhere either
 > > the mbuf chain pointer gets zapped with NULL or something fails to
 > 
 > This can happen when the system is out of mbufs. Sadly there are
 > many places in the kernel where the condition is not trapped at
 > all.
 > 
 > How many mbufs does netstat -m report on your system? Maybe I
 > couldn't reproduce it because my kernel is configured with maxusers
 > 128, which yields more mbufs. You can try that as a temporary fix.
 > 
 > > properly update so_rcv.sb_cc as mbufs are processed.
 > > 
 > > I believe one can expand the KASSERT macro and rewrite the line:
 > > 	if (m == 0 && so->so_rcv.sb_cc != 0)
 > 
 > Oops, you're right. I stupidly looked at so_snd.sb_cc in the debug
 > output, which is 0.
 > 
 > I prefer that, it'll probably be easier to fix.
 > -- 
 > Pierre Beyssac		pb@enst.fr
 
 -- 
 Bob Willcox             The man who follows the crowd will usually get no
 bob@luke.pmr.com        further than the crowd.  The man who walks alone is
 Austin, TX              likely to find himself in places no one has ever
                         been.            -- Alan Ashley-Pitt
 

From: Bob Willcox <bob@luke.pmr.com>
To: Pierre Beyssac <beyssac@enst.fr>
Cc: Bob Willcox <bob@pmr.com>, freebsd-bugs@freebsd.org,
	FreeBSD-gnats-submit@freebsd.org
Subject: Re: kern/10872: Panic in sorecieve()
Date: Tue, 11 May 1999 17:33:17 -0500

 --LZvS9be/3tNcYl/X
 Content-Type: text/plain; charset=us-ascii
 
 Oops, I forgot to attach the test shell script as promised to that last
 note.  So here it is this time.
 
 Bob
 
 -- 
 Bob Willcox             The man who follows the crowd will usually get no
 bob@luke.pmr.com        further than the crowd.  The man who walks alone is
 Austin, TX              likely to find himself in places no one has ever
                         been.            -- Alan Ashley-Pitt
 
 --LZvS9be/3tNcYl/X
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: attachment; filename=panic_test
 
 #!/bin/sh
 
 if [ $# -ne 4 ]; then
     echo "Usage: panic_test loopcnt blkcnt host path"
     exit 1
 fi
 
 lpcnt=$1
 blkcnt=$2
 host=$3
 path=$4
 
 i=1
 while [ $i -le $lpcnt ]
 do
     cmd="dd count=$blkcnt bs=32k if=/dev/zero|rsh $host \"dd bs=32k of=$path\""
     echo "$i: $cmd"
     eval $cmd
     i=`expr $i + 1`
 done
 
 --LZvS9be/3tNcYl/X--
 

From: Bill Fenner <fenner@research.att.com>
To: freebsd-gnats-submit@freebsd.org
Cc:  
Subject: Re: kern/10872: Panic in sorecieve()
Date: Tue, 11 May 1999 15:40:09 -0700

 I've seen odd disagreements with sb_cc being non-zero and causing this
 panic on a 2.2.2 system; I thought it had gone away but I also don't
 have that machine any more.  I tried tracking it for a while but really
 didn't get anywhere (well, I got a lot of places, just nowhere useful...)
 
   Bill
 

From: Bosko Milekic <bmilekic@dsuper.net>
To: freebsd-gnats-submit@freebsd.org
Cc:  
Subject: Re: kern/10872: Panic in sorecieve() due to NULL mbuf pointer
Date: Sun, 5 Dec 1999 21:00:43 -0500 (EST)

 (Apologies for the double-post... I don't believe the first post ended up
 in the correct place.)
 
 
   Bob,
 
 	I'm looking through some related PRs, and I happened to stumble upon
   this one. It's certainly been a while since this was originally filed.
   Are you still experiencing this problem? I haven't been able to reproduce
   it on 4.0-CURRENT as of today (using the shell script you had originally
   provided). I'm thinking that initially, the problem could have been
   related to tacking something onto the sb in a fashion where the involved
   code in soreceive doesn't necessarily "notice" it, due to it running at a
   given priority level, where it can potentially be interrupted. This is a
   real shot in the dark and shouldn't be considered if you're no longer
   experiencing this problem.
   	If, on the other hand, you are and are still willing to contribute to
   solving it, can you please acknowledge, and I would be interested in
   taking a shot at it as well.
 
   Later,
   Bosko.
   	
 --
   Bosko Milekic <bmilekic@technokratis.com>
 
 
 
 

From: Bosko Milekic <bmilekic@dsuper.net>
To: freebsd-gnats-submit@freebsd.org
Cc:  
Subject: Re: kern/10872: Panic in sorecieve() due to NULL mbuf pointer
Date: Wed, 8 Dec 1999 14:18:19 -0500 (EST)

 (C.C.ed to Bob already)
 
 !>Hi Bosko,
 !>
 !>I must confess that I wound up working around the problem by running
 !>2.2.8 on my amanda backup server.  This, of course, was only a temporary
 !>solution and I planned to get back to working on the problem, but I just
 !>never have gotten to it.  I would, however, be happy to help on solving
 !>this problem in any way that I can.  Though I haven't tried to recreate
 !>the problem for a long time, I have a couple of systems here (one on
 !>4.0-CURRENT and another on 3.3-STABLE) that I could attempt it on.
 !>
 !>Bob
 !>
 
 	What would be interesting to find out is if you can still reproduce
   it on -CURRENT and 3.3-STABLE. This time, I would ask for you to try
   printing out the value of the mmbfree mbuf pointer as well as the mclfree
   mcluster pointer from the debugger. Also, if you ever end up crashing
   somewhere else (besides for in soreceive or sbdrop), e.g. somewhere
   directly from the if_fxp code, or if something has significantly changed
   from the first two traces that you provided, if you could get a new trace
   and post that along as well.
   	This problem could be related to either a bad cast somewhere along
   the line which can ultimately generate a 'garbage pointer' no longer
   referencing the mbuf or mbuf cluster free list, but something undefined,
   or, even bad offsetting into an mbuf cluster (probably also due to a bad
   cast somewhere along the line) which _may_ lead to the free mbuf or mbuf
   cluster pointer being NULL. If something gets screwed up along the way
   like this, chances are, it'll only be noticed after the following
   mcluster or mbuf (whichever it is that was screwed up) gets allocated --
   it could potentially be pointing to something that's not within mb_map at
   all!
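 
 The "only noticed after the following allocation" effect can be shown
 with a toy free list.  This is a userland sketch under stated
 assumptions, not the kernel allocator: the pool, the sk_ names and the
 range check are all hypothetical.  Free blocks keep the link to the
 next free block in their own storage, so a stray write into a released
 block damages the list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SK_POOL_N 4

/* A free block stores the link to the next free block in its own storage. */
union sk_blk {
    union sk_blk *next;
    char          data[64];
};

static union sk_blk  sk_pool[SK_POOL_N];
static union sk_blk *sk_free;

/* Thread every block of the pool onto the free list. */
void sk_pool_init(void)
{
    for (size_t i = 0; i + 1 < SK_POOL_N; i++)
        sk_pool[i].next = &sk_pool[i + 1];
    sk_pool[SK_POOL_N - 1].next = NULL;
    sk_free = &sk_pool[0];
}

/* Does p point at the start of a block inside the pool? */
int sk_in_pool(const union sk_blk *p)
{
    uintptr_t a = (uintptr_t)p;
    uintptr_t lo = (uintptr_t)&sk_pool[0];

    return a >= lo && a < lo + sizeof sk_pool &&
        (a - lo) % sizeof(union sk_blk) == 0;
}

/*
 * Take the head of the free list.  Returns NULL when the list is empty
 * or when the head is garbage; *corrupt is set to 1 in the latter case.
 */
union sk_blk *sk_alloc(int *corrupt)
{
    union sk_blk *b = sk_free;

    *corrupt = 0;
    if (b == NULL)
        return NULL;
    if (!sk_in_pool(b)) {
        *corrupt = 1;
        return NULL;
    }
    sk_free = b->next;   /* trusts whatever the block currently holds */
    return b;
}

/* Put a block back at the head of the free list. */
void sk_release(union sk_blk *b)
{
    b->next = sk_free;
    sk_free = b;
}
```

 If a block is released and then overwritten through a stale pointer,
 the next sk_alloc() still succeeds and hands the block out, but it
 loads the garbage link into sk_free; only the allocation after that
 sees a head that lies outside the pool.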
 
   --Bosko  
 
 --
   Bosko Milekic <bmilekic@dsuper.net>
   
 

From: Terry Kennedy <terry@tmk.com>
To: freebsd-gnats-submit@freebsd.org, bob@pmr.com
Cc:  
Subject: Re: kern/10872: Panic in soreceive() due to NULL mbuf pointer
Date: Tue, 21 Dec 1999 14:42:29 -0500 (EST)

 > What would be interesting to find out is if you can still reproduce
 > it on -CURRENT and 3.3-STABLE. This time, I would ask for you to try
 > printing out the value of the mmbfree mbuf pointer as well as the mclfree
 > mcluster pointer from the debugger. Also, if you ever end up crashing
 > somewhere else (besides for in soreceive or sbdrop), e.g. somewhere
 > directly from the if_fxp code, or if something has significantly changed
 > from the first two traces that you provided, if you could get a new trace
 > and post that along as well.
 
   I have a news server (Diablo) that moves about .3TB/day. After being down
 (due to an unrelated problem) for about 4 hours, I am getting these panics
 pretty much continuously (about 1 every 5 minutes) now. The FreeBSD version
 is 3.3-RELEASE from the WC CDROM.
 
   I'm using an NCR 875 controller and a DEC 21140 Ethernet chip on a full-
 duplex 100Mbit segment. NMBCLUSTERS is 32768 due to an earlier problem with
 running out of mbufs with maxusers=256.
 
   Most of the panics are the sbdrop panic, but some of them are a trap 12
 in tulip_rx_intr().
 
   Unfortunately, my kernel wasn't config'd with -g. I'm building a new one
 now and will report back with any additional info I come up with.
 
   If any of the developers would find access to the box to be helpful and
 can respond rapidly, I'd be glad to give them access. But once it gets
 caught up, it will likely stop crashing.
 
 	Terry Kennedy             http://www.tmk.com
         terry@tmk.com             Jersey City, NJ USA
         +1 201 451 4554 (voice)   +1 201 451 0900 (FAX)
 

From: Bob Willcox <bob@luke.immure.com>
To: Terry Kennedy <terry@tmk.com>
Cc: freebsd-gnats-submit@freebsd.org, bob@pmr.com
Subject: Re: kern/10872: Panic in soreceive() due to NULL mbuf pointer
Date: Wed, 5 Jan 2000 18:13:55 -0600

 I have recently put together a test system running 3.4-stable (as of
 1/4/2000) and am able to easily reproduce the sbdrop panic by running
 the little test script that I included in the bug report last summer.  I
 run the script on 4 other systems, targeting the test system.  Within a
 few minutes I consistently get the panic.  I have my system built with
 debug symbols and it dutifully creates a crash dump.  I haven't had much
 time to go any further than this with it right now, though (we are in
 the middle of renumbering our company's entire network and somehow I got
 stuck with planning and driving the effort:-().
 
 Bob
 
 On Tue, Dec 21, 1999 at 02:42:29PM -0500, Terry Kennedy wrote:
 > > What would be interesting to find out is if you can still reproduce
 > > it on -CURRENT and 3.3-STABLE. This time, I would ask for you to try
 > > printing out the value of the mmbfree mbuf pointer as well as the mclfree
 > > mcluster pointer from the debugger. Also, if you ever end up crashing
 > > somewhere else (besides for in soreceive or sbdrop), e.g. somewhere
 > > directly from the if_fxp code, or if something has significantly changed
 > > from the first two traces that you provided, if you could get a new trace
 > > and post that along as well.
 > 
 >   I have a news server (Diablo) that moves about .3TB/day. After being down
 > (due to an unrelated problem) for about 4 hours, I am getting these panics
 > pretty much continuously (about 1 every 5 minutes) now. The FreeBSD version
 > is 3.3-RELEASE from the WC CDROM.
 > 
 >   I'm using an NCR 875 controller and a DEC 21140 Ethernet chip on a full-
 > duplex 100Mbit segment. NMBCLUSTERS is 32768 due to an earlier problem with
 > running out of mbufs with maxusers=256.
 > 
 >   Most of the panics are the sbdrop panic, but some of them are a trap 12
 > in tulip_rx_intr().
 > 
 >   Unfortunately, my kernel wasn't config'd with -g. I'm building a new one
 > now and will report back with any additional info I come up with.
 > 
 >   If any of the developers would find access to the box to be helpful and
 > can respond rapidly, I'd be glad to give them access. But once it gets
 > caught up, it will likely stop crashing.
 > 
 > 	Terry Kennedy             http://www.tmk.com
 >         terry@tmk.com             Jersey City, NJ USA
 >         +1 201 451 4554 (voice)   +1 201 451 0900 (FAX)
 
 -- 
 Bob Willcox                 Don't tell me that worry doesn't do any good.
 bob@pmr.com                 I know better. The things I worry about don't
 Austin, TX                  happen.          -- Watchman Examiner
 

From: Bosko Milekic <bmilekic@dsuper.net>
To: terry@tmk.com, freebsd-gnats-submit@freebsd.org
Cc:  
Subject: Re: kern/10872: Panic in soreceive() due to NULL mbuf pointer
Date: Fri, 7 Jan 2000 01:20:29 -0500 (EST)

 >  Most of the panics are the sbdrop panic, but some of them are a trap 12
 >  in tulip_rx_intr().
   
 	Terry,
 
   Do you think you could provide a trace of particularly the crash that
   you're getting in tulip_rx_intr() ? This is the first time that I hear of
   the Tulip cards puking out in conjunction with the ncr and quite
   honestly, I'm really desperate trying to figure out what could cause the
   inconsistency between sb_cc and sb_mb. Perhaps this could shed some light
   on the issue as it would eliminate the sole blame on the fxp driver.
 
 
 --
  Bosko Milekic
  Email:  bmilekic@dsuper.net
  WWW:    http://pages.infinit.net/bmilekic/
 --
 
 
 

From: Terry Kennedy <terry@tmk.com>
To: Bosko Milekic <bmilekic@dsuper.net>
Cc: freebsd-gnats-submit@freebsd.org
Subject: Re: kern/10872: Panic in soreceive() due to NULL mbuf pointer
Date: Fri, 07 Jan 2000 02:13:35 -0500 (EST)

 >   Do you think you could provide a trace of particularly the crash that
 >   you're getting in tulip_rx_intr() ? This is the first time that I hear of
 >   the Tulip cards puking out in conjunction with the ncr and quite
 >   honestly, I'm really desperate trying to figure out what could cause the
 >   inconsistency between sb_cc and sb_mb. Perhaps this could shed some light
 >   on the issue as it would eliminate the sole blame on the fxp driver.
 
   The system hasn't crashed since I rebuilt a kernel with debug symbols (-g)
 enabled. The box needs to be down for a few hours and then get swamped with
 packets, and if it's down for that long I get loads of unhappy users, so I
 can't do much testing. If it does fail again, I will collect crash dumps.
 
   I can say that the tulip_rx_intr() crash was in a MCLGET macro, and that
 a netstat -m on the various crash dumps all showed an apparently impossible
 mbuf usage - something like "1700/1600 mbufs in use" - however, in other
 *BSD's, I've found that the assorted *stat programs sometimes get confused
 between the running kernel values and the ones in the crash dump. Perhaps
 this is enough of a hint, though.
 
   I did a quick peek at the other *BSD's to see if there was anything obvious
 but the various versions have diverged enough that it wasn't useful. I did
 find references to sbdrop panics on other *BSD's, but in old versions - the
 PR's were closed with comments like "this code was changed a lot since the
 version it was reported against - if it happens on a current kernel, let us
 know".
 
 	Terry Kennedy             http://www.tmk.com
         terry@tmk.com             Jersey City, NJ USA
         +1 201 451 4554 (voice)   +1 201 451 0900 (FAX)
 

From: Bosko Milekic <bmilekic@dsuper.net>
To: Terry Kennedy <terry@tmk.com>
Cc: freebsd-gnats-submit@freebsd.org
Subject: Re: kern/10872: Panic in soreceive() due to NULL mbuf pointer
Date: Fri, 7 Jan 2000 02:57:51 -0500 (EST)

 On Fri, 7 Jan 2000, Terry Kennedy wrote:
 
 >  I can say that the tulip_rx_intr() crash was in a MCLGET macro, and that
 >a netstat -m on the various crash dumps all showed an apparently impossible
 >mbuf usage - something like "1700/1600 mbufs in use" - however, in other
 >*BSD's, I've found that the assorted *stat programs sometimes get confused
 >between the running kernel values and the ones in the crash dump. Perhaps
 >this is enough of a hint, though.
 
 	Hmmm, judging from your original post, I thought that what you were
   receiving in tulip_rx_intr() was a page fault?
   	In any case, at the time this was occurring, `netstat -m' could
   display `peak' for mbuf _clusters_ greater than the actual `max.'
   However, this has been changed and now the limit of possible mclusters
   allocated from the pool is capped at exactly what it was tuned for.
   	Considering all this, a fair assumption would be that the
   tulip_rx_intr() panic was a side-effect of an mbuf shortage. Hopefully,
   you will no longer obtain this particular panic.
 
   	As for the sbdrop() issue, that, I believe, is a whole other story. I
   think, at this point, that the problem is fairly isolated in the
   uipc_socket2.c code, and has not-so-much to do with the fxp driver, in
   particular, but that it occurs more as a `side-effect' of some
   timing-related issue. I'd prefer to keep searching until I figure it out
   before claiming anything, though. The biggest problem that I have at this
   end is that due to lack of hardware, I cannot reproduce it.
 
   Thanks for your input!
   Bosko.
 
 --
  Bosko Milekic
  Email:  bmilekic@dsuper.net
  WWW:    http://pages.infinit.net/bmilekic/
 --
 
 
 

From: Terry Kennedy <terry@tmk.com>
To: Bosko Milekic <bmilekic@dsuper.net>
Cc: freebsd-gnats-submit@freebsd.org
Subject: Re: kern/10872: Panic in soreceive() due to NULL mbuf pointer
Date: Fri, 07 Jan 2000 03:06:33 -0500 (EST)

 > 	Hmmm, judging from your original post, I thought that what you were
 >   receiving in tulip_rx_intr() was a page fault?
 
   Yes. A trap 12 from a memory reference inside MCLGET.
 
 >   	In any case, at the time this was occurring, `netstat -m' could
 >   display `peak' for mbuf _clusters_ greater than the actual `max.'
 
   No, this was definitely mbufs, not mbuf clusters. As I said, this could
 be an error in the netstat code (the 4BSD code tends to use kerninfo even
 when it thinks it is looking at a corefile - I haven't looked at the
 current FreeBSD code to see if it's been fixed).
 
 >   	As for the sbdrop() issue, that, I believe, is a whole other story. I
 >   think, at this point, that the problem is fairly isolated in the
 >   uipc_socket2.c code, and has not-so-much to do with the fxp driver, in
 >   particular, but that it occurs more as a `side-effect' of some
 >   timing-related issue. I'd prefer to keep searching until I figure it out
 >   before claiming anything, though. The biggest problem that I have at this
 >   end is that due to lack of hardware, I cannot reproduce it.
 
   Ok. Someone else has appended to this PR that they have code that can
 trigger the bug pretty much at will on 3.4-RELEASE, so hopefully that will
 provide enough info.
 
   If nobody else has the hardware to test this, I can put up a test box if
 I get that code, and then I can make the box and the crash dumps available
 to folks. I just can't clobber my production system 8-)
 
 	Terry Kennedy             http://www.tmk.com
         terry@tmk.com             Jersey City, NJ USA
         +1 201 451 4554 (voice)   +1 201 451 0900 (FAX)
 

From: Terry Kennedy <terry@tmk.com>
To: freebsd-gnats-submit@freebsd.org
Cc: bmilekic@dsuper.net
Subject: Re: kern/10872: Panic in soreceive() due to NULL mbuf pointer
Date: Mon, 31 Jan 2000 06:43:05 -0500 (EST)

 Bosko Milekic writes:
 > Considering all this, a fair assumption would be that the
 > tulip_rx_intr() panic was a side-effect of an mbuf shortage. Hopefully,
 > you will no longer obtain this particular panic.
 
   Unfortunately, I'm still getting hit with these regularly under
 3.4-RELEASE (installed from the 29-Dec-1999 ISO). Right now, the system is
 dying with these every 15 to 20 minutes (it's a news server that normally
 moves about .3TB/day).
 
   It doesn't look like an mbuf shortage - here's a netstat of the entrails:
 
 (0:12) news:/var/crash# netstat -m -M vmcore.0 
 941/1216 mbufs in use:
         619 mbufs allocated to data
         322 mbufs allocated to packet headers
 519/708/32768 mbuf clusters in use (current/peak/max)
 1568 Kbytes allocated to network (73% in use)
 0 requests for memory denied
 0 requests for memory delayed
 0 calls to protocol drain routines
 
   Of course, netstat -M is broken and misreports the active system, not the
 actual values in the crash dump:
 
 (0:13) news:/var/crash# netstat -m -M vmcore.0
 971/1216 mbufs in use:
         688 mbufs allocated to data
         283 mbufs allocated to packet headers
 580/740/32768 mbuf clusters in use (current/peak/max)
 1632 Kbytes allocated to network (78% in use)
 0 requests for memory denied
 0 requests for memory delayed
 0 calls to protocol drain routines
 
   Note that the values are different each time I re-run netstat, despite
 giving the -M option.
 
   Frankly, I don't understand how a bug marked as critical, with a clear
 method to reproduce it, can stay unresolved from 3.1 through 3.4. The old
 CSRG 2BSD docs mentioned sending a $100 bill wrapped around a bug report
 if you wanted immediate attention to your bug. I know that was a joke, but
 is there a similar method that works for FreeBSD? I'm getting desperate!
 
   Here's the gdb backtrace. I have the kernel (w/ symbols) and dumpfile
 if anyone wants them (note the dumpfile is 384MB). I can also provide
 access to the system (though it will likely be annoying since it keeps
 crashing).
 
 Script started on Mon Jan 31 06:38:17 2000
 (0:1) news:/var/crash# gdb -k kernel.0 vmcore.0	
 GNU gdb 4.18
 Copyright 1998 Free Software Foundation, Inc.
 GDB is free software, covered by the GNU General Public License, and you are
 welcome to change it and/or distribute copies of it under certain conditions.
 Type "show copying" to see the conditions.
 There is absolutely no warranty for GDB.  Type "show warranty" for details.
 This GDB was configured as "i386-unknown-freebsd"...
 IdlePTD 2928640
 initial pcb at 25a8d8
 panicstr: page fault
 panic messages:
 ---
 Fatal trap 12: page fault while in kernel mode
 fault virtual address	= 0x2b0600
 fault code		= supervisor read, page not present
 instruction pointer	= 0x8:0xc01c6187
 stack pointer	        = 0x10:0xc02417d4
 frame pointer	        = 0x10:0xc0241814
 code segment		= base 0x0, limit 0xfffff, type 0x1b
 			= DPL 0, pres 1, def32 1, gran 1
 processor eflags	= interrupt enabled, resume, IOPL = 0
 current process		= Idle
 interrupt mask		= net 
 trap number		= 12
 panic: page fault
 
 syncing disks... 
 
 Fatal trap 12: page fault while in kernel mode
 fault virtual address	= 0x30
 fault code		= supervisor read, page not present
 instruction pointer	= 0x8:0xc01d28e0
 stack pointer	        = 0x10:0xc024152c
 frame pointer	        = 0x10:0xc0241530
 code segment		= base 0x0, limit 0xfffff, type 0x1b
 			= DPL 0, pres 1, def32 1, gran 1
 processor eflags	= interrupt enabled, resume, IOPL = 0
 current process		= Idle
 interrupt mask		= net bio 
 trap number		= 12
 panic: page fault
 
 dumping to dev 20001, offset 262144
 dump 383 382 381 380 379 378 377 376 375 374 373 372 371 370 [...] 5 4 3 2 1
 ---
 #0  boot (howto=260) at ../../kern/kern_shutdown.c:285
 285			dumppcb.pcb_cr3 = rcr3();
 (kgdb) bt
 #0  boot (howto=260) at ../../kern/kern_shutdown.c:285
 #1  0xc01416c0 in at_shutdown (
     function=0xc023991a <__set_sysinit_set_sym_memdev_sys_init+1050>, arg=0x0, 
     queue=12) at ../../kern/kern_shutdown.c:446
 #2  0xc020c3d5 in trap_fatal (frame=0xc02414f0, eva=48)
     at ../../i386/i386/trap.c:942
 #3  0xc020c0b3 in trap_pfault (frame=0xc02414f0, usermode=0, eva=48)
     at ../../i386/i386/trap.c:835
 #4  0xc020bd2a in trap (frame={tf_es = 375717904, tf_ds = 16, 
       tf_edi = -972519680, tf_esi = 0, tf_ebp = -1071377104, 
       tf_isp = -1071377128, tf_ebx = -1071321596, tf_edx = -1073168320, 
       tf_ecx = 0, tf_eax = 0, tf_trapno = 12, tf_err = 0, 
       tf_eip = -1071830816, tf_cs = 8, tf_eflags = 66182, tf_esp = -834145536, 
       tf_ss = -1071377072}) at ../../i386/i386/trap.c:437
 #5  0xc01d28e0 in acquire_lock (lk=0xc024ee04)
     at ../../ufs/ffs/ffs_softdep.c:270
 #6  0xc01d565b in initiate_write_inodeblock (inodedep=0xc6088700, 
     bp=0xcdc47270) at ../../ufs/ffs/ffs_softdep.c:2788
 #7  0xc01d540b in softdep_disk_io_initiation (bp=0xcdc47270)
     at ../../ufs/ffs/ffs_softdep.c:2648
 #8  0xc0171bc6 in spec_strategy (ap=0xc02415b0)
     at ../../miscfs/specfs/spec_vnops.c:539
 #9  0xc0171359 in spec_vnoperate (ap=0xc02415b0)
     at ../../miscfs/specfs/spec_vnops.c:129
 #10 0xc01e1659 in ufs_vnoperatespec (ap=0xc02415b0)
     at ../../ufs/ufs/ufs_vnops.c:2318
 #11 0xc015eeff in bwrite (bp=0xcdc47270) at vnode_if.h:891
 #12 0xc0163502 in vop_stdbwrite (ap=0xc0241618) at ../../kern/vfs_default.c:296
 #13 0xc016334d in vop_defaultop (ap=0xc0241618) at ../../kern/vfs_default.c:130
 #14 0xc0171359 in spec_vnoperate (ap=0xc0241618)
     at ../../miscfs/specfs/spec_vnops.c:129
 #15 0xc01e1659 in ufs_vnoperatespec (ap=0xc0241618)
     at ../../ufs/ufs/ufs_vnops.c:2318
 #16 0xc015f8ab in vfs_bio_awrite (bp=0xcdc47270) at vnode_if.h:1145
 #17 0xc01dacda in ffs_fsync (ap=0xc02416a0) at ../../ufs/ffs/ffs_vnops.c:205
 #18 0xc01d9183 in ffs_sync (mp=0xc5ee4a00, waitfor=2, cred=0xc0e3a100, 
     p=0xc026f8f4) at vnode_if.h:499
 #19 0xc0167e27 in sync (p=0xc026f8f4, uap=0x0) at ../../kern/vfs_syscalls.c:549
 #20 0xc0141281 in boot (howto=256) at ../../kern/kern_shutdown.c:203
 #21 0xc01416c0 in at_shutdown (
     function=0xc023991a <__set_sysinit_set_sym_memdev_sys_init+1050>, arg=0x0, 
     queue=12) at ../../kern/kern_shutdown.c:446
 #22 0xc020c3d5 in trap_fatal (frame=0xc0241798, eva=2819584)
     at ../../i386/i386/trap.c:942
 #23 0xc020c0b3 in trap_pfault (frame=0xc0241798, usermode=0, eva=2819584)
     at ../../i386/i386/trap.c:835
 #24 0xc020bd2a in trap (frame={tf_es = -1071906800, tf_ds = -974913520, 
       tf_edi = -974858240, tf_esi = -1058292096, tf_ebp = -1071376364, 
       tf_isp = -1071376448, tf_ebx = -1056483072, tf_edx = 518358, 
       tf_ecx = -974857732, tf_eax = 2819584, tf_trapno = 12, tf_err = 0, 
       tf_eip = -1071881849, tf_cs = 8, tf_eflags = 66055, tf_esp = -974858240, 
       tf_ss = -60358592}) at ../../i386/i386/trap.c:437
 #25 0xc01c6187 in tulip_rx_intr (sc=0xc5e4d800) at ../../pci/if_de.c:3649
 #26 0xc01c67db in tulip_intr_handler (sc=0xc5e4d800, progress_p=0xc024183c)
     at ../../pci/if_de.c:3998
 #27 0xc01c6939 in tulip_intr_normal (arg=0xc5e4d800) at ../../pci/if_de.c:4187
 (kgdb) up 25
 #25 0xc01c6187 in tulip_rx_intr (sc=0xc5e4d800) at ../../pci/if_de.c:3649
 3649                        MCLGET(m0, M_DONTWAIT);
 (kgdb) list
 3644                MGETHDR(m0, M_DONTWAIT, MT_DATA);
 3645                if (m0 != NULL) {
 3646    #if defined(TULIP_COPY_RXDATA)
 3647                    if (!accept || total_len >= MHLEN) {
 3648    #endif
 3649                        MCLGET(m0, M_DONTWAIT);
 3650                        if ((m0->m_flags & M_EXT) == 0) {
 3651                            m_freem(m0);
 3652                            m0 = NULL;
 3653                        }
 (kgdb) quit
 (0:2) news:/var/crash exit
 Script done on Mon Jan 31 06:39:29 2000
 
 	Terry Kennedy             http://www.tmk.com
         terry@tmk.com             Jersey City, NJ USA
         +1 201 451 4554 (voice)   +1 201 451 0900 (FAX)
 

From: Terry Kennedy <terry@tmk.com>
To: freebsd-gnats-submit@freebsd.org
Cc: bmilekic@dsuper.net
Subject: Re: kern/10872: Panic in soreceive() due to NULL mbuf pointer
Date: Mon, 31 Jan 2000 08:35:05 -0500 (EST)

   Here's some more info on one of these crashes. From the attached, it
 looks like tulip_rx_intr() was handed a corrupted mbuf from the free
 list. Every field of *m0 that should hold an address holds ASCII text
 instead... Or am I missing something?
 
 (kgdb) up 25
 #25 0xc01c6187 in tulip_rx_intr (sc=0xc5e4d800) at ../../pci/if_de.c:3649
 3649                        MCLGET(m0, M_DONTWAIT);
 (kgdb) list
 3644                MGETHDR(m0, M_DONTWAIT, MT_DATA);
 3645                if (m0 != NULL) {
 3646    #if defined(TULIP_COPY_RXDATA)
 3647                    if (!accept || total_len >= MHLEN) {
 3648    #endif
 3649                        MCLGET(m0, M_DONTWAIT);
 3650                        if ((m0->m_flags & M_EXT) == 0) {
 3651                            m_freem(m0);
 3652                            m0 = NULL;
 3653                        }
 (kgdb) print *m0
 $1 = {m_hdr = {mh_next = 0x4926235d, mh_nextpkt = 0x3f5c323b, 
     mh_data = 0x33225e3fcannot read proc at 0
 (kgdb) print m0
 $2 = (struct mbuf *) 0xc0060280
 (kgdb) printf "%s", m0
 ]#&I;2\??^"3?[Q#6@TC&`X&DR
 MQHX96+!O"Q?D'XQTQ?447PC\0VQI?"GTVKQ;T%^\0'TT`Q??O?XOZ`4?XP=4
 MC-G'9W9FC;(**%*BVA:4`I*J!K,OV'ZU>Y0]E)/%!HR)\1,K!)["=VQ9+RBY
 M$5%N6\U6C0DZ*+J(5?C>;47<4%^)_&.)O%WM_EM1#YVZAZK"YK;4@I-6)]T&
 M=AYT2-.I?E/+<\MR7*EZ1]T[;^SK_5%N[HQFB-L`J.!<HL)PFPQ4`FBL;__[
 [snip]
 
 (The data above is part of a uuencoded news article.)
 
 	Terry Kennedy             http://www.tmk.com
         terry@tmk.com             Jersey City, NJ USA
         +1 201 451 4554 (voice)   +1 201 451 0900 (FAX)
 
State-Changed-From-To: open->feedback 
State-Changed-By: iedowse 
State-Changed-When: Sun Dec 2 11:06:25 PST 2001 
State-Changed-Why:  

There hasn't been any activity on this PR in some time now. Is the 
problem still reproducible on recent releases? 

http://www.FreeBSD.org/cgi/query-pr.cgi?pr=10872 
State-Changed-From-To: feedback->closed 
State-Changed-By: iedowse 
State-Changed-When: Mon Dec 3 09:41:39 PST 2001 
State-Changed-Why:  

Submitter says that he hasn't seen this problem since around 
3.2-RELEASE. 

http://www.FreeBSD.org/cgi/query-pr.cgi?pr=10872 
>Unformatted:
