From kargl@troutmask.apl.washington.edu  Thu Feb  2 17:28:50 2006
Return-Path: <kargl@troutmask.apl.washington.edu>
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id C3C7F16A420
	for <FreeBSD-gnats-submit@freebsd.org>; Thu,  2 Feb 2006 17:28:50 +0000 (GMT)
	(envelope-from kargl@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.208.78.105])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 783E443D46
	for <FreeBSD-gnats-submit@freebsd.org>; Thu,  2 Feb 2006 17:28:50 +0000 (GMT)
	(envelope-from kargl@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (localhost [127.0.0.1])
	by troutmask.apl.washington.edu (8.13.4/8.13.4) with ESMTP id k12HSoni059432
	for <FreeBSD-gnats-submit@freebsd.org>; Thu, 2 Feb 2006 09:28:50 -0800 (PST)
	(envelope-from kargl@troutmask.apl.washington.edu)
Received: (from kargl@localhost)
	by troutmask.apl.washington.edu (8.13.4/8.13.1/Submit) id k12HSoVi059431;
	Thu, 2 Feb 2006 09:28:50 -0800 (PST)
	(envelope-from kargl)
Message-Id: <200602021728.k12HSoVi059431@troutmask.apl.washington.edu>
Date: Thu, 2 Feb 2006 09:28:50 -0800 (PST)
From: "Steven G. Kargl" <kargl@troutmask.apl.washington.edu>
Reply-To: "Steven G. Kargl" <kargl@troutmask.apl.washington.edu>
To: FreeBSD-gnats-submit@freebsd.org
Cc:
Subject: New pts code causes AMD64 panics
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         92742
>Category:       kern
>Synopsis:       [pts] [panic] New pts code causes AMD64 panics (regression)
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    cognet
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Feb 02 17:30:03 GMT 2006
>Closed-Date:    Tue Jul 10 15:47:54 GMT 2007
>Last-Modified:  Tue Jul 10 15:47:54 GMT 2007
>Originator:     Steven G. Kargl
>Release:        FreeBSD 7.0-CURRENT amd64
>Organization:
apl/uw
>Environment:
System: FreeBSD troutmask.apl.washington.edu 7.0-CURRENT FreeBSD 7.0-CURRENT #3: Tue Jan 31 15:39:02 PST 2006 root@troutmask.apl.washington.edu:/usr/obj/usr/src/sys/SPEW amd64


	
>Description:

After a binary search, I have determined that the new pts code is
triggering kernel panics on an AMD64 system. 

Using this supfile, I retrieve src/sys:

*default host=cvsup10.freebsd.org
*default base=/usr
*default release=cvs tag=.
*default delete use-rel-suffix
*default prefix=/usr
#*default date=2006.01.26.01.30.00  <-- Good working kernel
*default date=2006.01.26.01.31.00   <-- kernel dies within 5 to 10 minutes.
src-sys
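
The binary search over supfile dates can be sketched as follows.  This is
only an illustration of the procedure, not the script that was actually
used: the function name bisect_dates and the is_bad callback are
hypothetical, and in practice is_bad would mean "check out src/sys at that
date, build, boot, and wait for a panic".  The bracket dates in the demo
are arbitrary, and the stand-in predicate simply encodes the result
reported above.

```python
from datetime import datetime, timedelta

FMT = "%Y.%m.%d.%H.%M.%S"  # layout of the "date=" values in the supfile


def bisect_dates(good, bad, is_bad):
    """Narrow a (good, bad] bracket of timestamps down to one minute.

    Assumes is_bad(good) is False, is_bad(bad) is True, and that every
    timestamp after the first bad one is also bad.
    """
    start = datetime.strptime(good, FMT)
    lo = 0
    hi = int((datetime.strptime(bad, FMT) - start).total_seconds() // 60)

    def stamp(minutes):
        return (start + timedelta(minutes=minutes)).strftime(FMT)

    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(stamp(mid)):
            hi = mid  # failure already present: look earlier
        else:
            lo = mid  # still good: look later
    return stamp(lo), stamp(hi)


# Stand-in for "build and boot": fixed-width stamps compare correctly as strings.
def demo_is_bad(date):
    return date >= "2006.01.26.01.31.00"


last_good, first_bad = bisect_dates(
    "2006.01.25.00.00.00", "2006.01.27.00.00.00", demo_is_bad
)
print(last_good, first_bad)
```

Each probe costs a full kernel build and a 5 to 10 minute wait, so halving
the interval keeps the number of builds logarithmic in the width of the
bracket.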

The differences in src/sys between the above time stamps are:
Updating collection src-sys/cvs
 Edit src/sys/conf/files
 Checkout src/sys/kern/tty_pts.c
 Edit src/sys/kern/tty_pty.c
 Edit src/sys/sys/ttycom.h

My kernel is UP on a dual processor Tyan K8S Pro motherboard with 12 GB
of memory.  I have no loaded modules.  I have neither MEMGUARD nor REDZONES
compiled into the kernel.  Attempts to use MEMGUARD result in a kernel
that does not make it to single user mode.

With vm.old_contigmalloc=1

Memory modified after free 0xfffffff024e38f200(504) val = deadc0dd @ 0xfffffff024e38f2d0
panic: Most recently used by DEVFS1

KDB: stack backtrace:
panic() at panic+0x1c1
mtrash_ctor() at mtrash_ctor+0x78
uma_zalloc_arg() at uma_zalloc_arg+0x306
malloc() at malloc+0x3a
fdinit() at fdinit+0x24
fdcopy() at fdcopy+0x24
fork1() at fork1+0x6df
vfork() at vfork+0x1c
syscall() at syscall+0x517
Xfast_syscall() at Xfast_syscall+0xa8
--- syscall (66, FreeBSD ELF64, vfork) rip = 0x2006a5b4d, rsp=0xfffffffda50, rbp = 0 ---
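
For readers unfamiliar with the "Memory modified after free" message, the
check behind it can be modelled in user space.  The sketch below is a
simplified model and not the kernel code: freed items are filled with the
junk word 0xdeadc0de, and the next allocation verifies every word.  The
helper names trash_fill and trash_check are made up for the illustration.
The value deadc0dd is the junk word minus one, so a decrement through a
stale pointer is one plausible cause, but that is an assumption and not a
diagnosis.  The addresses are taken from the vm.old_contigmalloc=0 panic
further down, where the full dump is available.

```python
import struct

UMA_JUNK = 0xDEADC0DE  # junk pattern written into freed items


def trash_fill(size):
    """Model a freed item: `size` bytes filled with the junk word."""
    return bytearray(struct.pack("<I", UMA_JUNK) * (size // 4))


def trash_check(buf):
    """Return (offset, value) of the first word that is not junk, else None."""
    for off in range(0, len(buf) - len(buf) % 4, 4):
        (val,) = struct.unpack_from("<I", buf, off)
        if val != UMA_JUNK:
            return off, val
    return None


item = trash_fill(504)  # item size from the panic message

# Offset of the damaged word: reported address minus item base address.
offset = 0xFFFFFF0254D626D0 - 0xFFFFFF0254D62600

# Assumed fault: something decrements a 32-bit field after the free.
(old,) = struct.unpack_from("<I", item, offset)
struct.pack_into("<I", item, offset, old - 1)

bad_off, bad_val = trash_check(item)
print(f"offset {bad_off:#x}, val {bad_val:x}")  # offset 0xd0, val deadc0dd
```

The damaged word sits at offset 0xd0 of the 504-byte item.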


With vm.old_contigmalloc=0

Script started on Wed Feb  1 15:32:43 2006
troutmask:root[201] kgdb /boot/kernel/kernel vmcore.0 
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd".

Unread portion of the kernel message buffer:
Memory modified after free 0xffffff0254d62600(504) val=deadc0dd @ 0xffffff0254d626d0
panic: Most recently used by DEVFS1

KDB: stack backtrace:
panic() at panic+0x1c1
mtrash_ctor() at mtrash_ctor+0x78
uma_zalloc_arg() at uma_zalloc_arg+0x306
malloc() at malloc+0xa3
devfs_alloc() at devfs_alloc+0x1a
make_dev_credv() at make_dev_credv+0x4b
make_dev_cred() at make_dev_cred+0x8e
ptcopen() at ptcopen+0x111
giant_open() at giant_open+0x5f
devfs_open() at devfs_open+0x23b
VOP_OPEN_APV() at VOP_OPEN_APV+0x74
vn_open_cred() at vn_open_cred+0x38c
kern_open() at kern_open+0xfd
open() at open+0x25
syscall() at syscall+0x517
Xfast_syscall() at Xfast_syscall+0xa8
--- syscall (5, FreeBSD ELF64, open), rip = 0x200aeebcc, rsp = 0x7fffffff2e58, rbp = 0xffffffff ---

KDB: enter: panic
Uptime: 6m10s
Dumping 12223 MB (3 chunks)
  chunk 0: 1MB (159 pages) ... ok
  chunk 1: 4031MB (1031920 pages) ... ok
  chunk 2: 8192MB (2097152 pages) 

#0  doadump () at pcpu.h:172
172	pcpu.h: No such file or directory.
	in pcpu.h
(kgdb) bt
#0  doadump () at pcpu.h:172
#1  0xffffffff8027f809 in boot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:399
#2  0xffffffff8027f2da in panic (
    fmt=0xffffffff80476e34 "Most recently used by %s\n")
    at /usr/src/sys/kern/kern_shutdown.c:555
#3  0xffffffff803b9ad8 in mtrash_ctor (mem=0x0, size=0, arg=0x0, flags=0)
    at /usr/src/sys/vm/uma_dbg.c:137
#4  0xffffffff803b8046 in uma_zalloc_arg (zone=0xffffff02fffeae40, udata=0x0, 
    flags=1282) at /usr/src/sys/vm/uma_core.c:1846
#5  0xffffffff80273d93 in malloc (size=15, mtp=0xffffffff805aac60, flags=1282)
    at uma.h:275
#6  0xffffffff80228dca in devfs_alloc ()
    at /usr/src/sys/fs/devfs/devfs_devs.c:121
#7  0xffffffff80254d1b in make_dev_credv (devsw=0xffffffff805c0e40, 
    minornr=0, cr=0xffffff0250378380, uid=0, gid=0, mode=438, 
    fmt=0xffffffff80462900 "tty%c%r", ap=0xffffffffbd5e2530)
    at /usr/src/sys/kern/kern_conf.c:523
#8  0xffffffff80254ebe in make_dev_cred (devsw=0x0, minornr=0, cr=0x0, uid=0, 
    gid=0, mode=0, fmt=0x0) at /usr/src/sys/kern/kern_conf.c:581
#9  0xffffffff802c0ce1 in ptcopen (dev=0x0, flag=0, devtype=0, 
    td=0xffffff0250378380) at /usr/src/sys/kern/tty_pty.c:163
#10 0xffffffff80253caf in giant_open (dev=0xffffff024d8fc400, oflags=32771, 
    devtype=8192, td=0xffffff024fcc5000) at /usr/src/sys/kern/kern_conf.c:242
#11 0xffffffff8022bcdb in devfs_open (ap=0xffffffffbd5e2770)
    at /usr/src/sys/fs/devfs/devfs_vnops.c:680
#12 0xffffffff8042b3f4 in VOP_OPEN_APV (vop=0x0, a=0xffffffffbd5e2770)
    at vnode_if.c:365
#13 0xffffffff802f855c in vn_open_cred (ndp=0xffffffffbd5e2990, 
    flagp=0xffffffffbd5e28dc, cmode=8, cred=0xffffff0250378380, fdidx=6)
    at vnode_if.h:198
#14 0xffffffff802ee83d in kern_open (td=0xffffff024fcc5000, 
    path=0x519fab <Address 0x519fab out of bounds>, pathseg=UIO_USERSPACE, 
    flags=32771, mode=-1117902448) at /usr/src/sys/kern/vfs_syscalls.c:977
#15 0xffffffff802eef35 in open (td=0x0, uap=0xffffffffbd5e2c00)
    at /usr/src/sys/kern/vfs_syscalls.c:943
#16 0xffffffff803ea0e7 in syscall (frame=
      {tf_rdi = 5349291, tf_rsi = 32770, tf_rdx = 10, tf_rcx = 8601451180, tf_r8 = -2142762872, tf_r9 = 140737488301656, tf_rax = 5, tf_rbx = 0, tf_rbp = 4294967295, tf_r10 = 1, tf_r11 = 514, tf_r12 = 6, tf_r13 = 5349291, tf_r14 = 5349280, tf_r15 = 1, tf_trapno = 22, tf_addr = 0, tf_flags = 0, tf_err = 2, tf_rip = 8601398220, tf_cs = 43, tf_rflags = 582, tf_rsp = 140737488301656, tf_ss = 35}) at /usr/src/sys/amd64/amd64/trap.c:821
#17 0xffffffff803d8048 in Xfast_syscall ()
    at /usr/src/sys/amd64/amd64/exception.S:270
#18 0x0000000200aeebcc in ?? ()
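
kgdb prints the trap frame in frame #16 in decimal, while the KDB
backtrace prints the same registers in hex.  The short conversion below
shows that they agree; the values are copied from the trace above, and
the dictionary is only a convenience for the illustration.

```python
# Selected trap frame fields from frame #16, as printed by kgdb (decimal).
frame = {
    "tf_rax": 5,                 # syscall number
    "tf_rdi": 5349291,           # first syscall argument (path pointer)
    "tf_rsi": 32770,             # second syscall argument (open flags)
    "tf_rip": 8601398220,
    "tf_rsp": 140737488301656,
    "tf_rbp": 4294967295,
}

for name, value in frame.items():
    print(f"{name} = {value:#x}")

# The device mode passed to make_dev_credv() in frame #7, shown in octal.
print(oct(438))
```

The rip, rsp, and rbp values come out as 0x200aeebcc, 0x7fffffff2e58, and
0xffffffff, matching the "--- syscall (5, FreeBSD ELF64, open)" line.
tf_rdi is 0x519fab, the path pointer shown in frame #14, and mode 438 is
0666 in octal.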
 
>How-To-Repeat:
	Try booting a kernel built from src/sys dated 2006.01.26.01.31.00 or later.

>Fix:

Remove new pts code?
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->cognet 
Responsible-Changed-By: glebius 
Responsible-Changed-When: Thu Feb 2 18:34:21 UTC 2006 
Responsible-Changed-Why:  
Olivier committed pts. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=92742 

From: Giorgos Keramidas <keramida@freebsd.org>
To: "Steven G. Kargl" <kargl@troutmask.apl.washington.edu>
Cc: bug-followup@freebsd.org
Subject: Re: kern/92742: New pts code causes AMD64 panics
Date: Sun, 12 Feb 2006 04:05:21 +0200

 Adding what I know so far about this to the audit trail...
 
 On 2006-02-02 09:28, "Steven G. Kargl" <kargl@troutmask.apl.washington.edu> wrote:
 > Responsible-Changed-From-To: freebsd-bugs->cognet
 > Responsible-Changed-By: glebius
 > Responsible-Changed-When: Thu Feb 2 18:34:21 UTC 2006
 > Responsible-Changed-Why:
 > Olivier committed pts.
 >
 > http://www.freebsd.org/cgi/query-pr.cgi?pr=92742
 
 I have some doubts that this is a bug in pts.  I haven't been able to
 keep a kernel from HEAD after 2006.01.25.00.00.00 running for very long.
 
 It panics with a message about a non-sleepable lock held in a thread
 that can sleep, but I can't easily reproduce the panics and get a crash
 dump yet.
 
 Also, a kernel from the last date Steven mentioned as safe panics here
 with the same sx lock message.
 
 I'm building a world from 2006.01.20.00.45.00 now, going a few days
 back, but there have been too many changes since then.  I'll try to
 reproduce the sx lock panics and narrow down which commit between
 2006.01.20.00.45.00 and 2006.01.25.00.00.00 made these panics start to
 happen, though...
 
 The main symptom that seems related to pts, although I can't tell whether
 the new pts code causes it, is a syscons that is unusable on my
 Ferrari 3400 laptop.  A kernel from before the pts code runs fine (until
 it panics from the sx lock stuff).  A kernel with the pts code runs fine
 in single user mode, but when multiuser mode starts the console
 terminals don't echo characters as I type them.  Repeatedly hitting
 Scroll-Lock makes the typed characters visible after a while, but not
 reliably.
 
 I downloaded the SNAP012 snapshot from ftp.freebsd.org and installed
 that on a spare partition on the same laptop.  It also fails to work in
 multiuser mode, so this doesn't look like a local configuration problem :-/
 
State-Changed-From-To: open->closed 
State-Changed-By: linimon 
State-Changed-When: Tue Jul 10 15:47:01 UTC 2007 
State-Changed-Why:  
Closed at submitter's request. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=92742 
>Unformatted:
