From nobody@FreeBSD.org  Sat Mar 17 03:52:17 2012
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 6F912106566B
	for <freebsd-gnats-submit@FreeBSD.org>; Sat, 17 Mar 2012 03:52:17 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from red.freebsd.org (red.freebsd.org [IPv6:2001:4f8:fff6::22])
	by mx1.freebsd.org (Postfix) with ESMTP id 0BB7C8FC14
	for <freebsd-gnats-submit@FreeBSD.org>; Sat, 17 Mar 2012 03:52:17 +0000 (UTC)
Received: from red.freebsd.org (localhost [127.0.0.1])
	by red.freebsd.org (8.14.4/8.14.4) with ESMTP id q2H3qGW5007391
	for <freebsd-gnats-submit@FreeBSD.org>; Sat, 17 Mar 2012 03:52:16 GMT
	(envelope-from nobody@red.freebsd.org)
Received: (from nobody@localhost)
	by red.freebsd.org (8.14.4/8.14.4/Submit) id q2H3qGNA007389;
	Sat, 17 Mar 2012 03:52:16 GMT
	(envelope-from nobody)
Message-Id: <201203170352.q2H3qGNA007389@red.freebsd.org>
Date: Sat, 17 Mar 2012 03:52:16 GMT
From: Zhouyi Zhou <zhouzhouyi@gmail.com>
To: freebsd-gnats-submit@FreeBSD.org
Subject: FB 8.0 freeze during the kernel dump
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         166193
>Category:       kern
>Synopsis:       [hang] FB 8.0 freeze during the kernel dump
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    avg
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sat Mar 17 04:00:23 UTC 2012
>Closed-Date:    Mon Jul 16 11:15:17 UTC 2012
>Last-Modified:  Mon Jul 16 11:15:17 UTC 2012
>Originator:     Zhouyi Zhou
>Release:        FB 8.0
>Organization:
www.ict.ac.cn
>Environment:
FreeBSD 8.0-RELEASE FreeBSD 8.0-RELEASE 
>Description:
FreeBSD 8.0 will freeze during the kernel panic memory dump. 
>How-To-Repeat:
Allocate a large amount of memory, trigger a kernel panic, and let the kernel dump.
>Fix:
I used self-developed instrumentation code that avoids deadlock and found that the CPU doing the dump is taken over by ehci_interrupt, which is locked up.

The fix is to add a call to critical_enter() in the function doadump().


BTW, the following is my quick-and-dirty instrumentation code:
#define TRACELEVEL 5
void
lapic_handle_intr(int vector, struct trapframe *frame)
{
        struct intsrc *isrc;
#ifdef TRACELEVEL
        /*
         * Debug trace: on every interrupt, write the interrupted %rip and up
         * to TRACELEVEL return addresses (low 32 bits, 8 hex digits each) to
         * VGA text memory.  Each CPU gets its own 300-byte region; each
         * address takes 9 cells (8 digits + space) of 2 bytes (char, attr).
         */
        unsigned char *vga = (unsigned char *)0xffffffff800b8000;
        char tfip[20];
        struct amd64_frame *frame1;
        int cpuid = PCPU_GET(cpuid);
        int i, j;

        if (!INKERNEL(frame->tf_rip))
                goto out;
        frame1 = (struct amd64_frame *)frame->tf_rbp;
        for (j = 0; j <= TRACELEVEL; j++) {
                unsigned char *cell = vga + cpuid * 300 + j * 9 * 2;

                if (j == 0) {
                        /* Slot 0 holds the interrupted instruction pointer. */
                        sprintf(tfip, "%08x", (unsigned int)frame->tf_rip);
                } else {
                        /* Slots 1..TRACELEVEL hold the callers' return addresses. */
                        if (!INKERNEL((long)frame1))
                                goto out;
                        sprintf(tfip, "%08x", (unsigned int)frame1->f_retaddr);
                        frame1 = frame1->f_frame;
                }
                for (i = 0; i < 8; i++) {
                        cell[i * 2] = tfip[i];
                        /* Toggle the attribute so that updates are visible. */
                        cell[i * 2 + 1] = (cell[i * 2 + 1] != 121) ? 121 : 120;
                }
                cell[8 * 2] = ' ';
                cell[8 * 2 + 1] = 121;
        }
out:
#endif

        if (vector == -1)
                panic("Couldn't get vector from ISR!");
        isrc = intr_lookup_source(apic_idt_to_irq(PCPU_GET(apic_id),
            vector));
        intr_execute_handlers(isrc, frame);
}
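
The screen layout used by the instrumentation above can be tried out in
userland. The following is only a sketch under stated assumptions: it writes
into an ordinary buffer instead of VGA memory, and the names trace_slot_offset,
trace_put, CPU_STRIDE, CELL_BYTES and SLOT_CELLS are hypothetical helpers
introduced here for illustration; they are not part of the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define CPU_STRIDE 300  /* bytes reserved per CPU, as in the code above */
#define CELL_BYTES 2    /* one text cell: character byte + attribute byte */
#define SLOT_CELLS 9    /* 8 hex digits + 1 separating space */
#define ATTR_A     121
#define ATTR_B     120

/* Byte offset of the first cell of trace slot `slot` for CPU `cpuid`. */
static size_t
trace_slot_offset(int cpuid, int slot)
{
        assert(cpuid >= 0 && slot >= 0);
        return ((size_t)cpuid * CPU_STRIDE +
            (size_t)slot * SLOT_CELLS * CELL_BYTES);
}

/* Write the low 32 bits of addr as 8 hex digits into the text buffer. */
static void
trace_put(unsigned char *text, int cpuid, int slot, unsigned long long addr)
{
        unsigned char *cell = text + trace_slot_offset(cpuid, slot);
        char digits[9];
        int i;

        snprintf(digits, sizeof(digits), "%08x", (unsigned int)addr);
        for (i = 0; i < 8; i++) {
                cell[i * CELL_BYTES] = (unsigned char)digits[i];
                /* Flip the attribute on every write so activity is visible. */
                cell[i * CELL_BYTES + 1] =
                    (cell[i * CELL_BYTES + 1] != ATTR_A) ? ATTR_A : ATTR_B;
        }
        cell[8 * CELL_BYTES] = ' ';
        cell[8 * CELL_BYTES + 1] = ATTR_A;
}
```

Calling trace_put() twice for the same CPU and slot leaves the digits in place
and flips the attribute byte of each digit, which is how a live CPU can be told
apart from a stuck one on the screen.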



>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->freebsd-fs 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Sat Mar 17 04:41:07 UTC 2012 
Responsible-Changed-Why:  
Over to maintainer(s).  Apparently the fix is simple (patch doadump). 

http://www.freebsd.org/cgi/query-pr.cgi?pr=166193 
Responsible-Changed-From-To: freebsd-fs->avg 
Responsible-Changed-By: avg 
Responsible-Changed-When: Wed Mar 21 10:22:25 UTC 2012 
Responsible-Changed-Why:  
This PR looks like a duplicate of PR 139614. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=166193 

From: Andriy Gapon <avg@FreeBSD.org>
To: bug-followup@FreeBSD.org, zhouzhouyi@gmail.com
Cc:  
Subject: Re: kern/166193: [ufs] [hang] FB 8.0 freeze during the kernel dump
Date: Wed, 21 Mar 2012 12:28:26 +0200

 Do you have a patch? Have you tested it? Can you explain how it works?
 
 Can you also review the following PR
 http://www.freebsd.org/cgi/query-pr.cgi?pr=amd64/139614 and commits referenced
 by it? - Do those match your problem?
 
 -- 
 Andriy Gapon

From: Zhouyi Zhou <zhouzhouyi@gmail.com>
To: Andriy Gapon <avg@freebsd.org>
Cc: bug-followup@freebsd.org
Subject: Re: kern/166193: [ufs] [hang] FB 8.0 freeze during the kernel dump
Date: Fri, 23 Mar 2012 12:19:03 +0800

 Thanks for your attention.
 On Wed, Mar 21, 2012 at 6:28 PM, Andriy Gapon <avg@freebsd.org> wrote:
 > Do you have a patch?
 The following is the patch
 ---    kern_shutdown.c~
 +++ kern_shutdown.c
 @@ -242,6 +242,7 @@
         }
 
         savectx(&dumppcb);
 +       critical_enter();
         dumptid = curthread->td_tid;
         dumping++;
  #ifdef DDB
 @@ -263,6 +264,9 @@
         return (0);
  }
 >Have you tested it?
 Yes, I have tested it many times. With my patch, the dump no longer freezes.
 >Can you explain how it works?
 1) How the patch works:
 Assume the panicking CPU is CPU 0. Soon after CPU 0 begins to dump, the
 kernel scheduler preempts the dumping thread on CPU 0 to run another thread,
 which soon locks up (in the USB subsystem).
 My patch prevents the scheduler from preempting the dumping thread.
 2) How my instrumentation code works:
 On each interrupt, it prints the current execution stack of each CPU in
 the system to VGA memory.
 3) What I found in the dump freeze scenario:
 Every time the dump froze, the instrumentation code showed that the
 USB subsystem was locked up.
 4) Either my patch or disabling USB in the BIOS prevents the dump freeze.
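
 The idea behind item 1 can be illustrated with a small userland model. This
 is a sketch only, assuming a much simplified view of critical sections: a
 nesting counter defers preemption requests until the section is left. The
 names model_thread, model_critical_enter, model_request_preempt and
 model_critical_exit are hypothetical and are not the kernel's own code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for per-thread scheduler state. */
struct model_thread {
        int  critnest;    /* critical section nesting depth */
        bool owepreempt;  /* preemption was requested while critnest > 0 */
        int  switches;    /* how many times the thread was switched out */
};

static void
model_critical_enter(struct model_thread *td)
{
        td->critnest++;
}

/* Called when something wants to run another thread on this CPU. */
static void
model_request_preempt(struct model_thread *td)
{
        if (td->critnest > 0)
                td->owepreempt = true;  /* defer until the section ends */
        else
                td->switches++;         /* preempt immediately */
}

static void
model_critical_exit(struct model_thread *td)
{
        assert(td->critnest > 0);
        td->critnest--;
        /* Honor a deferred preemption once the outermost section ends. */
        if (td->critnest == 0 && td->owepreempt) {
                td->owepreempt = false;
                td->switches++;
        }
}
```

 In this model, a thread that calls model_critical_enter() before starting its
 work is never switched out in the middle of it, which is the effect the patch
 aims for in the dumping thread.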
 >
 > Can you also review the following PR
 > http://www.freebsd.org/cgi/query-pr.cgi?pr=amd64/139614 and commits referenced
 > by it? - Do those match your problem?
 1) My PR is a sub-problem of PR 139614. The commits there settle both the
 dump freeze problem and the dump mistake problem.
 2) I reviewed the current code before submitting the PR and found the
 handling for stopping other CPUs and the scheduler. I submitted the PR for
 the benefit of those who do not want to patch their current systems
 heavily. My patch does not settle the dump mistake under heavy interrupt
 load.
 >

From: Andriy Gapon <avg@FreeBSD.org>
To: Zhouyi Zhou <zhouzhouyi@gmail.com>
Cc: bug-followup@FreeBSD.org
Subject: Re: kern/166193: [ufs] [hang] FB 8.0 freeze during the kernel dump
Date: Thu, 24 May 2012 09:43:18 +0300

 on 23/03/2012 06:19 Zhouyi Zhou said the following:
 > Thanks for your attention.
 
 Sorry for taking so long to reply...
 
 > On Wed, Mar 21, 2012 at 6:28 PM, Andriy Gapon <avg@freebsd.org> wrote:
 >> Do you have a patch?
 > The following is the patch
 > ---    kern_shutdown.c~
 > +++ kern_shutdown.c
 > @@ -242,6 +242,7 @@
 >         }
 > 
 >         savectx(&dumppcb);
 > +       critical_enter();
 >         dumptid = curthread->td_tid;
 >         dumping++;
 >  #ifdef DDB
 > @@ -263,6 +264,9 @@
 >         return (0);
 >  }
 >> Have you tested it?
 > Yes, I have tested it many times. With my patch, the dump no longer freezes.
 >> Can you explain how it works?
 > 1) How the patch works:
 > Assume the panicking CPU is CPU 0. Soon after CPU 0 begins to dump, the
 > kernel scheduler preempts the dumping thread on CPU 0 to run another thread,
 > which soon locks up (in the USB subsystem).
 > My patch prevents the scheduler from preempting the dumping thread.
 
 OK.  Now I see what your patch does and I think that this is a good workaround.
 It won't help in all cases, though: e.g. if a thread running on a different
 CPU does memory-related operations, that can still confuse the dumping code.
 But at least the panicking/dumping CPU won't get stuck indefinitely.
 
 [snip]
 
 >> Can you also review the following PR
 >> http://www.freebsd.org/cgi/query-pr.cgi?pr=amd64/139614 and commits referenced
 >> by it? - Do those match your problem?
 > 1) My PR is a sub-problem of PR 139614. The commits there settle both the
 > dump freeze problem and the dump mistake problem.
 > 2) I reviewed the current code before submitting the PR and found the
 > handling for stopping other CPUs and the scheduler. I submitted the PR for
 > the benefit of those who do not want to patch their current systems
 > heavily. My patch does not settle the dump mistake under heavy interrupt
 > load.
 
 Yes, thank you for the investigation and the patch.
 And sorry for taking too long to act on your report.
 
 Now, I have MFCed the main part of the CPU/scheduler stopping commits to
 stable/8.  The new behavior is disabled by default, but could be enabled via a
 tunable.  In stable/9 the changes are fully MFCed and enabled by default.
 What is your opinion - should that be good enough or is your patch still needed?
 
 Assuming that you can try stable/8, could you please test if the latest code
 there is able to correctly handle your environment (hardware, interrupt load, etc)?
 
 Thank you!
 
 -- 
 Andriy Gapon
State-Changed-From-To: open->patched 
State-Changed-By: avg 
State-Changed-When: Thu Jun 7 08:12:59 UTC 2012 
State-Changed-Why:  
Update to the state of PR 139614 

http://www.freebsd.org/cgi/query-pr.cgi?pr=166193 

From: Zhouyi Zhou <zhouzhouyi@gmail.com>
To: bug-followup@freebsd.org, avg@freebsd.org
Cc:  
Subject: Re: kern/166193: [hang] FB 8.0 freeze during the kernel dump
Date: Sat, 23 Jun 2012 11:20:43 +0800

 Andriy,
 
    I cvsup'ed FreeBSD 8-STABLE and set sysctl kern.stop_scheduler_on_panic=1.
 The dump succeeded in all 3 attempts (2000 MB memory dump).
   Many thanks
 Best Wishes
 Zhouyi
State-Changed-From-To: patched->closed 
State-Changed-By: avg 
State-Changed-When: Mon Jul 16 11:14:51 UTC 2012 
State-Changed-Why:  
See PR 139614. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=166193 
>Unformatted:
