From nobody@FreeBSD.org  Thu Aug 29 01:34:43 2013
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1])
	(using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by hub.freebsd.org (Postfix) with ESMTP id 3304BC2C
	for <freebsd-gnats-submit@FreeBSD.org>; Thu, 29 Aug 2013 01:34:43 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from oldred.freebsd.org (oldred.freebsd.org [8.8.178.121])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mx1.freebsd.org (Postfix) with ESMTPS id 208CC274F
	for <freebsd-gnats-submit@FreeBSD.org>; Thu, 29 Aug 2013 01:34:43 +0000 (UTC)
Received: from oldred.freebsd.org ([127.0.1.6])
	by oldred.freebsd.org (8.14.5/8.14.7) with ESMTP id r7T1Ygvn011716
	for <freebsd-gnats-submit@FreeBSD.org>; Thu, 29 Aug 2013 01:34:42 GMT
	(envelope-from nobody@oldred.freebsd.org)
Received: (from nobody@localhost)
	by oldred.freebsd.org (8.14.5/8.14.5/Submit) id r7T1Ygoo011713;
	Thu, 29 Aug 2013 01:34:42 GMT
	(envelope-from nobody)
Message-Id: <201308290134.r7T1Ygoo011713@oldred.freebsd.org>
Date: Thu, 29 Aug 2013 01:34:42 GMT
From: Mike Harding <mvharding@gmail.com>
To: freebsd-gnats-submit@FreeBSD.org
Subject: 9.2-RC3 - on resume from suspend, disk operations are slower
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         181632
>Category:       kern
>Synopsis:       [suspend/resume] 9.2-RC3 - on resume from suspend, disk operations are slower
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Aug 29 01:40:00 UTC 2013
>Closed-Date:    
>Last-Modified:  Tue Apr 22 05:34:39 UTC 2014
>Originator:     Mike Harding
>Release:        9.2-RC3
>Organization:
>Environment:
FreeBSD bsd.mvh 9.2-RC3 FreeBSD 9.2-RC3 #0 r254986: Wed Aug 28 09:05:42 PDT 2013     root@bsd.mvh:/usr/obj/usr/src/sys/GENERIC  amd64
>Description:
I've been using 'zzz' with a WOL for a while to keep my machine available, but not drawing a lot of power.  I recently installed 9.2-RC3.  It takes about 47 minutes to do a 'make -j9 buildworld buildkernel' on this machine, but if I do a suspend and resume, it takes much longer to do this same operation after the machine comes back up - for example, it took 1 hour 37 minutes to do the same buildkernel/buildworld after the resume from suspend.

It looks like the 'rm' operations are happening especially slowly - there is not much disk or CPU activity going on when I see 'rm' on the terminal, vs. before the 'zzz', when the disk and/or CPU seem pretty busy most of the time during the build.

I know that suspend/resume is a bit flaky, but it's been pretty reliable for me under 9.1.
>How-To-Repeat:
as root
cd /usr/src
rm -rf /usr/obj
time make -j9 buildworld buildkernel
zzz
(wake up the box)
(repeat the above)
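The steps above could be wrapped in a small script so that the build is timed the same way before and after the suspend. This is only a sketch under stated assumptions: the timed_build function and the SRCDIR, OBJDIR and LOGDIR variables are hypothetical names, not part of the original report.

```sh
#!/bin/sh
# Sketch only: timed_build, SRCDIR, OBJDIR and LOGDIR are illustrative names.
# The defaults mirror the paths used in the steps above.
SRCDIR="${SRCDIR:-/usr/src}"
OBJDIR="${OBJDIR:-/usr/obj}"
LOGDIR="${LOGDIR:-/tmp}"

# timed_build LABEL
# Remove the object tree, run the build from SRCDIR, and print the build's
# exit status and wall-clock time in seconds.
timed_build() {
    label="$1"
    rm -rf "$OBJDIR"
    start=$(date +%s)
    (cd "$SRCDIR" && make -j9 buildworld buildkernel) \
        > "$LOGDIR/build-$label.log" 2>&1
    status=$?
    end=$(date +%s)
    echo "$label: exit=$status elapsed=$((end - start))s"
}

# Example (as root): timed_build before; zzz; (wake the box); timed_build after
```

Only the make step is timed, as in the original steps, so the cost of removing the object tree does not skew the comparison.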
>Fix:


>Release-Note:
>Audit-Trail:

From: Mike Harding <mvharding@gmail.com>
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/181632: 9.2-RC3 - on resume from suspend, disk operations
 are slower
Date: Thu, 29 Aug 2013 05:47:05 -0700

 Also, I did a 'portmaster -a' this AM after resuming from suspend, and it
 was extremely slow, with 'systat' showing almost no CPU or disk activity.
 After a reboot, it completed quickly.
 

From: Mike Harding <mvharding@gmail.com>
To: bug-followup@FreeBSD.org, Mike Harding <mvharding@gmail.com>
Cc:  
Subject: Re: kern/181632: 9.2-RC3 - on resume from suspend, disk operations
 are slower
Date: Thu, 29 Aug 2013 23:51:18 -0700

 I was able to track this down by building kernels against /base/stable/9
 (it took -hours!-).
 
 The issue occurs with commit 244616 but does not occur with 244614.  The
 only difference is a small patch to /usr/src/sys/dev/acpica/acpi_cpu.c;
 this code appears to deal with C-state processing.
 

From: Andriy Gapon <avg@FreeBSD.org>
To: bug-followup@FreeBSD.org, mvharding@gmail.com
Cc:  
Subject: Re: kern/181632: 9.2-RC3 - on resume from suspend, disk operations
 are slower
Date: Sat, 31 Aug 2013 17:03:17 +0300

 Hmm... I am very very surprised that that commit could have such consequences.
 Are you sure that it is it?
 
 As to the debugging I really don't know what to look for.
 Could you please describe your CPU?
 Perhaps use procstat -kk -a | fgrep -i acpi to check if there are any threads
 stuck somewhere in acpi after a resume.
 Maybe also compare output of sysctl hw.acpi and dev.cpu before and after a
 resume, with and without r244616.
 
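One way to carry out the comparison suggested above is to save the output of the same commands before suspending and again after resuming, then diff the two files. The snapshot and compare functions and the SNAPDIR variable below are hypothetical names used only for this sketch; the sysctl and procstat invocations are the ones quoted above, with grep -F standing in for fgrep.

```sh
#!/bin/sh
# Sketch only: snapshot, compare and SNAPDIR are illustrative names.
SNAPDIR="${SNAPDIR:-/tmp/acpi-snap}"

# snapshot LABEL
# Save the sysctl trees and any ACPI-related kernel stacks to LABEL.txt.
snapshot() {
    mkdir -p "$SNAPDIR"
    {
        sysctl hw.acpi dev.cpu
        # grep exits non-zero when no thread matches; that is not an error here.
        procstat -kk -a | grep -Fi acpi || true
    } > "$SNAPDIR/$1.txt" 2>&1
}

# compare
# Show what changed between the "before" and "after" snapshots.
compare() {
    diff -u "$SNAPDIR/before.txt" "$SNAPDIR/after.txt"
}

# Example: snapshot before; zzz; (wake the box); snapshot after; compare
```

Values such as temperatures will probably differ between any two snapshots, so the interesting lines in the diff are the ones that change only with r244616 applied.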
 -- 
 Andriy Gapon

From: Mike Harding <mvharding@gmail.com>
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/181632: 9.2-RC3 - on resume from suspend, disk operations
 are slower
Date: Sat, 31 Aug 2013 10:12:00 -0700

 I reverted the single line at
 
 http://svnweb.freebsd.org/base/stable/9/sys/dev/acpica/acpi_cpu.c?annotate=244616&pathrev=244616#l978
 
 to the previous version which was just
 
 ACPI_ENABLE_IRQS();
 
 instead of
 
 acpi_cpu_c1();
 
 and the problem does not occur.  acpi_cpu_c1() does 'sti; hlt', whereas
 ACPI_ENABLE_IRQS() just does 'sti'.
 
 Given that the code says
 
 /* If disabled, take the safe path. */
 977 	if (is_idle_disabled(sc)) {
 978 	acpi_cpu_c1();
 979 	return;
 980 	}
 and then does a 'hlt' or idle if idle is disabled, this might be a problem.
 

From: Andriy Gapon <avg@FreeBSD.org>
To: Mike Harding <mvharding@gmail.com>, bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/181632: 9.2-RC3 - on resume from suspend, disk operations
 are slower
Date: Sun, 01 Sep 2013 09:39:46 +0300

 on 31/08/2013 20:05 Mike Harding said the following:
 > I reverted the single line at
 > 
 > http://svnweb.freebsd.org/base/stable/9/sys/dev/acpica/acpi_cpu.c?annotate=244616&pathrev=244616#l978
 
 Thank you for narrowing this down!
 
 > Given that the code says
 > 
 > /* If disabled, take the safe path. */
 > 977 	if (is_idle_disabled(sc)) {
 > 978 	acpi_cpu_c1();
 > 979 	return;
 > 980 	}
 > 
 > 
 > and then does a 'hlt' or idle if idle is disabled, this might be a problem.
 
 But it should not be a problem.
 
 First, is_idle_disabled should not be normally set.  It is supposed to be set
 only during short transitional periods.  So it would be useful to understand why
 it is set (after resume) and makes the difference.
 
 Second, "idle is disabled" means that the ACPI C-state machine is disabled, not
 that the system must not idle at all.  In fact, being in this function
 (acpi_cpu_idle) means that the system explicitly wants to idle.  Using hlt to
 idle is the right / normal / safe thing.  Not using hlt means that the system
 keeps burning cycles even when it has nothing to do.  So with your change you
 should observe increased power consumption and heat production.
 So, this is another mystery as to why the perfectly normal use of hlt affects
 your system so badly.
 
 P.S. I hope that you noticed a list of things to look at in my previous followup
 to this PR.
 -- 
 Andriy Gapon

From: Mike Harding <mvharding@gmail.com>
To: Andriy Gapon <avg@freebsd.org>
Cc: bug-followup@freebsd.org
Subject: Re: kern/181632: 9.2-RC3 - on resume from suspend, disk operations
 are slower
Date: Sun, 1 Sep 2013 08:34:02 -0700

 Yes, I did notice the list of things to check.  Nothing significant is
 different in the CPU states (just temperature, etc.), there are no hw.acpi
 differences, and no processes are returned by your code fragment.
 
 I have isolated my issue to this particular line of code, which (from my
 point of view) changed between 9.1 and 9.2-RC.  It is 100% repeatable.
 No previously released version of FreeBSD has this change in it.  You
 introduced it; if the previous version caused 'increased power consumption
 and heat production', it would have been noticed by now, I think.
 As you point out, it runs rarely.  I only see this issue after a
 suspend/resume on a desktop machine, which is not something most people
 do, I am sure.
 
 The only places I see the disable-idle calls in this file are
 acpi_cpu_shutdown, acpi_cpu_suspend, and acpi_cpu_startup.  It's possible
 that introducing a 'hlt' into the code path in these calls is causing a
 deadlock, race condition, or other issue that is not immediately obvious.
 The only symptom I have is that disk operations are very slow, which is
 probably an interrupt issue or a hardware suspend/resume issue.
 
 I am running a pretty generic system (Intel i5 750, Asus motherboard).
 The problem is 100% repeatable.  Do you have a machine with multiple CPUs
 that you can try this with?  Do you have a machine with working
 suspend/resume?
 
 Given that the line in question appears to run only during startup,
 suspend/resume, and shutdown, I think it is prudent to revert this
 -single line- to the version that has run on previously released versions
 of FreeBSD, rather than keep a version that causes a demonstrated issue.
 The only difference should be that a 'hlt' call will not occur during
 startup, shutdown, or suspend, which are very short transitions.
 
 I am using an Intel i5 750 with this motherboard:
 http://www.asus.com/Motherboards/P7P55D/,
 which is running fairly common hardware.
 
 Is there anything else I can do?  I can give you ssh access to the system in
 question.
 

From: Mike Harding <mvharding@gmail.com>
To: Andriy Gapon <avg@freebsd.org>
Cc: bug-followup@freebsd.org
Subject: Re: kern/181632: 9.2-RC3 - on resume from suspend, disk operations
 are slower
Date: Sun, 1 Sep 2013 13:06:40 -0700

 I believe the code that introduced ACPI_ENABLE_IRQS() came in with this commit:
 
 http://svnweb.freebsd.org/base/head/sys/dev/acpica/acpi_cpu.c?annotate=122766
 
 The author made a distinction between using ACPI_ENABLE_IRQS() and
 acpi_cpu_c1() in this code; both are used in different parts of the code
 and were added in the same commit.  If the distinction were not necessary,
 I don't know why the author would have made it.
 

From: Andriy Gapon <avg@FreeBSD.org>
To: Mike Harding <mvharding@gmail.com>
Cc: bug-followup@FreeBSD.org
Subject: Re: kern/181632: 9.2-RC3 - on resume from suspend, disk operations
 are slower
Date: Mon, 02 Sep 2013 00:52:50 +0300

 on 01/09/2013 18:34 Mike Harding said the following:
 > Yes, I did notice the list of things to check.  Nothing significant is different
 > in the cpu states (just temperature, etc.), no hcpi differences, and no
 > processes returned by your code fragment.
 
 I guess I have to trust you on this.  Isn't it simpler to just paste a few lines?
 
 > I have isolated my issue to this particular line of code, which has been changed
 > (from my point of view) in 9.2-RC and different in 9.1.  It is 100% repeatable.
 > No previous released version of FreeBSD has this change in it.
 
 I really do appreciate the investigative work that you did.
 
 > You
 > introduced it
 
 Yes, I did, without any doubt.
 
 > if it caused 'increased power consumption and heat production"
 > with the previous version, it would have been noticed by now I think.
 
 Please do not forget that 9.2-RC has other changes that no previous release of
 FreeBSD had.  Just because reverting that particular line improves things for
 you does not automatically mean that that line is to blame.  It could be that it
 just reveals a problem in an earlier commit.
 
 > As you point out, it runs rarely.
 
 My actual point was that it should not make any difference at all in a normal
 system state.  Unless the "disabled" flag got stuck somehow.
 
 > I only see this issue after a suspend/resume
 > on a desktop machine, which is not something most people do, I am sure.
 
 But some do.  Including me.
 
 > The only place I see the disable idle calls in this file are acpi_cpu_shutdown,
 > acpi_cpu_suspend, and acpi_cpu_startup.  It's possible that introducing
 > a 'hlt' into the code path in these calls is causing a deadlock, race condition,
 > or other issue that is not immediately obvious.
 
 I think that you misunderstood the code.  The disable calls only set a flag.
 The hlt instruction is executed in acpi_cpu_idle.
 
 > The only symptom I have is
 > that disk operations are very slow, which is probably an interrupt issue or
 > hardware suspend/resume issue.
 > 
 > I am running a pretty generic system (Intel I5 750, Asus
 > motherboard).  It is 100% repeatable, do you have a machine with multiple
 > CPUs that you can try this with?  Do you have a machine with working
 > suspend/resume?
 
 Yes on both counts.
 
 > Given that the line in question appears to only run during startup, suspend/resume
 > and shutdown, I think that reverting this -single line- to the version that has run
 > on previous released versions of FreeBSD rather than a version that causes a
 > demonstrated issue is prudent.  The only difference should be that a 'hlt'
 > call will not occur during startup, shutdown or suspend, which are very
 > short transitions.
 
 Finding the root cause would be really prudent.
 
 > I am using an Intel i5 750 with this motherboard:
 > http://www.asus.com/Motherboards/P7P55D/,
 > which is running fairly common hardware.
 > 
 > Is there anything else I can do?  I can give you ssh access to the system in
 > question.
 
 That would be perfect.
 
 P.S.
 Thank you again for finding the commit and the line.
 I understand that you've already made conclusions based on your findings and you
 would like to see a quick action based on your findings.
 But I would like to determine a root cause and have a clear explanation of what
 exactly causes the slowdown that you see.
 So I want to ask you to try to help me as much as you can instead of trying to
 persuade me to just revert the line in question.
 
 
 
 -- 
 Andriy Gapon
>Unformatted:
