From nobody@FreeBSD.org  Sat May 16 18:56:16 2009
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 4646D106564A
	for <freebsd-gnats-submit@FreeBSD.org>; Sat, 16 May 2009 18:56:16 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [IPv6:2001:4f8:fff6::21])
	by mx1.freebsd.org (Postfix) with ESMTP id 35D468FC1B
	for <freebsd-gnats-submit@FreeBSD.org>; Sat, 16 May 2009 18:56:16 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.14.3/8.14.3) with ESMTP id n4GIuF7v043762
	for <freebsd-gnats-submit@FreeBSD.org>; Sat, 16 May 2009 18:56:15 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.14.3/8.14.3/Submit) id n4GIuFLO043761;
	Sat, 16 May 2009 18:56:15 GMT
	(envelope-from nobody)
Message-Id: <200905161856.n4GIuFLO043761@www.freebsd.org>
Date: Sat, 16 May 2009 18:56:15 GMT
From: Andi Kleen <andi-fbsd@firstfloor.org>
To: freebsd-gnats-submit@FreeBSD.org
Subject: Incorrect machine check exception handler test
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         134586
>Category:       i386
>Synopsis:       [i386] [patch] Incorrect machine check exception handler test
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    jhb
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sat May 16 19:00:09 UTC 2009
>Closed-Date:    Mon Oct 11 19:05:16 UTC 2010
>Last-Modified:  Mon Oct 11 19:05:16 UTC 2010
>Originator:     Andi Kleen
>Release:        code review of HEAD 090516
>Organization:
Intel OTC
>Environment:
>Description:
Obvious bug found during code reading of the x86 machine check handler.

Machine check exceptions are not checked for UC and PCC, because the
OVER assignment overwrites the check mask instead of adding to it.
Obviously the OVER assignment should be an OR (|=).

See the attached patch for a fix.

I think there are more problems, but that seems to be the most serious
one.

>How-To-Repeat:
Trigger an uncorrected memory error (e.g. heat the DIMMs with a hair dryer on a system with ECC memory) and see whether the system panics.


>Fix:
Apply patch.


Patch attached with submission follows:

Index: i386/i386/mca.c
===================================================================
--- i386/i386/mca.c	(revision 192202)
+++ i386/i386/mca.c	(working copy)
@@ -346,7 +346,7 @@
 
 	/* When handling a MCE#, treat the OVER flag as non-restartable. */
 	if (mcip)
-		ucmask = MC_STATUS_OVER;
+		ucmask |= MC_STATUS_OVER;
 	mcg_cap = rdmsr(MSR_MCG_CAP);
 	for (i = 0; i < (mcg_cap & MCG_CAP_COUNT); i++) {
 		rec = mca_record_entry(i);


>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->jhb 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Mon May 18 01:42:06 UTC 2009 
Responsible-Changed-Why:  
Over to committer of file in question. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=134586 

From: John Baldwin <jhb@FreeBSD.org>
To: bug-followup@freebsd.org,
 andi-fbsd@firstfloor.org
Cc:  
Subject: Re: i386/134586: [i386] [patch] Incorrect machine check exception handler test
Date: Mon, 18 May 2009 12:06:02 -0400

 I actually did check this by doing 'dd if=/dev/mem of=/dev/null' with known 
 bad RAM and was able to get panics, but my errors must have had the OVER flag
 set.  I'm curious what other problems you think are present.
 
 -- 
 John Baldwin

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: i386/134586: commit references a PR
Date: Mon, 18 May 2009 21:50:21 +0000 (UTC)

 Author: jhb
 Date: Mon May 18 21:50:06 2009
 New Revision: 192343
 URL: http://svn.freebsd.org/changeset/base/192343
 
 Log:
   - Add a tunable 'hw.mca.enabled' that can be used to enable/disable the
     machine check code.  Disable it by default for now.
   - When computing the mask of bits that determines a non-restartable event
     during a machine check exception, or-in the overflow flag rather than
     replacing the other flags.
   
   PR:		i386/134586
   Submitted by:	Andi Kleen  andi-fbsd firstfloor.org
 
 Modified:
   head/sys/amd64/amd64/mca.c
   head/sys/i386/i386/mca.c
 
 Modified: head/sys/amd64/amd64/mca.c
 ==============================================================================
 --- head/sys/amd64/amd64/mca.c	Mon May 18 21:47:32 2009	(r192342)
 +++ head/sys/amd64/amd64/mca.c	Mon May 18 21:50:06 2009	(r192343)
 @@ -55,10 +55,15 @@ struct mca_internal {
  
  static MALLOC_DEFINE(M_MCA, "MCA", "Machine Check Architecture");
  
 -static struct sysctl_oid *mca_sysctl_tree;
 -
  static int mca_count;		/* Number of records stored. */
  
 +SYSCTL_NODE(_hw, OID_AUTO, mca, CTLFLAG_RD, NULL, "Machine Check Architecture");
 +
 +static int mca_enabled = 0;
 +TUNABLE_INT("hw.mca.enabled", &mca_enabled);
 +SYSCTL_INT(_hw_mca, OID_AUTO, enabled, CTLFLAG_RDTUN, &mca_enabled, 0,
 +    "Administrative toggle for machine check support");
 +
  static STAILQ_HEAD(, mca_internal) mca_records;
  static struct callout mca_timer;
  static int mca_ticks = 3600;	/* Check hourly by default. */
 @@ -346,7 +351,7 @@ mca_scan(int mcip)
  
  	/* When handling a MCE#, treat the OVER flag as non-restartable. */
  	if (mcip)
 -		ucmask = MC_STATUS_OVER;
 +		ucmask |= MC_STATUS_OVER;
  	mcg_cap = rdmsr(MSR_MCG_CAP);
  	for (i = 0; i < (mcg_cap & MCG_CAP_COUNT); i++) {
  		rec = mca_record_entry(i);
 @@ -426,7 +431,7 @@ static void
  mca_startup(void *dummy)
  {
  
 -	if (!(cpu_feature & CPUID_MCA))
 +	if (!mca_enabled || !(cpu_feature & CPUID_MCA))
  		return;
  
  	callout_reset(&mca_timer, mca_ticks * hz, mca_periodic_scan,
 @@ -442,17 +447,15 @@ mca_setup(void)
  	STAILQ_INIT(&mca_records);
  	TASK_INIT(&mca_task, 0x8000, mca_scan_cpus, NULL);
  	callout_init(&mca_timer, CALLOUT_MPSAFE);
 -	mca_sysctl_tree = SYSCTL_ADD_NODE(NULL, SYSCTL_STATIC_CHILDREN(_hw),
 -	    OID_AUTO, "mca", CTLFLAG_RW, NULL, "MCA container");
 -	SYSCTL_ADD_INT(NULL, SYSCTL_CHILDREN(mca_sysctl_tree), OID_AUTO,
 +	SYSCTL_ADD_INT(NULL, SYSCTL_STATIC_CHILDREN(_hw_mca), OID_AUTO,
  	    "count", CTLFLAG_RD, &mca_count, 0, "Record count");
 -	SYSCTL_ADD_PROC(NULL, SYSCTL_CHILDREN(mca_sysctl_tree), OID_AUTO,
 +	SYSCTL_ADD_PROC(NULL, SYSCTL_STATIC_CHILDREN(_hw_mca), OID_AUTO,
  	    "interval", CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_MPSAFE, &mca_ticks,
  	    0, sysctl_mca_ticks, "I",
  	    "Periodic interval in seconds to scan for machine checks");
 -	SYSCTL_ADD_NODE(NULL, SYSCTL_CHILDREN(mca_sysctl_tree), OID_AUTO,
 +	SYSCTL_ADD_NODE(NULL, SYSCTL_STATIC_CHILDREN(_hw_mca), OID_AUTO,
  	    "records", CTLFLAG_RD, sysctl_mca_records, "Machine check records");
 -	SYSCTL_ADD_PROC(NULL, SYSCTL_CHILDREN(mca_sysctl_tree), OID_AUTO,
 +	SYSCTL_ADD_PROC(NULL, SYSCTL_STATIC_CHILDREN(_hw_mca), OID_AUTO,
  	    "force_scan", CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_MPSAFE, NULL, 0,
  	    sysctl_mca_scan, "I", "Force an immediate scan for machine checks");
  }
 @@ -465,7 +468,7 @@ mca_init(void)
  	int i;
  
  	/* MCE is required. */
 -	if (!(cpu_feature & CPUID_MCE))
 +	if (!mca_enabled || !(cpu_feature & CPUID_MCE))
  		return;
  
  	if (cpu_feature & CPUID_MCA) {
 
 Modified: head/sys/i386/i386/mca.c
 ==============================================================================
 --- head/sys/i386/i386/mca.c	Mon May 18 21:47:32 2009	(r192342)
 +++ head/sys/i386/i386/mca.c	Mon May 18 21:50:06 2009	(r192343)
 @@ -55,10 +55,15 @@ struct mca_internal {
  
  static MALLOC_DEFINE(M_MCA, "MCA", "Machine Check Architecture");
  
 -static struct sysctl_oid *mca_sysctl_tree;
 -
  static int mca_count;		/* Number of records stored. */
  
 +SYSCTL_NODE(_hw, OID_AUTO, mca, CTLFLAG_RD, NULL, "Machine Check Architecture");
 +
 +static int mca_enabled = 0;
 +TUNABLE_INT("hw.mca.enabled", &mca_enabled);
 +SYSCTL_INT(_hw_mca, OID_AUTO, enabled, CTLFLAG_RDTUN, &mca_enabled, 0,
 +    "Administrative toggle for machine check support");
 +
  static STAILQ_HEAD(, mca_internal) mca_records;
  static struct callout mca_timer;
  static int mca_ticks = 3600;	/* Check hourly by default. */
 @@ -346,7 +351,7 @@ mca_scan(int mcip)
  
  	/* When handling a MCE#, treat the OVER flag as non-restartable. */
  	if (mcip)
 -		ucmask = MC_STATUS_OVER;
 +		ucmask |= MC_STATUS_OVER;
  	mcg_cap = rdmsr(MSR_MCG_CAP);
  	for (i = 0; i < (mcg_cap & MCG_CAP_COUNT); i++) {
  		rec = mca_record_entry(i);
 @@ -426,7 +431,7 @@ static void
  mca_startup(void *dummy)
  {
  
 -	if (!(cpu_feature & CPUID_MCA))
 +	if (!mca_enabled || !(cpu_feature & CPUID_MCA))
  		return;
  
  	callout_reset(&mca_timer, mca_ticks * hz, mca_periodic_scan,
 @@ -442,17 +447,15 @@ mca_setup(void)
  	STAILQ_INIT(&mca_records);
  	TASK_INIT(&mca_task, 0x8000, mca_scan_cpus, NULL);
  	callout_init(&mca_timer, CALLOUT_MPSAFE);
 -	mca_sysctl_tree = SYSCTL_ADD_NODE(NULL, SYSCTL_STATIC_CHILDREN(_hw),
 -	    OID_AUTO, "mca", CTLFLAG_RW, NULL, "MCA container");
 -	SYSCTL_ADD_INT(NULL, SYSCTL_CHILDREN(mca_sysctl_tree), OID_AUTO,
 +	SYSCTL_ADD_INT(NULL, SYSCTL_STATIC_CHILDREN(_hw_mca), OID_AUTO,
  	    "count", CTLFLAG_RD, &mca_count, 0, "Record count");
 -	SYSCTL_ADD_PROC(NULL, SYSCTL_CHILDREN(mca_sysctl_tree), OID_AUTO,
 +	SYSCTL_ADD_PROC(NULL, SYSCTL_STATIC_CHILDREN(_hw_mca), OID_AUTO,
  	    "interval", CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_MPSAFE, &mca_ticks,
  	    0, sysctl_mca_ticks, "I",
  	    "Periodic interval in seconds to scan for machine checks");
 -	SYSCTL_ADD_NODE(NULL, SYSCTL_CHILDREN(mca_sysctl_tree), OID_AUTO,
 +	SYSCTL_ADD_NODE(NULL, SYSCTL_STATIC_CHILDREN(_hw_mca), OID_AUTO,
  	    "records", CTLFLAG_RD, sysctl_mca_records, "Machine check records");
 -	SYSCTL_ADD_PROC(NULL, SYSCTL_CHILDREN(mca_sysctl_tree), OID_AUTO,
 +	SYSCTL_ADD_PROC(NULL, SYSCTL_STATIC_CHILDREN(_hw_mca), OID_AUTO,
  	    "force_scan", CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_MPSAFE, NULL, 0,
  	    sysctl_mca_scan, "I", "Force an immediate scan for machine checks");
  }
 @@ -465,7 +468,7 @@ mca_init(void)
  	int i;
  
  	/* MCE is required. */
 -	if (!(cpu_feature & CPUID_MCE))
 +	if (!mca_enabled || !(cpu_feature & CPUID_MCE))
  		return;
  
  	if (cpu_feature & CPUID_MCA) {
 _______________________________________________
 svn-src-all@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/svn-src-all
 To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
 
State-Changed-From-To: open->closed 
State-Changed-By: jhb 
State-Changed-When: Mon Oct 11 19:05:00 UTC 2010 
State-Changed-Why:  
Feedback timeout. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=134586 
>Unformatted:
