From nobody@FreeBSD.org  Thu Jun 10 18:10:46 2010
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 59AB2106567C
	for <freebsd-gnats-submit@FreeBSD.org>; Thu, 10 Jun 2010 18:10:46 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [IPv6:2001:4f8:fff6::21])
	by mx1.freebsd.org (Postfix) with ESMTP id 301478FC08
	for <freebsd-gnats-submit@FreeBSD.org>; Thu, 10 Jun 2010 18:10:46 +0000 (UTC)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.14.3/8.14.3) with ESMTP id o5AIAk4v044938
	for <freebsd-gnats-submit@FreeBSD.org>; Thu, 10 Jun 2010 18:10:46 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.14.3/8.14.3/Submit) id o5AIAj29044928;
	Thu, 10 Jun 2010 18:10:45 GMT
	(envelope-from nobody)
Message-Id: <201006101810.o5AIAj29044928@www.freebsd.org>
Date: Thu, 10 Jun 2010 18:10:45 GMT
From: Marcel Moolenaar <marcel@FreeBSD.org>
To: freebsd-gnats-submit@FreeBSD.org
Subject: [ia64] ptc_g causes MCA on McKinley & Madison CPUs
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         147772
>Category:       ia64
>Synopsis:       [ia64] ptc_g causes MCA on McKinley & Madison CPUs
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    freebsd-ia64
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Jun 10 18:20:01 UTC 2010
>Closed-Date:    Sat Jun 12 01:46:44 UTC 2010
>Last-Modified:  Sat Jun 12 01:50:04 UTC 2010
>Originator:     Marcel Moolenaar
>Release:        9-CURRENT
>Organization:
>Environment:
FreeBSD pluto2.freebsd.org 9.0-CURRENT FreeBSD 9.0-CURRENT #19 r208970M: Thu Jun 10 04:12:20 UTC 2010     marcel@pluto2.freebsd.org:/usr/obj/tank/usr/src/sys/PLUTO2  ia64

>Description:
Background:
    The code following the exception_save_restart and exception_restore_restart labels runs with psr.ic disabled, so a TLB miss in that code triggers a Nested Data TLB Fault. The code has been designed so that a TLB miss, if it happens at all, happens in the first bundle after these labels, and the Nested Data TLB Fault handler knows how to insert the missing translation and restart that bundle. The underlying assumption is that once the translation is in the translation cache, the entire sequence completes without another TLB miss until either psr.ic can be re-enabled or the rfi instruction is executed. The only translation needed is the one for the kernel stack, so that the trapframe can be read or written.

Problem:
    The ptc.g operation on McKinley and Madison processors has the side effect of purging more than the requested translation. While this is not a problem in general, it invalidates the assumption made for exception_save_restart and exception_restore_restart in SMP configurations. Since ptc.g purges the translation caches of all CPUs in the coherency domain, a ptc.g executed on one CPU can cause a purge on another CPU that is currently running one of the critical code sequences following the exception_save_restart and exception_restore_restart labels. Although the purge address never corresponds to the translation for the trapframe being read or written, the over-purging on McKinley and Madison can still evict that translation and cause an unexpected TLB miss beyond the first bundle. The Nested Data TLB Fault handler cannot handle such a miss correctly, which typically results in a machine check.

This problem is not observed on a Montecito processor. The problem was also never observed on Merced, FWIW.
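The failure mode can be illustrated with a toy model. This is a hypothetical C sketch, not kernel code. The names struct cpu, nested_fault() and run_critical_sequence() are made up, and a "step" stands in for one bundle of the psr.ic-off sequence. As with the real handler, the modelled fault handler has no saved context, so it can only recover a miss in the first bundle. A remote over-purge that lands later in the sequence is therefore unrecoverable.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model: number of bundles in the psr.ic-off sequence. */
#define NSTEPS 4

struct cpu {
	bool stack_tc_valid;	/* kernel-stack translation is in the TC */
};

/*
 * Model of the Nested Data TLB Fault handler: no context is saved, so it
 * can only recover a miss in the first bundle (step 0), by inserting the
 * translation and restarting that bundle.
 */
static bool nested_fault(struct cpu *c, int step)
{
	if (step != 0)
		return false;		/* unrecoverable: machine check */
	c->stack_tc_valid = true;	/* insert translation, restart bundle */
	return true;
}

/*
 * Run the critical sequence. purge_at is the step at which a ptc.g issued
 * on another CPU over-purges this CPU's TC (-1 means no purge).
 * Returns true if the sequence completes.
 */
static bool run_critical_sequence(struct cpu *c, int purge_at)
{
	for (int step = 0; step < NSTEPS; step++) {
		if (step == purge_at)
			c->stack_tc_valid = false;
		if (!c->stack_tc_valid && !nested_fault(c, step))
			return false;
	}
	return true;
}
```

With a cold translation cache, the first bundle misses and is restarted, so the sequence completes. A purge landing at step 2 makes run_critical_sequence() return false, which stands in for the machine check.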

>How-To-Repeat:
Run pho's stress2 test on McKinley or Madison with SMP enabled.
>Fix:
There are 2 possible fixes:
1.  serialize ptc.g with respect to exception_save_restart and exception_restore_restart so that no processor executes ptc.g while some other processor is running the critical sequence.
2.  replace the use of ptc.g with an IPI mechanism and have all CPUs execute ptc.l locally. This guarantees that no purge will be visible to any CPU when executing the critical sequence.
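Fix 2 can be sketched in the same toy style. This hypothetical C model assumes that a CPU only takes the shootdown IPI while interrupts are enabled, that is, never inside the critical sequence. An IPI posted during the sequence stays pending until the CPU leaves it, and only then is the local purge (ptc.l) performed. All names here are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

#define NCPU 4

/* Hypothetical per-CPU state for an IPI-based shootdown (fix 2). */
struct cpu {
	bool in_critical;	/* in the psr.ic-off sequence, interrupts masked */
	bool ipi_pending;	/* shootdown IPI posted but not yet taken */
	int local_purges;	/* number of ptc.l executed by this CPU */
};

/*
 * Take a pending IPI if the CPU can accept interrupts. The handler purges
 * locally (ptc.l), so the purge never overlaps the critical sequence.
 */
static void take_pending_ipi(struct cpu *c)
{
	if (c->in_critical || !c->ipi_pending)
		return;
	c->ipi_pending = false;
	c->local_purges++;	/* stands in for ptc.l */
}

/* Initiator: post the IPI to every CPU (including itself). */
static void send_shootdown(struct cpu *cpus, int ncpu)
{
	for (int i = 0; i < ncpu; i++) {
		cpus[i].ipi_pending = true;
		take_pending_ipi(&cpus[i]);
	}
}

/* Leaving the critical sequence unmasks interrupts; a deferred IPI runs now. */
static void leave_critical(struct cpu *c)
{
	c->in_critical = false;
	take_pending_ipi(c);
}

/* The initiator would wait until this returns true before continuing. */
static bool shootdown_complete(const struct cpu *cpus, int ncpu)
{
	for (int i = 0; i < ncpu; i++)
		if (cpus[i].ipi_pending)
			return false;
	return true;
}
```

The cost of this approach is an interrupt on every CPU for each shootdown. The committed fix avoids that cost.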


>Release-Note:
>Audit-Trail:
State-Changed-From-To: open->closed 
State-Changed-By: marcel 
State-Changed-When: Sat Jun 12 01:46:20 UTC 2010 
State-Changed-Why:  
Fix committed. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=147772 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: ia64/147772: commit references a PR
Date: Sat, 12 Jun 2010 01:45:44 +0000 (UTC)

 Author: marcel
 Date: Sat Jun 12 01:45:29 2010
 New Revision: 209085
 URL: http://svn.freebsd.org/changeset/base/209085
 
 Log:
   The ptc.g operation for the McKinley and Madison processors has the
   side effect of purging more than the requested translation. While
   this is not a problem in general, it invalidates the assumption made
   while constructing the trapframe on entry into the kernel in SMP
   configurations. The assumption is that only the first store to the
   stack can cause a TLB miss. Since ptc.g purges the translation
   caches of all CPUs in the coherency domain, a ptc.g executed on one
   CPU can cause a purge on another CPU that is currently running the
   critical code that saves the state to the trapframe. This can cause
   an unexpected TLB miss and, with interruption collection disabled,
   an unexpected data nested TLB fault.
   
   A data nested TLB fault does not save any context, nor does it
   provide a way for software to determine what caused the TLB miss
   or where it occurred. Careful construction of the kernel entry and
   exit code allows us to handle a TLB miss at precisely orchestrated
   points, thereby avoiding the need to wire the kernel stack, but the
   unexpected TLB miss caused by the ptc.g instruction resulted in an
   unrecoverable condition and, in turn, machine checks.
   
   The solution to this problem is to synchronize the kernel entry
   on all CPUs with the use of the ptc.g instruction on a single CPU
   by implementing a bare-bones readers-writer lock that allows N
   readers (= N CPUs entering the kernel) and 1 writer (= execution
   of the ptc.g instruction on some CPU). This solution is preferred
   over a rendezvous approach because it does not interrupt CPUs with
   an IPI.
   
   This problem has not been observed on the Montecito.
   
   PR:		ia64/147772
   MFC after:	6 days
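The readers-writer lock described in the log can be sketched with portable C11 atomics. This is a hypothetical sketch, not the committed code. The commit itself uses pmap_ptc_g_sem, hand-written ia64 assembly in exception.S and FreeBSD atomic functions in pmap.c. In the sketch, bit 63 marks a writer, that is, a CPU about to issue ptc.g. The low bits count CPUs inside the kernel-entry critical sequence. A writer first sets bit 63, which holds off new readers, then waits for the reader count to drain to zero before purging.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names: bit 63 = writer, bits 0..62 = reader count. */
#define PTC_G_WRITER (UINT64_C(1) << 63)

static _Atomic uint64_t ptc_g_sem;

/* Reader enter, non-blocking: fails while a writer holds bit 63 or when
 * another CPU changed the count between the load and the CAS. */
static bool ptc_g_try_enter_shared(void)
{
	uint64_t old = atomic_load_explicit(&ptc_g_sem, memory_order_acquire);

	if (old & PTC_G_WRITER)
		return false;
	return atomic_compare_exchange_strong_explicit(&ptc_g_sem, &old,
	    old + 1, memory_order_acq_rel, memory_order_acquire);
}

/* Reader enter: spin until admitted (cf. the .ptc_g_0 loop). */
static void ptc_g_enter_shared(void)
{
	while (!ptc_g_try_enter_shared())
		;
}

/* Reader leave: drop our count; a set writer bit is left untouched. */
static void ptc_g_leave_shared(void)
{
	atomic_fetch_sub_explicit(&ptc_g_sem, 1, memory_order_release);
}

/* Writer step 1: wait until no other writer holds bit 63, then set it. */
static void ptc_g_claim_exclusive(void)
{
	uint64_t old;

	do {
		old = atomic_load_explicit(&ptc_g_sem, memory_order_relaxed);
	} while ((old & PTC_G_WRITER) != 0 ||
	    !atomic_compare_exchange_weak_explicit(&ptc_g_sem, &old,
	    old | PTC_G_WRITER, memory_order_acquire, memory_order_relaxed));
}

/* Writer step 2 check: true once every reader has left. */
static bool ptc_g_readers_drained(void)
{
	return atomic_load_explicit(&ptc_g_sem, memory_order_acquire) ==
	    PTC_G_WRITER;
}

/* Full writer enter: claim the bit, then wait for readers to drain. */
static void ptc_g_enter_exclusive(void)
{
	ptc_g_claim_exclusive();
	while (!ptc_g_readers_drained())
		;
}

/* Writer leave: clear the writer bit (the reader count is already 0). */
static void ptc_g_leave_exclusive(void)
{
	atomic_store_explicit(&ptc_g_sem, 0, memory_order_release);
}
```

A CPU would bracket its ptc.g with ptc_g_enter_exclusive() and ptc_g_leave_exclusive(). The kernel-entry path would bracket its critical sequence with ptc_g_enter_shared() and ptc_g_leave_shared(). The try and drained helpers are split out only so the lock can be exercised single-threaded. Because the writer bit is set before waiting for readers, a stream of CPUs entering the kernel cannot starve the ptc.g. In pmap.c the writer path also disables interrupts while holding the bit. Presumably this keeps the writer CPU from taking an interruption and spinning in the reader path against its own writer bit.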
 
 Modified:
   head/sys/ia64/ia64/exception.S
   head/sys/ia64/ia64/pmap.c
 
 Modified: head/sys/ia64/ia64/exception.S
 ==============================================================================
 --- head/sys/ia64/ia64/exception.S	Sat Jun 12 00:28:53 2010	(r209084)
 +++ head/sys/ia64/ia64/exception.S	Sat Jun 12 01:45:29 2010	(r209085)
 @@ -170,6 +170,27 @@ ENTRY_NOPROFILE(exception_save, 0)
  	 *	r30,r31=trapframe pointers
  	 *	p14,p15=memory stack switch
  	 */
 +
 +	/* PTC.G enter non-exclusive */
 +	mov	r24 = ar.ccv
 +	movl	r25 = pmap_ptc_g_sem
 +	;;
 +.ptc_g_0:
 +	ld8.acq	r26 = [r25]
 +	;;
 +	tbit.nz	p12, p0 = r26, 63
 +(p12)	br.cond.spnt.few .ptc_g_0
 +	;;
 +	mov	ar.ccv = r26
 +	adds	r27 = 1, r26
 +	;;
 +	cmpxchg8.rel	r27 = [r25], r27, ar.ccv
 +	;;
 +	cmp.ne	p12, p0 = r26, r27
 +(p12)	br.cond.spnt.few .ptc_g_0
 +	;;
 +	mov	ar.ccv = r24
 +
  exception_save_restart:
  {	.mmi
  	st8		[r30]=r19,16		// length
 @@ -407,6 +428,23 @@ exception_save_restart:
  	movl		gp=__gp
  	;;
  }
 +
 +	/* PTC.G leave non-exclusive */
 +	srlz.d
 +	movl	r25 = pmap_ptc_g_sem
 +	;;
 +.ptc_g_1:
 +	ld8.acq r26 = [r25]
 +	;;
 +	mov	ar.ccv = r26
 +	adds	r27 = -1, r26
 +	;;
 +	cmpxchg8.rel	r27 = [r25], r27, ar.ccv
 +	;;
 +	cmp.ne	p12, p0 = r26, r27
 +(p12)	br.cond.spnt.few .ptc_g_1
 +	;;
 +
  {	.mib
  	srlz.d
  	nop		0
 
 Modified: head/sys/ia64/ia64/pmap.c
 ==============================================================================
 --- head/sys/ia64/ia64/pmap.c	Sat Jun 12 00:28:53 2010	(r209084)
 +++ head/sys/ia64/ia64/pmap.c	Sat Jun 12 01:45:29 2010	(r209085)
 @@ -182,7 +182,8 @@ static uint64_t pmap_ptc_e_count1 = 3;
  static uint64_t pmap_ptc_e_count2 = 2;
  static uint64_t pmap_ptc_e_stride1 = 0x2000;
  static uint64_t pmap_ptc_e_stride2 = 0x100000000;
 -struct mtx pmap_ptcmutex;
 +
 +volatile u_long pmap_ptc_g_sem;
  
  /*
   * Data for the RID allocator
 @@ -340,7 +341,6 @@ pmap_bootstrap()
  		       pmap_ptc_e_count2,
  		       pmap_ptc_e_stride1,
  		       pmap_ptc_e_stride2);
 -	mtx_init(&pmap_ptcmutex, "Global PTC lock", NULL, MTX_SPIN);
  
  	/*
  	 * Setup RIDs. RIDs 0..7 are reserved for the kernel.
 @@ -540,7 +540,8 @@ pmap_invalidate_page(vm_offset_t va)
  {
  	struct ia64_lpte *pte;
  	struct pcpu *pc;
 -	uint64_t tag;
 +	uint64_t tag, sem;
 +	register_t is;
  	u_int vhpt_ofs;
  
  	critical_enter();
 @@ -550,10 +551,32 @@ pmap_invalidate_page(vm_offset_t va)
  		pte = (struct ia64_lpte *)(pc->pc_md.vhpt + vhpt_ofs);
  		atomic_cmpset_64(&pte->tag, tag, 1UL << 63);
  	}
 -	critical_exit();
 -	mtx_lock_spin(&pmap_ptcmutex);
 +
 +	/* PTC.G enter exclusive */
 +	is = intr_disable();
 +
 +	/* Atomically assert writer after all writers have gone. */
 +	do {
 +		/* Wait until there's no more writer. */
 +		do {
 +			sem = atomic_load_acq_long(&pmap_ptc_g_sem);
 +			tag = sem | (1ul << 63);
 +		} while (sem == tag);
 +	} while (!atomic_cmpset_rel_long(&pmap_ptc_g_sem, sem, tag));
 +
 +	/* Wait until all readers are gone. */
 +	tag = (1ul << 63);
 +	do {
 +		sem = atomic_load_acq_long(&pmap_ptc_g_sem);
 +	} while (sem != tag);
 +
  	ia64_ptc_ga(va, PAGE_SHIFT << 2);
 -	mtx_unlock_spin(&pmap_ptcmutex);
 +
 +	/* PTC.G leave exclusive */
 +	atomic_store_rel_long(&pmap_ptc_g_sem, 0);
 +
 +	intr_restore(is);
 +	critical_exit();
  }
  
  static void
 _______________________________________________
 svn-src-all@freebsd.org mailing list
 
>Unformatted:
