From nobody@FreeBSD.org  Thu May 28 01:38:46 2009
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 579A8106567B
	for <freebsd-gnats-submit@FreeBSD.org>; Thu, 28 May 2009 01:38:46 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [IPv6:2001:4f8:fff6::21])
	by mx1.freebsd.org (Postfix) with ESMTP id 44E7A8FC18
	for <freebsd-gnats-submit@FreeBSD.org>; Thu, 28 May 2009 01:38:46 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.14.3/8.14.3) with ESMTP id n4S1cjII037877
	for <freebsd-gnats-submit@FreeBSD.org>; Thu, 28 May 2009 01:38:46 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.14.3/8.14.3/Submit) id n4S1cjpg037875;
	Thu, 28 May 2009 01:38:45 GMT
	(envelope-from nobody)
Message-Id: <200905280138.n4S1cjpg037875@www.freebsd.org>
Date: Thu, 28 May 2009 01:38:45 GMT
From: Mark Gladman <mark@legios.org>
To: freebsd-gnats-submit@FreeBSD.org
Subject: Using padlock(4) in 8-current triggers "fpudna in kernel mode!" warnings
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         135014
>Category:       amd64
>Synopsis:       [padlock] Using padlock(4) in 8-current triggers "fpudna in kernel mode!" warnings
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    kib
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu May 28 01:40:02 UTC 2009
>Closed-Date:    Fri Nov 19 14:15:10 UTC 2010
>Last-Modified:  Fri Nov 19 14:15:10 UTC 2010
>Originator:     Mark Gladman
>Release:        8-current
>Organization:
>Environment:
FreeBSD azacca.legios.org 8.0-CURRENT-200905 FreeBSD 8.0-CURRENT-200905 #0: Mon May  4 21:11:26 UTC 2009     root@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  amd64
>Description:
When padlock(4) is loaded and utilised, the console gets flooded with
"fpudna in kernel mode!" messages.

I am using padlock on a Via VB8001 Nano board.

It doesn't appear to cause any crashes, but it does generate
thousands of messages per minute, which makes me think that
it's slowing down the system as a whole.
>How-To-Repeat:
kldload padlock
geli init -P -e aes -l 256 -K /root/geli.key -s 4096 /dev/da0
geli attach -pk /root/geli.key /dev/da0
newfs /dev/da0.eli
mount /dev/da0.eli /mnt
cd /mnt
dd if=/dev/zero of=file.000 bs=1m count=100
>Fix:
None known

>Release-Note:
>Audit-Trail:

From: Michael Moll <kvedulv@kvedulv.de>
To: bug-followup@FreeBSD.org, mark@legios.org
Cc:  
Subject: Re: kern/135014: [padlock] Using padlock(4) in 8-current triggers
 "fpudna in kernel mode" warnings
Date: Thu, 1 Oct 2009 22:35:14 +0200

 As 9-CURRENT now produces backtraces in such cases, here is one:
 
 fpudna in kernel mode!
 KDB: stack backtrace:
 db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
 trap() at trap+0x465
 calltrap() at calltrap+0x8
 --- trap 0x16, rip = 0xffffffff81256b79, rsp = 0xffffff803e819a70, rbp = 0xffffff803e819ad0 ---
 padlock_cipher_process() at padlock_cipher_process+0x140
 padlock_process() at padlock_process+0x157
 crypto_invoke() at crypto_invoke+0x87
 crypto_dispatch() at crypto_dispatch+0xfa
 g_eli_crypto_run() at g_eli_crypto_run+0x16c
 g_eli_worker() at g_eli_worker+0x135
 fork_exit() at fork_exit+0x12a
 fork_trampoline() at fork_trampoline+0xe
 --- trap 0, rip = 0, rsp = 0xffffff803e819d30, rbp = 0 ---
Responsible-Changed-From-To: freebsd-bugs->freebsd-amd64 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Tue Dec 15 00:34:16 UTC 2009 
Responsible-Changed-Why:  
By request of Michael Moll in followup, reclassify this as an amd64 
bug.  His theory is that the floating-point registers may not be being 
handled correctly in the kernel. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=135014 

From: Patrick Lamaiziere <patfbsd@davenulle.org>
To: bug-followup@freebsd.org
Cc: Bruce Evans <brde@optusnet.com.au>, freebsd-amd64@FreeBSD.org
Subject: Re: amd64/135014: [padlock] Using padlock(4) in 8-current triggers
 "fpudna in kernel mode!" warnings
Date: Tue, 15 Dec 2009 20:04:59 +0100

 On Wed, 16 Dec 2009 01:35:07 +1100 (EST),
 Bruce Evans <brde@optusnet.com.au> wrote:
 
 > [This probably won't make it into the followup, since gnats still
 > doesn't generate useful followup addresses and I didn't try to edit
 > the headers.]
 >
 > > Synopsis: [padlock] Using padlock(4) in 8-current triggers "fpudna
 > > in kernel mode!" warnings
 >
 > > By request of Michael Moll in followup, reclassify this as an amd64
 > > bug.  His theory is that the floating-point registers may not be
 > > being handled correctly in the kernel.
 >
 > This seems to be a bug in padlock(4).  Apparently the inline asm that
 > it uses requires the FPU.  But use of the FPU in the kernel is not
 > supported (except for the obsolete i586 copy optimizations).
 
 According to the Linux code, padlock does not use the FPU but can
 generate a DNA fault:
 http://fxr.watson.org/fxr/source/drivers/crypto/padlock-aes.c?v=linux-2.6#L182
 
 They use an irq_ts_save() / restore operation around the padlock
 instructions. I don't know whether there is anything similar in the
 FreeBSD kernel:
 http://fxr.watson.org/fxr/source/drivers/crypto/padlock-aes.c?v=linux-2.6#L296
 
 See also this thread:
 http://lkml.indiana.edu/hypermail/linux/kernel/0808.1/0306.html
 
 HTH, regards.

From: Mark Linimon <linimon@lonesome.com>
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: amd64/135014: [padlock] Using padlock(4) in 8-current triggers
	"fpudna in kernel mode!" warnings
Date: Sun, 27 Dec 2009 13:57:52 -0600

 From: Bruce Evans <brde@optusnet.com.au>
 To: linimon@FreeBSD.org
 cc: freebsd-bugs@FreeBSD.org, freebsd-amd64@FreeBSD.org
 Subject: Re: amd64/135014: [padlock] Using padlock(4) in 8-current triggers
 	"fpudna in kernel mode!" warnings
 
 > By request of Michael Moll in followup, reclassify this as an amd64
 > bug.  His theory is that the floating-point registers may not be being
 > handled correctly in the kernel.
 
 This seems to be a bug in padlock(4).  Apparently the inline asm that it
 uses requires the FPU.  But use of the FPU in the kernel is not supported
 (except for the obsolete i586 copy optimizations).
 
 This bug doesn't seem to be amd64-specific.  The bug was smaller on
 amd64 than on i386.  i386 didn't even print a warning when the unsupported
 use is detected.  emaste@ fixed this recently.  He just added the printf,
 to help debug the problem.  The printf should always have been a panic,
 but changing to a panic now would be too drastic.
 
 Various hacks are possible for using the FPU in the kernel.  Here the
 use seems to be in a kernel thread (g_eli[n]?).  Since all threads are
 heavyweight, they get a private virtualized copy of the FPU as part
 of their weight, and since they don't make syscalls, and since normal
 interrupt handlers are also heavyweight threads and "fast" interrupt
 handlers hopefully aren't so broken as to use the FPU, this copy
 hopefully doesn't get corrupted by them (kthreads) running in a separate
 kernel context, so ignoring the bug happens to give the correct behaviour.
 Even for user threads making syscalls, ignoring the bug would mostly
 give correct behaviour, since in normal ABIs syscalls are a sort of
 sequence point at which the FPU is mostly unused -- only changes to the
 FPU environment while in kernel context would corrupt the in-use part.
 
 So a fairly easy fix for the case in this PR might be for kthreads
 that use the FPU to tell the kernel that they really mean to use it
 and/or guarantee safe use, so that this use can be distinguished from
 accidental possibly-unsafe use.
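
The idea above can be modelled in user space. The following is only a sketch under stated assumptions: the flag name, the structure and both functions are hypothetical and are not FreeBSD kernel identifiers. It shows a kernel thread declaring its FPU use once, and a fault handler that then distinguishes declared use from accidental use.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag for this model only; not a real kernel identifier. */
#define MODEL_KTHREAD_FPU 0x1u

struct model_thread {
	unsigned flags;      /* MODEL_KTHREAD_FPU once FPU use is declared */
	bool     is_kthread; /* true for a pure kernel thread */
};

enum model_dna_action {
	MODEL_DNA_RESTORE_USER, /* fault came from user mode: load the thread's FPU state */
	MODEL_DNA_ALLOW,        /* declared in-kernel use: let the thread use the FPU */
	MODEL_DNA_WARN          /* undeclared in-kernel use: the "fpudna in kernel mode!" case */
};

/* A kernel thread states up front that it really means to use the FPU. */
static void
model_declare_fpu(struct model_thread *td)
{
	assert(td->is_kthread);
	td->flags |= MODEL_KTHREAD_FPU;
}

/* Decide what a device-not-available fault handler would do in this model. */
static enum model_dna_action
model_fpudna(const struct model_thread *td, bool from_kernel)
{
	if (!from_kernel)
		return (MODEL_DNA_RESTORE_USER);
	if (td->is_kthread && (td->flags & MODEL_KTHREAD_FPU) != 0)
		return (MODEL_DNA_ALLOW);
	return (MODEL_DNA_WARN);
}
```

In this model a thread that never calls model_declare_fpu() still reaches the warning branch, so accidental use stays visible while declared use is silent.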
 
 Bruce

From: Andriy Gapon <avg@icyb.net.ua>
To: bug-followup@FreeBSD.org, mark@legios.org
Cc:  
Subject: Re: amd64/135014: [padlock] Using padlock(4) in 8-current triggers
 "fpudna in kernel mode!" warnings
Date: Tue, 23 Mar 2010 12:13:23 +0200

 Some potentially useful info on this issue, mostly for reference.
 
 Here's how Linux does it:
 http://lxr.linux.no/#linux+v2.6.33/drivers/crypto/padlock-aes.c#L182
 It seems that they claim that padlock instructions do not actually use XMM
 registers, but they are sensitive to the TS bit.
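
A user-space model of that save/restore pattern may make the TS-bit point clearer. This is a sketch, not the Linux or FreeBSD code: the variable and the three functions are hypothetical names, and the model assumes (as the claim above suggests) that a padlock instruction faults whenever CR0.TS is set. The only architectural fact used is that TS is bit 3 of CR0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CR0_TS 0x00000008u /* Task Switched flag, bit 3 of CR0 */

/* Stand-in for the real control register; this model never touches hardware. */
static uint32_t model_cr0;

/* Assumption of this model: the instruction traps if and only if TS is set. */
static bool
model_insn_would_fault(void)
{
	return ((model_cr0 & CR0_TS) != 0);
}

/* Clear TS (what CLTS does) and report whether it was set beforehand. */
static bool
model_ts_save(void)
{
	bool was_set = (model_cr0 & CR0_TS) != 0;

	model_cr0 &= ~CR0_TS;
	return (was_set);
}

/* Put TS back so that lazy FPU context switching keeps working afterwards. */
static void
model_ts_restore(bool was_set)
{
	if (was_set)
		model_cr0 |= CR0_TS;
}
```

A caller in this model would bracket each padlock operation with model_ts_save() and model_ts_restore(), so that the instruction runs with TS clear and the previous TS state is seen again by the next real FPU user.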
 
 Also, kib@ has a patch that allows XMM registers to actually be used in the
 kernel within a limited scope:
 http://docs.freebsd.org/cgi/getmsg.cgi?fetch=2304+0+current/freebsd-amd64
 
 -- 
 Andriy Gapon
Responsible-Changed-From-To: freebsd-amd64->kib 
Responsible-Changed-By: kib 
Responsible-Changed-When: Tue Apr 6 11:22:42 UTC 2010 
Responsible-Changed-Why:  
Take. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=135014 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: amd64/135014: commit references a PR
Date: Sat,  5 Jun 2010 16:01:09 +0000 (UTC)

 Author: kib
 Date: Sat Jun  5 16:00:53 2010
 New Revision: 208834
 URL: http://svn.freebsd.org/changeset/base/208834
 
 Log:
   Use the fpu_kern_enter() interface to properly separate usermode FPU
   context from in-kernel execution of padlock instructions and to handle
   spurious FPUDNA exceptions that are sometimes raised when doing padlock
   calculations.
   
   Globally mark crypto(9) kthread as using FPU.
   
   Reviewed by:	pjd
   Hardware provided by:	Sentex Communications
   Tested by:	  pho
   PR:    amd64/135014
   MFC after:    1 month
 
 Modified:
   head/sys/crypto/via/padlock.c
   head/sys/crypto/via/padlock.h
   head/sys/crypto/via/padlock_cipher.c
   head/sys/crypto/via/padlock_hash.c
   head/sys/dev/random/nehemiah.c
   head/sys/opencrypto/crypto.c
 
 Modified: head/sys/crypto/via/padlock.c
 ==============================================================================
 --- head/sys/crypto/via/padlock.c	Sat Jun  5 15:59:59 2010	(r208833)
 +++ head/sys/crypto/via/padlock.c	Sat Jun  5 16:00:53 2010	(r208834)
 @@ -169,6 +169,7 @@ padlock_newsession(device_t dev, uint32_
  	struct padlock_softc *sc = device_get_softc(dev);
  	struct padlock_session *ses = NULL;
  	struct cryptoini *encini, *macini;
 +	struct thread *td;
  	int error;
  
  	if (sidp == NULL || cri == NULL)
 @@ -236,7 +237,12 @@ padlock_newsession(device_t dev, uint32_
  	}
  
  	if (macini != NULL) {
 -		error = padlock_hash_setup(ses, macini);
 +		td = curthread;
 +		error = fpu_kern_enter(td, &ses->ses_fpu_ctx, FPU_KERN_NORMAL);
 +		if (error == 0) {
 +			error = padlock_hash_setup(ses, macini);
 +			fpu_kern_leave(td, &ses->ses_fpu_ctx);
 +		}
  		if (error != 0) {
  			padlock_freesession_one(sc, ses, 0);
  			return (error);
 
 Modified: head/sys/crypto/via/padlock.h
 ==============================================================================
 --- head/sys/crypto/via/padlock.h	Sat Jun  5 15:59:59 2010	(r208833)
 +++ head/sys/crypto/via/padlock.h	Sat Jun  5 16:00:53 2010	(r208834)
 @@ -32,6 +32,12 @@
  #include <opencrypto/cryptodev.h>
  #include <crypto/rijndael/rijndael.h>
  
 +#if defined(__i386__)
 +#include <machine/npx.h>
 +#elif defined(__amd64__)
 +#include <machine/fpu.h>
 +#endif
 +
  union padlock_cw {
  	uint64_t raw;
  	struct {
 @@ -70,6 +76,7 @@ struct padlock_session {
  	int		ses_used;
  	uint32_t	ses_id;
  	TAILQ_ENTRY(padlock_session) ses_next;
 +	struct fpu_kern_ctx ses_fpu_ctx;
  };
  
  #define	PADLOCK_ALIGN(p)	(void *)(roundup2((uintptr_t)(p), 16))
 
 Modified: head/sys/crypto/via/padlock_cipher.c
 ==============================================================================
 --- head/sys/crypto/via/padlock_cipher.c	Sat Jun  5 15:59:59 2010	(r208833)
 +++ head/sys/crypto/via/padlock_cipher.c	Sat Jun  5 16:00:53 2010	(r208834)
 @@ -53,6 +53,7 @@ __FBSDID("$FreeBSD$");
  #include <sys/module.h>
  #include <sys/malloc.h>
  #include <sys/libkern.h>
 +#include <sys/pcpu.h>
  #include <sys/uio.h>
  
  #include <opencrypto/cryptodev.h>
 @@ -201,9 +202,10 @@ padlock_cipher_process(struct padlock_se
      struct cryptop *crp)
  {
  	union padlock_cw *cw;
 +	struct thread *td;
  	u_char *buf, *abuf;
  	uint32_t *key;
 -	int allocated;
 +	int allocated, error;
  
  	buf = padlock_cipher_alloc(enccrd, crp, &allocated);
  	if (buf == NULL)
 @@ -247,9 +249,16 @@ padlock_cipher_process(struct padlock_se
  		    enccrd->crd_len, abuf);
  	}
  
 +	td = curthread;
 +	error = fpu_kern_enter(td, &ses->ses_fpu_ctx, FPU_KERN_NORMAL);
 +	if (error != 0)
 +		goto out;
 +
  	padlock_cbc(abuf, abuf, enccrd->crd_len / AES_BLOCK_LEN, key, cw,
  	    ses->ses_iv);
  
 +	fpu_kern_leave(td, &ses->ses_fpu_ctx);
 +
  	if (allocated) {
  		crypto_copyback(crp->crp_flags, crp->crp_buf, enccrd->crd_skip,
  		    enccrd->crd_len, abuf);
 @@ -262,9 +271,10 @@ padlock_cipher_process(struct padlock_se
  		    AES_BLOCK_LEN, ses->ses_iv);
  	}
  
 + out:
  	if (allocated) {
  		bzero(buf, enccrd->crd_len + 16);
  		free(buf, M_PADLOCK);
  	}
 -	return (0);
 +	return (error);
  }
 
 Modified: head/sys/crypto/via/padlock_hash.c
 ==============================================================================
 --- head/sys/crypto/via/padlock_hash.c	Sat Jun  5 15:59:59 2010	(r208833)
 +++ head/sys/crypto/via/padlock_hash.c	Sat Jun  5 16:00:53 2010	(r208834)
 @@ -34,12 +34,14 @@ __FBSDID("$FreeBSD$");
  #include <sys/malloc.h>
  #include <sys/libkern.h>
  #include <sys/endian.h>
 +#include <sys/pcpu.h>
  #if defined(__amd64__) || (defined(__i386__) && !defined(PC98))
  #include <machine/cpufunc.h>
  #include <machine/cputypes.h>
  #include <machine/md_var.h>
  #include <machine/specialreg.h>
  #endif
 +#include <machine/pcb.h>
  
  #include <opencrypto/cryptodev.h>
  #include <opencrypto/cryptosoft.h> /* for hmac_ipad_buffer and hmac_opad_buffer */
 @@ -363,12 +365,18 @@ int
  padlock_hash_process(struct padlock_session *ses, struct cryptodesc *maccrd,
      struct cryptop *crp)
  {
 +	struct thread *td;
  	int error;
  
 +	td = curthread;
 +	error = fpu_kern_enter(td, &ses->ses_fpu_ctx, FPU_KERN_NORMAL);
 +	if (error != 0)
 +		return (error);
  	if ((maccrd->crd_flags & CRD_F_KEY_EXPLICIT) != 0)
  		padlock_hash_key_setup(ses, maccrd->crd_key, maccrd->crd_klen);
  
  	error = padlock_authcompute(ses, maccrd, crp->crp_buf, crp->crp_flags);
 +	fpu_kern_leave(td, &ses->ses_fpu_ctx);
  	return (error);
  }
  
 
 Modified: head/sys/dev/random/nehemiah.c
 ==============================================================================
 --- head/sys/dev/random/nehemiah.c	Sat Jun  5 15:59:59 2010	(r208833)
 +++ head/sys/dev/random/nehemiah.c	Sat Jun  5 16:00:53 2010	(r208834)
 @@ -35,6 +35,8 @@ __FBSDID("$FreeBSD$");
  #include <sys/selinfo.h>
  #include <sys/systm.h>
  
 +#include <machine/pcb.h>
 +
  #include <dev/random/randomdev.h>
  
  #define RANDOM_BLOCK_SIZE	256
 @@ -82,6 +84,8 @@ static uint8_t out[RANDOM_BLOCK_SIZE+7]	
  
  static union VIA_ACE_CW acw		__aligned(16);
  
 +static struct fpu_kern_ctx fpu_ctx_save;
 +
  static struct mtx random_nehemiah_mtx;
  
  /* ARGSUSED */
 @@ -142,11 +146,16 @@ random_nehemiah_deinit(void)
  static int
  random_nehemiah_read(void *buf, int c)
  {
 -	int i;
 +	int i, error;
  	size_t count, ret;
  	uint8_t *p;
  
  	mtx_lock(&random_nehemiah_mtx);
 +	error = fpu_kern_enter(curthread, &fpu_ctx_save, FPU_KERN_NORMAL);
 +	if (error != 0) {
 +		mtx_unlock(&random_nehemiah_mtx);
 +		return (0);
 +	}
  
  	/* Get a random AES key */
  	count = 0;
 @@ -187,6 +196,7 @@ random_nehemiah_read(void *buf, int c)
  	c = MIN(RANDOM_BLOCK_SIZE, c);
  	memcpy(buf, out, (size_t)c);
  
 +	fpu_kern_leave(curthread, &fpu_ctx_save);
  	mtx_unlock(&random_nehemiah_mtx);
  	return (c);
  }
 
 Modified: head/sys/opencrypto/crypto.c
 ==============================================================================
 --- head/sys/opencrypto/crypto.c	Sat Jun  5 15:59:59 2010	(r208833)
 +++ head/sys/opencrypto/crypto.c	Sat Jun  5 16:00:53 2010	(r208834)
 @@ -82,6 +82,10 @@ __FBSDID("$FreeBSD$");
  #include <sys/bus.h>
  #include "cryptodev_if.h"
  
 +#if defined(__i386__) || defined(__amd64__)
 +#include <machine/pcb.h>
 +#endif
 +
  SDT_PROVIDER_DEFINE(opencrypto);
  
  /*
 @@ -1241,6 +1245,10 @@ crypto_proc(void)
  	u_int32_t hid;
  	int result, hint;
  
 +#if defined(__i386__) || defined(__amd64__)
 +	fpu_kern_thread(FPU_KERN_NORMAL);
 +#endif
 +
  	CRYPTO_Q_LOCK();
  	for (;;) {
  		/*
 

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: amd64/135014: commit references a PR
Date: Fri, 19 Nov 2010 09:49:22 +0000 (UTC)

 Author: kib
 Date: Fri Nov 19 09:49:14 2010
 New Revision: 215513
 URL: http://svn.freebsd.org/changeset/base/215513
 
 Log:
   Merge the fpu_kern_enter/fpu_kern_leave KPI and followup fixes for the
   amd64 suspend/resume support.
   
   Tested by:	Mike Tancsa
   Also tested by:	Dewayne Geraghty <dewayne.geraghty heuristicsystems com au>,
        Daryl Richards <daryl isletech net>
   
   Below is the svn log of the merged revisions.
   ------------------------------------------------------------------------
   r197455 | emaste | 2009-09-24 17:26:42 +0300 (Thu, 24 Sep 2009) | 5 lines
   
   Add a backtrace to the "fpudna in kernel mode!" case, to help track down
   where this comes from.
   
   Reviewed by:	bde
   
   ------------------------------------------------------------------------
   r197863 | jkim | 2009-10-08 20:41:53 +0300 (Thu, 08 Oct 2009) | 8 lines
   
   Clean up amd64 suspend/resume code.
   
   - Allocate memory for wakeup code after ACPI bus is attached.  The early
   memory allocation hack was inherited from i386 but amd64 does not need it.
   - Exclude real mode IVT and BDA explicitly.  Improve comments about memory
   allocation and reason for the exclusions.  It is a no-op in reality, though.
   - Remove an unnecessary CLD from wakeup code and re-align.
   
   ------------------------------------------------------------------------
   r198931 | jkim | 2009-11-05 00:39:18 +0200 (Thu, 05 Nov 2009) | 2 lines
   
   Tweak memory allocation for amd64 suspend/resume CPU context.
   
   ------------------------------------------------------------------------
   r200280 | jkim | 2009-12-09 00:38:42 +0200 (Wed, 09 Dec 2009) | 2 lines
   
   Simplify a macro not to generate unnecessary symbols.
   
   ------------------------------------------------------------------------
   r205444 | emaste | 2010-03-22 13:52:53 +0200 (Mon, 22 Mar 2010) | 7 lines
   
   Merge r197455 from amd64:
   
     Add a backtrace to the "fpudna in kernel mode!" case, to help track down
     where this comes from.
   
     Reviewed by:	bde
   
   ------------------------------------------------------------------------
   r208833 | kib | 2010-06-05 18:59:59 +0300 (Sat, 05 Jun 2010) | 15 lines
   
   Introduce the x86 kernel interfaces to allow kernel code to use
   FPU/SSE hardware. Caller should provide a save area that is chained
   into the stack of the areas; pcb save_area for usermode FPU state is
   on top. The pcb now contains a pointer to the current FPU saved area,
   used during FPUDNA handling and context switches.  There is also a
   facility to allow the kernel thread to use pcb save_area.
   
   Change the dreaded warnings "npxdna in kernel mode!" into the panics
   when FPU usage is not registered.
   
   KPI discussed with:	fabient
   Tested by:    pho, fabient
   Hardware provided by:	Sentex Communications
   MFC after:    1 month
   
   ------------------------------------------------------------------------
   r208834 | kib | 2010-06-05 19:00:53 +0300 (Sat, 05 Jun 2010) | 13 lines
   
   Use the fpu_kern_enter() interface to properly separate usermode FPU
   context from in-kernel execution of padlock instructions and to handle
   spurious FPUDNA exceptions that are sometimes raised when doing padlock
   calculations.
   
   Globally mark crypto(9) kthread as using FPU.
   
   Reviewed by:	pjd
   Hardware provided by:	Sentex Communications
   Tested by:	  pho
   PR:    amd64/135014
   MFC after:    1 month
   
   ------------------------------------------------------------------------
   r208877 | kib | 2010-06-06 19:13:50 +0300 (Sun, 06 Jun 2010) | 5 lines
   
   Style-compliant order of declarations.
   
   Noted by:	bde
   MFC after:	1 month
   
   ------------------------------------------------------------------------
   r209174 | jkim | 2010-06-14 23:08:26 +0300 (Mon, 14 Jun 2010) | 3 lines
   
   Fix ACPI suspend/resume on amd64, which was broken since r208833.
   We need actual storage for FPU state to save and restore.
   
   ------------------------------------------------------------------------
   r209198 | kib | 2010-06-15 12:19:33 +0300 (Tue, 15 Jun 2010) | 10 lines
   
   Use critical sections instead of disabling local interrupts to ensure
   the consistency between PCPU fpcurthread and the state of the FPU.
   
   Explicitly assert that the calling conventions for fpudrop() are
   adhered to. In cpu_thread_exit(), add the missing critical section entrance.
   
   Reviewed by:	bde
   Tested by:	pho
   MFC after:	1 month
   
   ------------------------------------------------------------------------
   r209204 | kib | 2010-06-15 17:59:35 +0300 (Tue, 15 Jun 2010) | 5 lines
   
   Rename CRITSECT_ASSERT to CRITICAL_ASSERT.
   
   Suggested by:	jhb
   MFC after:	1 month
   
   ------------------------------------------------------------------------
   r209208 | kib | 2010-06-15 21:16:04 +0300 (Tue, 15 Jun 2010) | 4 lines
   
   Remove two obsoleted comments, add a note about 32bit compatibility.
   
   MFC after:	1 month
   
   ------------------------------------------------------------------------
   r209252 | kib | 2010-06-17 15:35:17 +0300 (Thu, 17 Jun 2010) | 6 lines
   
   In the ia32_{get,set}_fpcontext(), use fpu{get,set}userregs instead
   of fpu{get,set}regs.
   
   Noted by:	bde
   MFC after:	1 month
   
   ------------------------------------------------------------------------
   r209460 | kib | 2010-06-23 13:40:28 +0300 (Wed, 23 Jun 2010) | 8 lines
   
   Remove unused i586 optimized bcopy/bzero/etc implementations that utilize
   FPU registers for copying. Remove the switch table and jumps from
   bcopy/bzero/... to the actual implementation.
   As a side-effect, i486-optimized bzero is removed.
   
   Reviewed by:	bde
   Tested by:	pho (previous version)
   
   ------------------------------------------------------------------------
   r209461 | kib | 2010-06-23 14:12:58 +0300 (Wed, 23 Jun 2010) | 8 lines
   
   Remove the support for int13 FPU exception reporting on i386. It is
   believed that all 486-class CPUs FreeBSD is capable to run on, either
   have no FPU and cannot use external coprocessor, or have FPU on the
   package and can use #MF.
   
   Reviewed by:	bde
   Tested by:	pho (previous version)
   
   ------------------------------------------------------------------------
   r209462 | kib | 2010-06-23 14:21:19 +0300 (Wed, 23 Jun 2010) | 8 lines
   
   After the FPU use requires #MF working due to INT13 FPU exception handling
   removal, MFi386 r209198:
       Use critical sections instead of disabling local interrupts to ensure
       the consistency between PCPU fpcurthread and the state of FPU.
   
   Reviewed by:	bde
   Tested by:	pho
   
   ------------------------------------------------------------------------
   r210514 | jkim | 2010-07-26 22:53:09 +0300 (Mon, 26 Jul 2010) | 6 lines
   
   Re-implement FPU suspend/resume for amd64.  This removes superfluous uses
   of critical_enter(9) and critical_exit(9) by fpugetregs() and fpusetregs().
   Also, we do not touch PCB flags any more.
   
   MFC after:	1 month
   
   ------------------------------------------------------------------------
   r210517 | jkim | 2010-07-27 00:24:52 +0300 (Tue, 27 Jul 2010) | 4 lines
   
   FNSTSW instruction can use AX register as an operand.
   
   Obtained from:	fenv.h
   
   ------------------------------------------------------------------------
   r210518 | jkim | 2010-07-27 01:16:36 +0300 (Tue, 27 Jul 2010) | 5 lines
   
   Reduce diff against fenv.h:
   
   Mark all inline asms as volatile for safety.  No object file change after
   this commit (verified with md5).
   
   ------------------------------------------------------------------------
   r210519 | jkim | 2010-07-27 01:55:14 +0300 (Tue, 27 Jul 2010) | 2 lines
   
   Remove an unused macro since r189418.
   
   ------------------------------------------------------------------------
   r210520 | jkim | 2010-07-27 02:02:18 +0300 (Tue, 27 Jul 2010) | 2 lines
   
   Add missing ldmxcsr() prototype for lint case.
   
   ------------------------------------------------------------------------
   r210521 | jkim | 2010-07-27 02:20:55 +0300 (Tue, 27 Jul 2010) | 3 lines
   
   Simplify fldcw() macro.  There is no reason to use pointer here.  No object
   file change after this commit (verified with md5).
   
   ------------------------------------------------------------------------
   r210614 | jkim | 2010-07-29 19:41:21 +0300 (Thu, 29 Jul 2010) | 2 lines
   
   Rename PCB_USER_FPU to PCB_USERFPU not to clash with a macro from fpu.h.
   
   ------------------------------------------------------------------------
   r210615 | jkim | 2010-07-29 19:49:20 +0300 (Thu, 29 Jul 2010) | 5 lines
   
   Fix another fallout from r208833.  savectx() is used to save CPU context
   for crash dump (dumppcb) and kdb (stoppcbs).  For both cases, there cannot
   have a valid pointer in pcb_save.  This should restore the previous
   behaviour.
   
   ------------------------------------------------------------------------
   r210777 | jkim | 2010-08-02 20:35:00 +0300 (Mon, 02 Aug 2010) | 13 lines
   
   - Merge savectx2() with savectx() and struct xpcb with struct pcb. [1]
   savectx() is only used for panic dump (dumppcb) and kdb (stoppcbs).  Thus,
   saving additional information does not hurt and it may be even beneficial.
   Unfortunately, struct pcb has grown larger to accommodate more data.
   Move 512-byte long pcb_user_save to the end of struct pcb while I am here.
   - savectx() now saves FPU state unconditionally and copy it to the PCB of
   FPU thread if necessary.  This gives panic dump and kdb a chance to take
   a look at the current FPU state even if the FPU is "supposedly" not used.
   - Resuming CPU now unconditionally reinitializes FPU.  If the saved FPU
   state was irrelevant, it could be in an unknown state.
   
   Suggested by:	bde [1]
   
   ------------------------------------------------------------------------
   r210804 | jkim | 2010-08-03 18:32:08 +0300 (Tue, 03 Aug 2010) | 6 lines
   
   savectx() has not been used for fork(2) for about 15 years. [1]
   Do not clobber FPU thread's PCB as it is more harmful.  When we resume CPU,
   unconditionally reload FPU state.
   
   Pointed out by:	bde [1]
   
   ------------------------------------------------------------------------
   r212026 | jkim | 2010-08-31 00:19:42 +0300 (Tue, 31 Aug 2010) | 3 lines
   
   Save MSR_FSBASE, MSR_GSBASE and MSR_KGSBASE directly to PCB as we do not use
   these values in the function.
   
   ------------------------------------------------------------------------
   r214347 | jhb | 2010-10-25 18:31:13 +0300 (Mon, 25 Oct 2010) | 5 lines
   
   Use 'saveintr' instead of 'savecrit' or 'eflags' to hold the state returned
   by intr_disable().
   
   Requested by:	bde
   
   ------------------------------------------------------------------------
 
 Modified:
   stable/8/sys/amd64/acpica/acpi_machdep.c
   stable/8/sys/amd64/acpica/acpi_switch.S
   stable/8/sys/amd64/acpica/acpi_wakecode.S
   stable/8/sys/amd64/acpica/acpi_wakeup.c
   stable/8/sys/amd64/amd64/cpu_switch.S
   stable/8/sys/amd64/amd64/fpu.c
   stable/8/sys/amd64/amd64/genassym.c
   stable/8/sys/amd64/amd64/machdep.c
   stable/8/sys/amd64/amd64/mp_machdep.c
   stable/8/sys/amd64/amd64/trap.c
   stable/8/sys/amd64/amd64/vm_machdep.c
   stable/8/sys/amd64/ia32/ia32_reg.c
   stable/8/sys/amd64/ia32/ia32_signal.c
   stable/8/sys/amd64/include/fpu.h
   stable/8/sys/amd64/include/pcb.h
   stable/8/sys/crypto/via/padlock.c
   stable/8/sys/crypto/via/padlock.h
   stable/8/sys/crypto/via/padlock_cipher.c
   stable/8/sys/crypto/via/padlock_hash.c
   stable/8/sys/dev/fb/fbreg.h
   stable/8/sys/dev/random/nehemiah.c
   stable/8/sys/i386/i386/identcpu.c
   stable/8/sys/i386/i386/initcpu.c
   stable/8/sys/i386/i386/machdep.c
   stable/8/sys/i386/i386/perfmon.c
   stable/8/sys/i386/i386/ptrace_machdep.c
   stable/8/sys/i386/i386/support.s
   stable/8/sys/i386/i386/swtch.s
   stable/8/sys/i386/i386/trap.c
   stable/8/sys/i386/i386/vm_machdep.c
   stable/8/sys/i386/include/md_var.h
   stable/8/sys/i386/include/npx.h
   stable/8/sys/i386/include/pcb.h
   stable/8/sys/i386/isa/npx.c
   stable/8/sys/i386/linux/linux_ptrace.c
   stable/8/sys/kern/subr_trap.c
   stable/8/sys/opencrypto/crypto.c
   stable/8/sys/pc98/include/npx.h
   stable/8/sys/pc98/pc98/machdep.c
   stable/8/sys/x86/x86/local_apic.c
 Directory Properties:
   stable/8/sys/   (props changed)
   stable/8/sys/amd64/include/xen/   (props changed)
   stable/8/sys/cddl/contrib/opensolaris/   (props changed)
   stable/8/sys/contrib/dev/acpica/   (props changed)
   stable/8/sys/contrib/pf/   (props changed)
   stable/8/sys/dev/xen/xenpci/   (props changed)
 
 Modified: stable/8/sys/amd64/acpica/acpi_machdep.c
 ==============================================================================
 --- stable/8/sys/amd64/acpica/acpi_machdep.c	Fri Nov 19 09:26:39 2010	(r215512)
 +++ stable/8/sys/amd64/acpica/acpi_machdep.c	Fri Nov 19 09:49:14 2010	(r215513)
 @@ -32,6 +32,7 @@ __FBSDID("$FreeBSD$");
  #include <sys/kernel.h>
  #include <sys/module.h>
  #include <sys/sysctl.h>
 +
  #include <vm/vm.h>
  #include <vm/pmap.h>
  
 @@ -71,7 +72,6 @@ acpi_machdep_init(device_t dev)
  	STAILQ_INSERT_TAIL(&sc->apm_cdevs, &acpi_clone, entries);
  	ACPI_UNLOCK(acpi);
  	sc->acpi_clone = &acpi_clone;
 -	acpi_install_wakeup_handler(sc);
  
  	if (intr_model != ACPI_INTR_PIC)
  		acpi_SetIntrModel(intr_model);
 @@ -363,13 +363,20 @@ nexus_acpi_probe(device_t dev)
  static int
  nexus_acpi_attach(device_t dev)
  {
 +	device_t acpi_dev;
 +	int error;
  
  	nexus_init_resources();
  	bus_generic_probe(dev);
 -	if (BUS_ADD_CHILD(dev, 10, "acpi", 0) == NULL)
 +	acpi_dev = BUS_ADD_CHILD(dev, 10, "acpi", 0);
 +	if (acpi_dev == NULL)
  		panic("failed to add acpi0 device");
  
 -	return (bus_generic_attach(dev));
 +	error = bus_generic_attach(dev);
 +	if (error == 0)
 +		acpi_install_wakeup_handler(device_get_softc(acpi_dev));
 +
 +	return (error);
  }
  
  static device_method_t nexus_acpi_methods[] = {
 
 Modified: stable/8/sys/amd64/acpica/acpi_switch.S
 ==============================================================================
 --- stable/8/sys/amd64/acpica/acpi_switch.S	Fri Nov 19 09:26:39 2010	(r215512)
 +++ stable/8/sys/amd64/acpica/acpi_switch.S	Fri Nov 19 09:49:14 2010	(r215513)
 @@ -1,7 +1,7 @@
  /*-
   * Copyright (c) 2001 Takanori Watanabe <takawata@jp.freebsd.org>
   * Copyright (c) 2001 Mitsuru IWASAKI <iwasaki@jp.freebsd.org>
 - * Copyright (c) 2008-2009 Jung-uk Kim <jkim@FreeBSD.org>
 + * Copyright (c) 2008-2010 Jung-uk Kim <jkim@FreeBSD.org>
   * All rights reserved.
   *
   * Redistribution and use in source and binary forms, with or without
 @@ -34,26 +34,11 @@
  #include "acpi_wakedata.h"
  #include "assym.s"
  
 -#define	WAKEUP_DECL(member)	\
 -    .set WAKEUP_ ## member, wakeup_ ## member - wakeup_ctx
 -
 -	WAKEUP_DECL(xpcb)
 -	WAKEUP_DECL(gdt)
 -	WAKEUP_DECL(efer)
 -	WAKEUP_DECL(pat)
 -	WAKEUP_DECL(star)
 -	WAKEUP_DECL(lstar)
 -	WAKEUP_DECL(cstar)
 -	WAKEUP_DECL(sfmask)
 -	WAKEUP_DECL(cpu)
 -
 -#define	WAKEUP_CTX(member)	WAKEUP_ ## member (%rdi)
 -#define	WAKEUP_PCB(member)	PCB_ ## member(%r11)
 -#define	WAKEUP_XPCB(member)	XPCB_ ## member(%r11)
 +#define	WAKEUP_CTX(member)	wakeup_ ## member - wakeup_ctx(%rsi)
  
  ENTRY(acpi_restorecpu)
  	/* Switch to KPML4phys. */
 -	movq	%rsi, %rax
 +	movq	%rdi, %rax
  	movq	%rax, %cr3
  
  	/* Restore GDT. */
 @@ -62,7 +47,7 @@ ENTRY(acpi_restorecpu)
  1:
  
  	/* Fetch PCB. */
 -	movq	WAKEUP_CTX(xpcb), %r11
 +	movq	WAKEUP_CTX(pcb), %rdi
  
  	/* Force kernel segment registers. */
  	movl	$KDSEL, %eax
 @@ -75,16 +60,16 @@ ENTRY(acpi_restorecpu)
  	movw	%ax, %gs
  
  	movl	$MSR_FSBASE, %ecx
 -	movl	WAKEUP_PCB(FSBASE), %eax
 -	movl	4 + WAKEUP_PCB(FSBASE), %edx
 +	movl	PCB_FSBASE(%rdi), %eax
 +	movl	4 + PCB_FSBASE(%rdi), %edx
  	wrmsr
  	movl	$MSR_GSBASE, %ecx
 -	movl	WAKEUP_PCB(GSBASE), %eax
 -	movl	4 + WAKEUP_PCB(GSBASE), %edx
 +	movl	PCB_GSBASE(%rdi), %eax
 +	movl	4 + PCB_GSBASE(%rdi), %edx
  	wrmsr
  	movl	$MSR_KGSBASE, %ecx
 -	movl	WAKEUP_XPCB(KGSBASE), %eax
 -	movl	4 + WAKEUP_XPCB(KGSBASE), %edx
 +	movl	PCB_KGSBASE(%rdi), %eax
 +	movl	4 + PCB_KGSBASE(%rdi), %edx
  	wrmsr
  
  	/* Restore EFER. */
 @@ -115,17 +100,21 @@ ENTRY(acpi_restorecpu)
  	movl	WAKEUP_CTX(sfmask), %eax
  	wrmsr
  
 -	/* Restore CR0, CR2 and CR4. */
 -	movq	WAKEUP_XPCB(CR0), %rax
 +	/* Restore CR0 except for FPU mode. */
 +	movq	PCB_CR0(%rdi), %rax
 +	movq	%rax, %rcx
 +	andq	$~(CR0_EM | CR0_TS), %rax
  	movq	%rax, %cr0
 -	movq	WAKEUP_XPCB(CR2), %rax
 +
 +	/* Restore CR2 and CR4. */
 +	movq	PCB_CR2(%rdi), %rax
  	movq	%rax, %cr2
 -	movq	WAKEUP_XPCB(CR4), %rax
 +	movq	PCB_CR4(%rdi), %rax
  	movq	%rax, %cr4
  
  	/* Restore descriptor tables. */
 -	lidt	WAKEUP_XPCB(IDT)
 -	lldt	WAKEUP_XPCB(LDT)
 +	lidt	PCB_IDT(%rdi)
 +	lldt	PCB_LDT(%rdi)
  
  #define	SDT_SYSTSS	9
  #define	SDT_SYSBSY	11
 @@ -133,37 +122,44 @@ ENTRY(acpi_restorecpu)
  	/* Clear "task busy" bit and reload TR. */
  	movq	PCPU(TSS), %rax
  	andb	$(~SDT_SYSBSY | SDT_SYSTSS), 5(%rax)
 -	movw	WAKEUP_XPCB(TR), %ax
 +	movw	PCB_TR(%rdi), %ax
  	ltr	%ax
  
  #undef	SDT_SYSTSS
  #undef	SDT_SYSBSY
  
  	/* Restore other callee saved registers. */
 -	movq	WAKEUP_PCB(R15), %r15
 -	movq	WAKEUP_PCB(R14), %r14
 -	movq	WAKEUP_PCB(R13), %r13
 -	movq	WAKEUP_PCB(R12), %r12
 -	movq	WAKEUP_PCB(RBP), %rbp
 -	movq	WAKEUP_PCB(RSP), %rsp
 -	movq	WAKEUP_PCB(RBX), %rbx
 +	movq	PCB_R15(%rdi), %r15
 +	movq	PCB_R14(%rdi), %r14
 +	movq	PCB_R13(%rdi), %r13
 +	movq	PCB_R12(%rdi), %r12
 +	movq	PCB_RBP(%rdi), %rbp
 +	movq	PCB_RSP(%rdi), %rsp
 +	movq	PCB_RBX(%rdi), %rbx
  
  	/* Restore debug registers. */
 -	movq	WAKEUP_PCB(DR0), %rax
 +	movq	PCB_DR0(%rdi), %rax
  	movq	%rax, %dr0
 -	movq	WAKEUP_PCB(DR1), %rax
 +	movq	PCB_DR1(%rdi), %rax
  	movq	%rax, %dr1
 -	movq	WAKEUP_PCB(DR2), %rax
 +	movq	PCB_DR2(%rdi), %rax
  	movq	%rax, %dr2
 -	movq	WAKEUP_PCB(DR3), %rax
 +	movq	PCB_DR3(%rdi), %rax
  	movq	%rax, %dr3
 -	movq	WAKEUP_PCB(DR6), %rax
 +	movq	PCB_DR6(%rdi), %rax
  	movq	%rax, %dr6
 -	movq	WAKEUP_PCB(DR7), %rax
 +	movq	PCB_DR7(%rdi), %rax
  	movq	%rax, %dr7
  
 +	/* Restore FPU state. */
 +	fninit
 +	fxrstor	PCB_USERFPU(%rdi)
 +
 +	/* Reload CR0. */
 +	movq	%rcx, %cr0
 +
  	/* Restore return address. */
 -	movq	WAKEUP_PCB(RIP), %rax
 +	movq	PCB_RIP(%rdi), %rax
  	movq	%rax, (%rsp)
  
  	/* Indicate the CPU is resumed. */
 @@ -172,19 +168,3 @@ ENTRY(acpi_restorecpu)
  
  	ret
  END(acpi_restorecpu)
 -
 -ENTRY(acpi_savecpu)
 -	/* Fetch XPCB and save CPU context. */
 -	movq	%rdi, %r10
 -	call	savectx2
 -	movq	%r10, %r11
 -
 -	/* Patch caller's return address and stack pointer. */
 -	movq	(%rsp), %rax
 -	movq	%rax, WAKEUP_PCB(RIP)
 -	movq	%rsp, %rax
 -	movq	%rax, WAKEUP_PCB(RSP)
 -
 -	movl	$1, %eax
 -	ret
 -END(acpi_savecpu)
 
 Modified: stable/8/sys/amd64/acpica/acpi_wakecode.S
 ==============================================================================
 --- stable/8/sys/amd64/acpica/acpi_wakecode.S	Fri Nov 19 09:26:39 2010	(r215512)
 +++ stable/8/sys/amd64/acpica/acpi_wakecode.S	Fri Nov 19 09:49:14 2010	(r215513)
 @@ -2,7 +2,7 @@
   * Copyright (c) 2001 Takanori Watanabe <takawata@jp.freebsd.org>
   * Copyright (c) 2001 Mitsuru IWASAKI <iwasaki@jp.freebsd.org>
   * Copyright (c) 2003 Peter Wemm
 - * Copyright (c) 2008-2009 Jung-uk Kim <jkim@FreeBSD.org>
 + * Copyright (c) 2008-2010 Jung-uk Kim <jkim@FreeBSD.org>
   * All rights reserved.
   *
   * Redistribution and use in source and binary forms, with or without
 @@ -52,18 +52,17 @@
  	.data				/* So we can modify it */
  
  	ALIGN_TEXT
 -wakeup_start:
  	.code16
 +wakeup_start:
  	/*
  	 * Set up segment registers for real mode, a small stack for
  	 * any calls we make, and clear any flags.
  	 */
  	cli				/* make sure no interrupts */
 -	cld
  	mov	%cs, %ax		/* copy %cs to %ds.  Remember these */
  	mov	%ax, %ds		/* are offsets rather than selectors */
  	mov	%ax, %ss
 -	movw	$PAGE_SIZE - 8, %sp
 +	movw	$PAGE_SIZE, %sp
  	xorw	%ax, %ax
  	pushw	%ax
  	popfw
 @@ -127,6 +126,7 @@ wakeup_sw32:
  	/*
  	 * At this point, we are running in 32 bit legacy protected mode.
  	 */
 +	ALIGN_TEXT
  	.code32
  wakeup_32:
  
 @@ -205,8 +205,8 @@ wakeup_64:
  	mov	%ax, %ds
  
  	/* Restore arguments and return. */
 -	movq	wakeup_ctx - wakeup_start(%rbx), %rdi
 -	movq	wakeup_kpml4 - wakeup_start(%rbx), %rsi
 +	movq	wakeup_kpml4 - wakeup_start(%rbx), %rdi
 +	movq	wakeup_ctx - wakeup_start(%rbx), %rsi
  	movq	wakeup_retaddr - wakeup_start(%rbx), %rax
  	jmp	*%rax
  
 @@ -260,7 +260,7 @@ wakeup_kpml4:
  
  wakeup_ctx:
  	.quad	0
 -wakeup_xpcb:
 +wakeup_pcb:
  	.quad	0
  wakeup_gdt:
  	.word	0
 
 Modified: stable/8/sys/amd64/acpica/acpi_wakeup.c
 ==============================================================================
 --- stable/8/sys/amd64/acpica/acpi_wakeup.c	Fri Nov 19 09:26:39 2010	(r215512)
 +++ stable/8/sys/amd64/acpica/acpi_wakeup.c	Fri Nov 19 09:49:14 2010	(r215513)
 @@ -2,7 +2,7 @@
   * Copyright (c) 2001 Takanori Watanabe <takawata@jp.freebsd.org>
   * Copyright (c) 2001 Mitsuru IWASAKI <iwasaki@jp.freebsd.org>
   * Copyright (c) 2003 Peter Wemm
 - * Copyright (c) 2008-2009 Jung-uk Kim <jkim@FreeBSD.org>
 + * Copyright (c) 2008-2010 Jung-uk Kim <jkim@FreeBSD.org>
   * All rights reserved.
   *
   * Redistribution and use in source and binary forms, with or without
 @@ -31,13 +31,11 @@
  __FBSDID("$FreeBSD$");
  
  #include <sys/param.h>
 -#include <sys/systm.h>
  #include <sys/bus.h>
  #include <sys/kernel.h>
  #include <sys/malloc.h>
  #include <sys/memrange.h>
  #include <sys/smp.h>
 -#include <sys/types.h>
  
  #include <vm/vm.h>
  #include <vm/pmap.h>
 @@ -47,11 +45,11 @@ __FBSDID("$FreeBSD$");
  #include <machine/pcb.h>
  #include <machine/pmap.h>
  #include <machine/specialreg.h>
 -#include <machine/vmparam.h>
  
  #ifdef SMP
  #include <machine/apicreg.h>
  #include <machine/smp.h>
 +#include <machine/vmparam.h>
  #endif
  
  #include <contrib/dev/acpica/include/acpi.h>
 @@ -64,23 +62,18 @@ __FBSDID("$FreeBSD$");
  /* Make sure the code is less than a page and leave room for the stack. */
  CTASSERT(sizeof(wakecode) < PAGE_SIZE - 1024);
  
 -#ifndef _SYS_CDEFS_H_
 -#error this file needs sys/cdefs.h as a prerequisite
 -#endif
 -
  extern int		acpi_resume_beep;
  extern int		acpi_reset_video;
  
  #ifdef SMP
 -extern struct xpcb	*stopxpcbs;
 +extern struct pcb	**susppcbs;
  #else
 -static struct xpcb	*stopxpcbs;
 +static struct pcb	**susppcbs;
  #endif
  
 -int			acpi_restorecpu(struct xpcb *, vm_offset_t);
 -int			acpi_savecpu(struct xpcb *);
 +int			acpi_restorecpu(struct pcb *, vm_offset_t);
  
 -static void		acpi_alloc_wakeup_handler(void);
 +static void		*acpi_alloc_wakeup_handler(void);
  static void		acpi_stop_beep(void *);
  
  #ifdef SMP
 @@ -111,10 +104,10 @@ acpi_wakeup_ap(struct acpi_softc *sc, in
  	int		apic_id = cpu_apic_ids[cpu];
  	int		ms;
  
 -	WAKECODE_FIXUP(wakeup_xpcb, struct xpcb *, &stopxpcbs[cpu]);
 -	WAKECODE_FIXUP(wakeup_gdt, uint16_t, stopxpcbs[cpu].xpcb_gdt.rd_limit);
 +	WAKECODE_FIXUP(wakeup_pcb, struct pcb *, susppcbs[cpu]);
 +	WAKECODE_FIXUP(wakeup_gdt, uint16_t, susppcbs[cpu]->pcb_gdt.rd_limit);
  	WAKECODE_FIXUP(wakeup_gdt + 2, uint64_t,
 -	    stopxpcbs[cpu].xpcb_gdt.rd_base);
 +	    susppcbs[cpu]->pcb_gdt.rd_base);
  	WAKECODE_FIXUP(wakeup_cpu, int, cpu);
  
  	/* do an INIT IPI: assert RESET */
 @@ -222,7 +215,6 @@ acpi_wakeup_cpus(struct acpi_softc *sc, 
  int
  acpi_sleep_machdep(struct acpi_softc *sc, int state)
  {
 -	struct savefpu	*stopfpu;
  #ifdef SMP
  	cpumask_t	wakeup_cpus;
  #endif
 @@ -252,10 +244,7 @@ acpi_sleep_machdep(struct acpi_softc *sc
  	cr3 = rcr3();
  	load_cr3(KPML4phys);
  
 -	stopfpu = &stopxpcbs[0].xpcb_pcb.pcb_save;
 -	if (acpi_savecpu(&stopxpcbs[0])) {
 -		fpugetregs(curthread, stopfpu);
 -
 +	if (savectx(susppcbs[0])) {
  #ifdef SMP
  		if (wakeup_cpus != 0 && suspend_cpus(wakeup_cpus) == 0) {
  			device_printf(sc->acpi_dev,
 @@ -268,11 +257,11 @@ acpi_sleep_machdep(struct acpi_softc *sc
  		WAKECODE_FIXUP(resume_beep, uint8_t, (acpi_resume_beep != 0));
  		WAKECODE_FIXUP(reset_video, uint8_t, (acpi_reset_video != 0));
  
 -		WAKECODE_FIXUP(wakeup_xpcb, struct xpcb *, &stopxpcbs[0]);
 +		WAKECODE_FIXUP(wakeup_pcb, struct pcb *, susppcbs[0]);
  		WAKECODE_FIXUP(wakeup_gdt, uint16_t,
 -		    stopxpcbs[0].xpcb_gdt.rd_limit);
 +		    susppcbs[0]->pcb_gdt.rd_limit);
  		WAKECODE_FIXUP(wakeup_gdt + 2, uint64_t,
 -		    stopxpcbs[0].xpcb_gdt.rd_base);
 +		    susppcbs[0]->pcb_gdt.rd_base);
  		WAKECODE_FIXUP(wakeup_cpu, int, 0);
  
  		/* Call ACPICA to enter the desired sleep state */
 @@ -291,7 +280,6 @@ acpi_sleep_machdep(struct acpi_softc *sc
  		for (;;)
  			ia32_pause();
  	} else {
 -		fpusetregs(curthread, stopfpu);
  #ifdef SMP
  		if (wakeup_cpus != 0)
  			acpi_wakeup_cpus(sc, wakeup_cpus);
 @@ -324,49 +312,48 @@ out:
  	return (ret);
  }
  
 -static vm_offset_t	acpi_wakeaddr;
 -
 -static void
 +static void *
  acpi_alloc_wakeup_handler(void)
  {
  	void		*wakeaddr;
 -
 -	if (!cold)
 -		return;
 +	int		i;
  
  	/*
  	 * Specify the region for our wakeup code.  We want it in the low 1 MB
 -	 * region, excluding video memory and above (0xa0000).  We ask for
 -	 * it to be page-aligned, just to be safe.
 +	 * region, excluding real mode IVT (0-0x3ff), BDA (0x400-0x4ff), EBDA
 +	 * (less than 128KB, below 0xa0000, must be excluded by SMAP and DSDT),
 +	 * and ROM area (0xa0000 and above).  The temporary page tables must be
 +	 * page-aligned.
  	 */
 -	wakeaddr = contigmalloc(4 * PAGE_SIZE, M_DEVBUF, M_NOWAIT, 0, 0x9ffff,
 -	    PAGE_SIZE, 0ul);
 +	wakeaddr = contigmalloc(4 * PAGE_SIZE, M_DEVBUF, M_NOWAIT, 0x500,
 +	    0xa0000, PAGE_SIZE, 0ul);
  	if (wakeaddr == NULL) {
  		printf("%s: can't alloc wake memory\n", __func__);
 -		return;
 -	}
 -	stopxpcbs = malloc(mp_ncpus * sizeof(*stopxpcbs), M_DEVBUF, M_NOWAIT);
 -	if (stopxpcbs == NULL) {
 -		contigfree(wakeaddr, 4 * PAGE_SIZE, M_DEVBUF);
 -		printf("%s: can't alloc CPU state memory\n", __func__);
 -		return;
 +		return (NULL);
  	}
 -	acpi_wakeaddr = (vm_offset_t)wakeaddr;
 -}
 +	susppcbs = malloc(mp_ncpus * sizeof(*susppcbs), M_DEVBUF, M_WAITOK);
 +	for (i = 0; i < mp_ncpus; i++)
 +		susppcbs[i] = malloc(sizeof(**susppcbs), M_DEVBUF, M_WAITOK);
  
 -SYSINIT(acpiwakeup, SI_SUB_KMEM, SI_ORDER_ANY, acpi_alloc_wakeup_handler, 0);
 +	return (wakeaddr);
 +}
  
  void
  acpi_install_wakeup_handler(struct acpi_softc *sc)
  {
 +	static void	*wakeaddr = NULL;
  	uint64_t	*pt4, *pt3, *pt2;
  	int		i;
  
 -	if (acpi_wakeaddr == 0ul)
 +	if (wakeaddr != NULL)
 +		return;
 +
 +	wakeaddr = acpi_alloc_wakeup_handler();
 +	if (wakeaddr == NULL)
  		return;
  
 -	sc->acpi_wakeaddr = acpi_wakeaddr;
 -	sc->acpi_wakephys = vtophys(acpi_wakeaddr);
 +	sc->acpi_wakeaddr = (vm_offset_t)wakeaddr;
 +	sc->acpi_wakephys = vtophys(wakeaddr);
  
  	bcopy(wakecode, (void *)WAKECODE_VADDR(sc), sizeof(wakecode));
  
 @@ -392,7 +379,7 @@ acpi_install_wakeup_handler(struct acpi_
  	WAKECODE_FIXUP(wakeup_sfmask, uint64_t, rdmsr(MSR_SF_MASK));
  
  	/* Build temporary page tables below realmode code. */
 -	pt4 = (uint64_t *)acpi_wakeaddr;
 +	pt4 = wakeaddr;
  	pt3 = pt4 + (PAGE_SIZE) / sizeof(uint64_t);
  	pt2 = pt3 + (PAGE_SIZE) / sizeof(uint64_t);
  
 
 Modified: stable/8/sys/amd64/amd64/cpu_switch.S
 ==============================================================================
 --- stable/8/sys/amd64/amd64/cpu_switch.S	Fri Nov 19 09:26:39 2010	(r215512)
 +++ stable/8/sys/amd64/amd64/cpu_switch.S	Fri Nov 19 09:49:14 2010	(r215513)
 @@ -113,7 +113,7 @@ done_store_dr:
  	/* have we used fp, and need a save? */
  	cmpq	%rdi,PCPU(FPCURTHREAD)
  	jne	1f
 -	addq	$PCB_SAVEFPU,%r8
 +	movq	PCB_SAVEFPU(%r8),%r8
  	clts
  	fxsave	(%r8)
  	smsw	%ax
 @@ -302,121 +302,62 @@ END(cpu_switch)
   * Update pcb, saving current processor state.
   */
  ENTRY(savectx)
 -	/* Fetch PCB. */
 -	movq	%rdi,%rcx
 -
 -	/* Save caller's return address. */
 -	movq	(%rsp),%rax
 -	movq	%rax,PCB_RIP(%rcx)
 -
 -	movq	%cr3,%rax
 -	movq	%rax,PCB_CR3(%rcx)
 -
 -	movq	%rbx,PCB_RBX(%rcx)
 -	movq	%rsp,PCB_RSP(%rcx)
 -	movq	%rbp,PCB_RBP(%rcx)
 -	movq	%r12,PCB_R12(%rcx)
 -	movq	%r13,PCB_R13(%rcx)
 -	movq	%r14,PCB_R14(%rcx)
 -	movq	%r15,PCB_R15(%rcx)
 -
 -	/*
 -	 * If fpcurthread == NULL, then the fpu h/w state is irrelevant and the
 -	 * state had better already be in the pcb.  This is true for forks
 -	 * but not for dumps (the old book-keeping with FP flags in the pcb
 -	 * always lost for dumps because the dump pcb has 0 flags).
 -	 *
 -	 * If fpcurthread != NULL, then we have to save the fpu h/w state to
 -	 * fpcurthread's pcb and copy it to the requested pcb, or save to the
 -	 * requested pcb and reload.  Copying is easier because we would
 -	 * have to handle h/w bugs for reloading.  We used to lose the
 -	 * parent's fpu state for forks by forgetting to reload.
 -	 */
 -	pushfq
 -	cli
 -	movq	PCPU(FPCURTHREAD),%rax
 -	testq	%rax,%rax
 -	je	1f
 -
 -	movq	TD_PCB(%rax),%rdi
 -	leaq	PCB_SAVEFPU(%rdi),%rdi
 -	clts
 -	fxsave	(%rdi)
 -	smsw	%ax
 -	orb	$CR0_TS,%al
 -	lmsw	%ax
 -
 -	movq	$PCB_SAVEFPU_SIZE,%rdx	/* arg 3 */
 -	leaq	PCB_SAVEFPU(%rcx),%rsi	/* arg 2 */
 -	/* arg 1 (%rdi) already loaded */
 -	call	bcopy
 -1:
 -	popfq
 -
 -	ret
 -END(savectx)
 -
 -/*
 - * savectx2(xpcb)
 - * Update xpcb, saving current processor state.
 - */
 -ENTRY(savectx2)
 -	/* Fetch XPCB. */
 -	movq	%rdi,%r8
 -
  	/* Save caller's return address. */
  	movq	(%rsp),%rax
 -	movq	%rax,PCB_RIP(%r8)
 +	movq	%rax,PCB_RIP(%rdi)
  
 -	movq	%rbx,PCB_RBX(%r8)
 -	movq	%rsp,PCB_RSP(%r8)
 -	movq	%rbp,PCB_RBP(%r8)
 -	movq	%r12,PCB_R12(%r8)
 -	movq	%r13,PCB_R13(%r8)
 -	movq	%r14,PCB_R14(%r8)
 -	movq	%r15,PCB_R15(%r8)
 +	movq	%rbx,PCB_RBX(%rdi)
 +	movq	%rsp,PCB_RSP(%rdi)
 +	movq	%rbp,PCB_RBP(%rdi)
 +	movq	%r12,PCB_R12(%rdi)
 +	movq	%r13,PCB_R13(%rdi)
 +	movq	%r14,PCB_R14(%rdi)
 +	movq	%r15,PCB_R15(%rdi)
  
 -	movq	%cr0,%rax
 -	movq	%rax,XPCB_CR0(%r8)
 +	movq	%cr0,%rsi
 +	movq	%rsi,PCB_CR0(%rdi)
  	movq	%cr2,%rax
 -	movq	%rax,XPCB_CR2(%r8)
 +	movq	%rax,PCB_CR2(%rdi)
 +	movq	%cr3,%rax
 +	movq	%rax,PCB_CR3(%rdi)
  	movq	%cr4,%rax
 -	movq	%rax,XPCB_CR4(%r8)
 +	movq	%rax,PCB_CR4(%rdi)
  
  	movq	%dr0,%rax
 -	movq	%rax,PCB_DR0(%r8)
 +	movq	%rax,PCB_DR0(%rdi)
  	movq	%dr1,%rax
 -	movq	%rax,PCB_DR1(%r8)
 +	movq	%rax,PCB_DR1(%rdi)
  	movq	%dr2,%rax
 -	movq	%rax,PCB_DR2(%r8)
 +	movq	%rax,PCB_DR2(%rdi)
  	movq	%dr3,%rax
 -	movq	%rax,PCB_DR3(%r8)
 +	movq	%rax,PCB_DR3(%rdi)
  	movq	%dr6,%rax
 -	movq	%rax,PCB_DR6(%r8)
 +	movq	%rax,PCB_DR6(%rdi)
  	movq	%dr7,%rax
 -	movq	%rax,PCB_DR7(%r8)
 -
 -	sgdt	XPCB_GDT(%r8)
 -	sidt	XPCB_IDT(%r8)
 -	sldt	XPCB_LDT(%r8)
 -	str	XPCB_TR(%r8)
 +	movq	%rax,PCB_DR7(%rdi)
  
  	movl	$MSR_FSBASE,%ecx
  	rdmsr
 -	shlq	$32,%rdx
 -	leaq	(%rax,%rdx),%rax
 -	movq	%rax,PCB_FSBASE(%r8)
 +	movl	%eax,PCB_FSBASE(%rdi)
 +	movl	%edx,PCB_FSBASE+4(%rdi)
  	movl	$MSR_GSBASE,%ecx
  	rdmsr
 -	shlq	$32,%rdx
 -	leaq	(%rax,%rdx),%rax
 -	movq	%rax,PCB_GSBASE(%r8)
 +	movl	%eax,PCB_GSBASE(%rdi)
 +	movl	%edx,PCB_GSBASE+4(%rdi)
  	movl	$MSR_KGSBASE,%ecx
  	rdmsr
 -	shlq	$32,%rdx
 -	leaq	(%rax,%rdx),%rax
 -	movq	%rax,XPCB_KGSBASE(%r8)
 +	movl	%eax,PCB_KGSBASE(%rdi)
 +	movl	%edx,PCB_KGSBASE+4(%rdi)
 +
 +	sgdt	PCB_GDT(%rdi)
 +	sidt	PCB_IDT(%rdi)
 +	sldt	PCB_LDT(%rdi)
 +	str	PCB_TR(%rdi)
  
 -	movl	$1, %eax
 +	clts
 +	fxsave	PCB_USERFPU(%rdi)
 +	movq	%rsi,%cr0	/* The previous %cr0 is saved in %rsi. */
 +
 +	movl	$1,%eax
  	ret
 -END(savectx2)
 +END(savectx)
 
 Modified: stable/8/sys/amd64/amd64/fpu.c
 ==============================================================================
 --- stable/8/sys/amd64/amd64/fpu.c	Fri Nov 19 09:26:39 2010	(r215512)
 +++ stable/8/sys/amd64/amd64/fpu.c	Fri Nov 19 09:49:14 2010	(r215513)
 @@ -65,34 +65,36 @@ __FBSDID("$FreeBSD$");
  
  #if defined(__GNUCLIKE_ASM) && !defined(lint)
  
 -#define	fldcw(addr)		__asm("fldcw %0" : : "m" (*(addr)))
 -#define	fnclex()		__asm("fnclex")
 -#define	fninit()		__asm("fninit")
 +#define	fldcw(cw)		__asm __volatile("fldcw %0" : : "m" (cw))
 +#define	fnclex()		__asm __volatile("fnclex")
 +#define	fninit()		__asm __volatile("fninit")
  #define	fnstcw(addr)		__asm __volatile("fnstcw %0" : "=m" (*(addr)))
 -#define	fnstsw(addr)		__asm __volatile("fnstsw %0" : "=m" (*(addr)))
 -#define	fxrstor(addr)		__asm("fxrstor %0" : : "m" (*(addr)))
 +#define	fnstsw(addr)		__asm __volatile("fnstsw %0" : "=am" (*(addr)))
 +#define	fxrstor(addr)		__asm __volatile("fxrstor %0" : : "m" (*(addr)))
  #define	fxsave(addr)		__asm __volatile("fxsave %0" : "=m" (*(addr)))
 -#define	ldmxcsr(r)		__asm __volatile("ldmxcsr %0" : : "m" (r))
 -#define	start_emulating()	__asm("smsw %%ax; orb %0,%%al; lmsw %%ax" \
 -				      : : "n" (CR0_TS) : "ax")
 -#define	stop_emulating()	__asm("clts")
 +#define	ldmxcsr(csr)		__asm __volatile("ldmxcsr %0" : : "m" (csr))
 +#define	start_emulating()	__asm __volatile( \
 +				    "smsw %%ax; orb %0,%%al; lmsw %%ax" \
 +				    : : "n" (CR0_TS) : "ax")
 +#define	stop_emulating()	__asm __volatile("clts")
  
  #else	/* !(__GNUCLIKE_ASM && !lint) */
  
 -void	fldcw(caddr_t addr);
 +void	fldcw(u_short cw);
  void	fnclex(void);
  void	fninit(void);
  void	fnstcw(caddr_t addr);
  void	fnstsw(caddr_t addr);
  void	fxsave(caddr_t addr);
  void	fxrstor(caddr_t addr);
 +void	ldmxcsr(u_int csr);
  void	start_emulating(void);
  void	stop_emulating(void);
  
  #endif	/* __GNUCLIKE_ASM && !lint */
  
 -#define GET_FPU_CW(thread) ((thread)->td_pcb->pcb_save.sv_env.en_cw)
 -#define GET_FPU_SW(thread) ((thread)->td_pcb->pcb_save.sv_env.en_sw)
 +#define GET_FPU_CW(thread) ((thread)->td_pcb->pcb_save->sv_env.en_cw)
 +#define GET_FPU_SW(thread) ((thread)->td_pcb->pcb_save->sv_env.en_sw)
  
  typedef u_char bool_t;
  
 @@ -111,15 +113,18 @@ static	struct savefpu		fpu_initialstate;
  void
  fpuinit(void)
  {
 -	register_t savecrit;
 +	register_t saveintr;
  	u_int mxcsr;
  	u_short control;
  
 -	savecrit = intr_disable();
 +	/*
 +	 * It is too early for critical_enter() to work on AP.
 +	 */
 +	saveintr = intr_disable();
  	stop_emulating();
  	fninit();
  	control = __INITIAL_FPUCW__;
 -	fldcw(&control);
 +	fldcw(control);
  	mxcsr = __INITIAL_MXCSR__;
  	ldmxcsr(mxcsr);
  	if (PCPU_GET(cpuid) == 0) {
 @@ -132,7 +137,7 @@ fpuinit(void)
  		bzero(fpu_initialstate.sv_xmm, sizeof(fpu_initialstate.sv_xmm));
  	}
  	start_emulating();
 -	intr_restore(savecrit);
 +	intr_restore(saveintr);
  }
  
  /*
 @@ -141,16 +146,15 @@ fpuinit(void)
  void
  fpuexit(struct thread *td)
  {
 -	register_t savecrit;
  
 -	savecrit = intr_disable();
 +	critical_enter();
  	if (curthread == PCPU_GET(fpcurthread)) {
  		stop_emulating();
 -		fxsave(&PCPU_GET(curpcb)->pcb_save);
 +		fxsave(PCPU_GET(curpcb)->pcb_save);
  		start_emulating();
  		PCPU_SET(fpcurthread, 0);
  	}
 -	intr_restore(savecrit);
 +	critical_exit();
  }
  
  int
 @@ -351,10 +355,9 @@ static char fpetable[128] = {
  int
  fputrap()
  {
 -	register_t savecrit;
  	u_short control, status;
  
 -	savecrit = intr_disable();
 +	critical_enter();
  
  	/*
  	 * Interrupt handling (for another interrupt) may have pushed the
 @@ -371,7 +374,7 @@ fputrap()
  
  	if (PCPU_GET(fpcurthread) == curthread)
  		fnclex();
 -	intr_restore(savecrit);
 +	critical_exit();
  	return (fpetable[status & ((~control & 0x3f) | 0x40)]);
  }
  
 @@ -389,12 +392,13 @@ void
  fpudna(void)
  {
  	struct pcb *pcb;
 -	register_t s;
  
 +	critical_enter();
  	if (PCPU_GET(fpcurthread) == curthread) {
  		printf("fpudna: fpcurthread == curthread %d times\n",
  		    ++err_count);
  		stop_emulating();
 +		critical_exit();
  		return;
  	}
  	if (PCPU_GET(fpcurthread) != NULL) {
 @@ -404,7 +408,6 @@ fpudna(void)
  		       curthread, curthread->td_proc->p_pid);
  		panic("fpudna");
  	}
 -	s = intr_disable();
  	stop_emulating();
  	/*
  	 * Record new context early in case frstor causes a trap.
 @@ -422,23 +425,23 @@ fpudna(void)
  		 */
  		fxrstor(&fpu_initialstate);
  		if (pcb->pcb_initial_fpucw != __INITIAL_FPUCW__)
 -			fldcw(&pcb->pcb_initial_fpucw);
 +			fldcw(pcb->pcb_initial_fpucw);
  		pcb->pcb_flags |= PCB_FPUINITDONE;
 +		if (PCB_USER_FPU(pcb))
 +			pcb->pcb_flags |= PCB_USERFPUINITDONE;
  	} else
 -		fxrstor(&pcb->pcb_save);
 -	intr_restore(s);
 +		fxrstor(pcb->pcb_save);
 +	critical_exit();
  }
  
 -/*
 - * This should be called with interrupts disabled and only when the owning
 - * FPU thread is non-null.
 - */
  void
  fpudrop()
  {
  	struct thread *td;
  
  	td = PCPU_GET(fpcurthread);
 +	KASSERT(td == curthread, ("fpudrop: fpcurthread != curthread"));
 +	CRITICAL_ASSERT(td);
  	PCPU_SET(fpcurthread, NULL);
  	td->td_pcb->pcb_flags &= ~PCB_FPUINITDONE;
  	start_emulating();
 @@ -449,23 +452,47 @@ fpudrop()
   * It returns the FPU ownership status.
   */
  int
 +fpugetuserregs(struct thread *td, struct savefpu *addr)
 +{
 +	struct pcb *pcb;
 +
 +	pcb = td->td_pcb;
 +	if ((pcb->pcb_flags & PCB_USERFPUINITDONE) == 0) {
 +		bcopy(&fpu_initialstate, addr, sizeof(fpu_initialstate));
 +		addr->sv_env.en_cw = pcb->pcb_initial_fpucw;
 +		return (_MC_FPOWNED_NONE);
 +	}
 +	critical_enter();
 +	if (td == PCPU_GET(fpcurthread) && PCB_USER_FPU(pcb)) {
 +		fxsave(addr);
 +		critical_exit();
 +		return (_MC_FPOWNED_FPU);
 +	} else {
 +		critical_exit();
 +		bcopy(&pcb->pcb_user_save, addr, sizeof(*addr));
 +		return (_MC_FPOWNED_PCB);
 +	}
 +}
 +
 +int
  fpugetregs(struct thread *td, struct savefpu *addr)
  {
 -	register_t s;
 +	struct pcb *pcb;
  
 -	if ((td->td_pcb->pcb_flags & PCB_FPUINITDONE) == 0) {
 +	pcb = td->td_pcb;
 +	if ((pcb->pcb_flags & PCB_FPUINITDONE) == 0) {
  		bcopy(&fpu_initialstate, addr, sizeof(fpu_initialstate));
 -		addr->sv_env.en_cw = td->td_pcb->pcb_initial_fpucw;
 +		addr->sv_env.en_cw = pcb->pcb_initial_fpucw;
  		return (_MC_FPOWNED_NONE);
  	}
 -	s = intr_disable();
 +	critical_enter();
  	if (td == PCPU_GET(fpcurthread)) {
  		fxsave(addr);
 -		intr_restore(s);
 +		critical_exit();
  		return (_MC_FPOWNED_FPU);
  	} else {
 -		intr_restore(s);
 -		bcopy(&td->td_pcb->pcb_save, addr, sizeof(*addr));
 +		critical_exit();
 +		bcopy(pcb->pcb_save, addr, sizeof(*addr));
  		return (_MC_FPOWNED_PCB);
  	}
  }
 @@ -474,19 +501,42 @@ fpugetregs(struct thread *td, struct sav
   * Set the state of the FPU.
   */
  void
 +fpusetuserregs(struct thread *td, struct savefpu *addr)
 +{
 +	struct pcb *pcb;
 +
 +	pcb = td->td_pcb;
 +	critical_enter();
 +	if (td == PCPU_GET(fpcurthread) && PCB_USER_FPU(pcb)) {
 +		fxrstor(addr);
 +		critical_exit();
 +		pcb->pcb_flags |= PCB_FPUINITDONE | PCB_USERFPUINITDONE;
 +	} else {
 +		critical_exit();
 +		bcopy(addr, &td->td_pcb->pcb_user_save, sizeof(*addr));
 +		if (PCB_USER_FPU(pcb))
 +			pcb->pcb_flags |= PCB_FPUINITDONE;
 +		pcb->pcb_flags |= PCB_USERFPUINITDONE;
 +	}
 +}
 +
 +void
  fpusetregs(struct thread *td, struct savefpu *addr)
  {
 -	register_t s;
 +	struct pcb *pcb;
  
 -	s = intr_disable();
 +	pcb = td->td_pcb;
 +	critical_enter();
  	if (td == PCPU_GET(fpcurthread)) {
  		fxrstor(addr);
 -		intr_restore(s);
 +		critical_exit();
  	} else {
 -		intr_restore(s);
 -		bcopy(addr, &td->td_pcb->pcb_save, sizeof(*addr));
 +		critical_exit();
 +		bcopy(addr, td->td_pcb->pcb_save, sizeof(*addr));
  	}
 -	curthread->td_pcb->pcb_flags |= PCB_FPUINITDONE;
 +	if (PCB_USER_FPU(pcb))
 +		pcb->pcb_flags |= PCB_USERFPUINITDONE;
 +	pcb->pcb_flags |= PCB_FPUINITDONE;
  }
  
  /*
 @@ -575,3 +625,73 @@ static devclass_t fpupnp_devclass;
  
  DRIVER_MODULE(fpupnp, acpi, fpupnp_driver, fpupnp_devclass, 0, 0);
  #endif	/* DEV_ISA */
 +
 +int
 +fpu_kern_enter(struct thread *td, struct fpu_kern_ctx *ctx, u_int flags)
 +{
 +	struct pcb *pcb;
 +
 +	pcb = td->td_pcb;
 +	KASSERT(!PCB_USER_FPU(pcb) || pcb->pcb_save == &pcb->pcb_user_save,
 +	    ("mangled pcb_save"));
 +	ctx->flags = 0;
 +	if ((pcb->pcb_flags & PCB_FPUINITDONE) != 0)
 +		ctx->flags |= FPU_KERN_CTX_FPUINITDONE;
 +	fpuexit(td);
 +	ctx->prev = pcb->pcb_save;
 +	pcb->pcb_save = &ctx->hwstate;
 +	pcb->pcb_flags |= PCB_KERNFPU;
 +	pcb->pcb_flags &= ~PCB_FPUINITDONE;
 +	return (0);
 +}
 +
 +int
 +fpu_kern_leave(struct thread *td, struct fpu_kern_ctx *ctx)
 +{
 +	struct pcb *pcb;
 +
 +	pcb = td->td_pcb;
 +	critical_enter();
 +	if (curthread == PCPU_GET(fpcurthread))
 +		fpudrop();
 +	critical_exit();
 +	pcb->pcb_save = ctx->prev;
 +	if (pcb->pcb_save == &pcb->pcb_user_save) {
 +		if ((pcb->pcb_flags & PCB_USERFPUINITDONE) != 0)
 +			pcb->pcb_flags |= PCB_FPUINITDONE;
 +		else
 +			pcb->pcb_flags &= ~PCB_FPUINITDONE;
 +		pcb->pcb_flags &= ~PCB_KERNFPU;
 +	} else {
 +		if ((ctx->flags & FPU_KERN_CTX_FPUINITDONE) != 0)
 +			pcb->pcb_flags |= PCB_FPUINITDONE;
 +		else
 +			pcb->pcb_flags &= ~PCB_FPUINITDONE;
 +		KASSERT(!PCB_USER_FPU(pcb), ("unpaired fpu_kern_leave"));
 +	}
 +	return (0);
 +}
 +
 +int
 +fpu_kern_thread(u_int flags)
 +{
 +	struct pcb *pcb;
 +
 +	pcb = PCPU_GET(curpcb);
 +	KASSERT((curthread->td_pflags & TDP_KTHREAD) != 0,
 +	    ("Only kthread may use fpu_kern_thread"));
 +	KASSERT(pcb->pcb_save == &pcb->pcb_user_save, ("mangled pcb_save"));
 +	KASSERT(PCB_USER_FPU(pcb), ("recursive call"));
 +
 +	pcb->pcb_flags |= PCB_KERNFPU;
 +	return (0);
 
 *** DIFF OUTPUT TRUNCATED AT 1000 LINES ***
 
State-Changed-From-To: open->closed 
State-Changed-By: kib 
State-Changed-When: Fri Nov 19 14:14:05 UTC 2010 
State-Changed-Why:  
Merged to RELENG_8. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=135014 
>Unformatted:
