From gemini@geminix.org  Tue Aug 17 19:40:05 2004
Return-Path: <gemini@geminix.org>
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id BF72D16A4CE
	for <FreeBSD-gnats-submit@freebsd.org>; Tue, 17 Aug 2004 19:40:05 +0000 (GMT)
Received: from gen129.n001.c02.escapebox.net (gen129.n001.c02.escapebox.net [213.73.91.129])
	by mx1.FreeBSD.org (Postfix) with ESMTP id CC07E43D53
	for <FreeBSD-gnats-submit@freebsd.org>; Tue, 17 Aug 2004 19:40:04 +0000 (GMT)
	(envelope-from gemini@geminix.org)
Received: from gemini by geminix.org with local (Exim 3.36 #1)
	id 1Bx9oP-0006aG-00; Tue, 17 Aug 2004 21:40:01 +0200
Message-Id: <E1Bx9oP-0006aG-00@geminix.org>
Date: Tue, 17 Aug 2004 21:40:01 +0200
From: Uwe Doering <gemini@geminix.org>
Reply-To: Uwe Doering <gemini@geminix.org>
To: FreeBSD-gnats-submit@freebsd.org
Cc: Uwe Doering <gemini@geminix.org>
Subject: NULL pointer dereference in vm_pageout_scan()
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         70587
>Category:       kern
>Synopsis:       [vm] [patch] NULL pointer dereference in vm_pageout_scan()
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    alc
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue Aug 17 19:40:21 GMT 2004
>Closed-Date:    Thu Nov 22 21:31:47 UTC 2007
>Last-Modified:  Thu Nov 22 21:31:47 UTC 2007
>Originator:     Uwe Doering
>Release:        FreeBSD 4.5-RELEASE i386
>Organization:
EscapeBox - Managed On-Demand UNIX Servers
http://www.escapebox.net

>Environment:
System: FreeBSD geminix.org 4.5-RELEASE FreeBSD 4.5-RELEASE #1: Sun Aug 15 10:59:08 GMT 2004 root@localhost:/STABLE_Enhanced_Edition i386


>Description:
A couple of days ago one of our normally extremely stable server
machines panicked due to a NULL pointer dereference.  While we
didn't get a kernel dump we at least had the instruction pointer
and the offending data address.

After disassembling the respective part of the kernel it became
clear that the pointer in the 'object' field of the relevant
'vm_page_t' structure was NULL at the time and was being used
without checking it for NULL first.  Here's the section of code
where it happened (in vm_pageout.c:vm_pageout_scan()):

        /*
         * If the object is not being used, we ignore previous 
         * references.
         */
        if (m->object->ref_count == 0) {
                vm_page_flag_clear(m, PG_REFERENCED);
                pmap_clear_reference(m);

Now, the original assumption when this code was written may
well have been that a page on the inactive queue is always
associated with an object.  The crash we experienced
unfortunately proves the opposite.  We also found that other
parts of the kernel do not trust the 'object' field blindly.

>How-To-Repeat:
I have no idea how to repeat that condition.  We have been
running several servers in production for over two years now,
and this was the first time it happened to us.  I speculate that the
'object' field being NULL is just a transitory state that
became apparent due to a race condition.  Otherwise it should
have hit us more frequently in the past.

>Fix:
Please consider adopting the patch below.  We take the pragmatic
approach and skip the page if it isn't associated with an object,
on the assumption that this state will be short-lived, and also
because in this context we wouldn't know what to do with a page
like this, anyway.  The patch deals with the scanning loops for
both the inactive and the active queue.

--- vm_pageout.c.diff begins here ---
--- src/sys/vm/vm_pageout.c.orig	Mon Mar 11 16:48:15 2002
+++ src/sys/vm/vm_pageout.c	Mon Aug  2 13:30:57 2004
@@ -704,7 +704,7 @@
 		/*
 		 * A held page may be undergoing I/O, so skip it.
 		 */
-		if (m->hold_count) {
+		if (m->hold_count || m->object == NULL) {
 			s = splvm();
 			TAILQ_REMOVE(&vm_page_queues[PQ_INACTIVE].pl, m, pageq);
 			TAILQ_INSERT_TAIL(&vm_page_queues[PQ_INACTIVE].pl, m, pageq);
@@ -988,7 +988,8 @@
 		 */
 		if ((m->busy != 0) ||
 		    (m->flags & PG_BUSY) ||
-		    (m->hold_count != 0)) {
+		    (m->hold_count != 0) ||
+		    (m->object == NULL)) {
 			s = splvm();
 			TAILQ_REMOVE(&vm_page_queues[PQ_ACTIVE].pl, m, pageq);
 			TAILQ_INSERT_TAIL(&vm_page_queues[PQ_ACTIVE].pl, m, pageq);
--- vm_pageout.c.diff ends here ---
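
For readers without the 4.x source tree at hand, the skip-and-requeue
idea behind the patch can be sketched in userland C as shown below.
This is only an illustration under stated assumptions, not kernel
code: 'struct object', 'struct page', the 'referenced' field and
scan_queue() are hypothetical stand-ins for vm_object, vm_page,
PG_REFERENCED and the scanning loops, and the sketch omits splvm()
and all other synchronization.

```c
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-ins for the kernel's vm_object and vm_page. */
struct object {
	int ref_count;
};

struct page {
	struct object *object;	/* may be NULL, as seen in the panic */
	int hold_count;
	int referenced;		/* stand-in for PG_REFERENCED */
	TAILQ_ENTRY(page) pageq;
};

TAILQ_HEAD(pagelist, page);

/*
 * Visit at most 'maxscan' pages, starting at the head of the queue.
 * A page that is held or has no object is moved to the tail and
 * counted as skipped.  Only after that check is m->object
 * dereferenced.  Returns the number of skipped pages.
 */
int
scan_queue(struct pagelist *pl, int maxscan)
{
	struct page *m, *next;
	int skipped = 0;

	for (m = TAILQ_FIRST(pl); m != NULL && maxscan-- > 0; m = next) {
		/* Fetch the successor before m may be moved to the tail. */
		next = TAILQ_NEXT(m, pageq);

		if (m->hold_count != 0 || m->object == NULL) {
			TAILQ_REMOVE(pl, m, pageq);
			TAILQ_INSERT_TAIL(pl, m, pageq);
			skipped++;
			continue;
		}

		/* m->object is known to be non-NULL at this point. */
		if (m->object->ref_count == 0)
			m->referenced = 0;
	}
	return (skipped);
}
```

Passing the queue length as 'maxscan' visits each page once, since
requeued pages end up behind the pages not yet scanned.  Note that
the check only keeps the scan from crashing; it does not explain how
a page without an object got onto the queue in the first place.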
>Release-Note:
>Audit-Trail:
State-Changed-From-To: open->feedback 
State-Changed-By: kmacy 
State-Changed-When: Fri Nov 16 17:27:58 UTC 2007 
State-Changed-Why:  

This sounds more like a transitory bad memory issue. Have you seen this in recent releases? 

http://www.freebsd.org/cgi/query-pr.cgi?pr=70587 
State-Changed-From-To: feedback->open 
State-Changed-By: kmacy 
State-Changed-When: Fri Nov 16 20:35:57 UTC 2007 
State-Changed-Why:  

Toss this over to alc to see if it is worth applying or should just be closed. 


Responsible-Changed-From-To: freebsd-bugs->alc 
Responsible-Changed-By: kmacy 
Responsible-Changed-When: Fri Nov 16 20:35:57 UTC 2007 
Responsible-Changed-Why:  

Toss this over to alc to see if it is worth applying or should just be closed.  

http://www.freebsd.org/cgi/query-pr.cgi?pr=70587 
State-Changed-From-To: open->closed 
State-Changed-By: alc 
State-Changed-When: Thu Nov 22 21:18:00 UTC 2007 
State-Changed-Why:  
Indeed, it is an error for a page to appear in either the active 
or inactive queues without belonging to an object.  As suggested 
in the comments this must have been either a synchronization 
error in some part of the kernel not modified by the enclosed 
patch or a transient hardware error.  Since (1) the patch does 
not identify the source of the error but only masks the error, 
(2) the synchronization of access to vm objects and vm page 
queues has completely changed in RELENG_5 and beyond, and (3) 
there have been no reports of this bug since then, I am going 
to close this PR without applying the provided patch.  That 
said, I still want to thank the submitter for his efforts. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=70587 
>Unformatted:
