From nobody@FreeBSD.org  Mon Jul  7 21:59:14 2008
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 58AD0106566C
	for <freebsd-gnats-submit@FreeBSD.org>; Mon,  7 Jul 2008 21:59:14 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [IPv6:2001:4f8:fff6::21])
	by mx1.freebsd.org (Postfix) with ESMTP id 475A78FC19
	for <freebsd-gnats-submit@FreeBSD.org>; Mon,  7 Jul 2008 21:59:14 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.14.2/8.14.2) with ESMTP id m67LxEXO002482
	for <freebsd-gnats-submit@FreeBSD.org>; Mon, 7 Jul 2008 21:59:14 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.14.2/8.14.1/Submit) id m67LxDnd002481;
	Mon, 7 Jul 2008 21:59:13 GMT
	(envelope-from nobody)
Message-Id: <200807072159.m67LxDnd002481@www.freebsd.org>
Date: Mon, 7 Jul 2008 21:59:13 GMT
From: Andrew Hammond <andrew.george.hammond@gmail.com>
To: freebsd-gnats-submit@FreeBSD.org
Subject: ENOSPC may be misleading, consider EIO
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         125382
>Category:       kern
>Synopsis:       [libc] open(2): ENOSPC may be misleading, consider EIO
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Jul 07 22:00:01 UTC 2008
>Closed-Date:    
>Last-Modified:  Sun Jul 20 20:15:40 UTC 2008
>Originator:     Andrew Hammond
>Release:        6.2 amd64
>Organization:
AdECN, a Microsoft Company
>Environment:
FreeBSD db1.sjc.adecn.com 6.2-RELEASE-p6 FreeBSD 6.2-RELEASE-p6 #1: Thu Jul 19 09:21:10 PDT 2007     root@qaipc1.qa1.adecn.com:/usr/obj/usr/src/sys/ADECNDB  amd64
>Description:
I found the following error message in the PostgreSQL logs:

vacuumdb: vacuuming of database "adecndb" failed: ERROR:  could not
write block 209610 of relation 1663/16386/236356665: No space left on
device

This didn't make sense, since the device is only at 18% usage. I raised it
on the pgsql-hackers mailing list (subject "the un-vacuumable table"; the
thread starts at
http://archives.postgresql.org/pgsql-hackers/2008-06/msg00922.php).
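
The mismatch between the reported errno and the 18% usage figure can be
checked mechanically. The following is a rough sketch, not part of the
original report: the function names and the 0.99 threshold are hypothetical,
and it assumes a result shaped like the one os.statvfs() returns.

```python
def usage_fraction(st):
    """Fraction of data blocks in use, from an os.statvfs()-style result."""
    if st.f_blocks == 0:
        return 0.0
    return (st.f_blocks - st.f_bavail) / st.f_blocks


def enospc_is_plausible(st, threshold=0.99):
    """True if the filesystem is nearly out of blocks or has no free inodes.

    The threshold is an arbitrary illustrative value, not a kernel constant.
    """
    out_of_blocks = usage_fraction(st) >= threshold
    out_of_inodes = st.f_files > 0 and st.f_favail == 0
    return out_of_blocks or out_of_inodes
```

Calling enospc_is_plausible(os.statvfs(path)) on the filesystem that holds
the failing relation would return False at 18% usage with free inodes, which
suggests the ENOSPC came from somewhere other than space accounting.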

> Have you looked into the machine's kernel log to see if there is any
> evidence of low-level distress (hardware or filesystem level)?  I'm
> wondering if ENOSPC is being reported because it is the closest
> available errno code, but the real problem is something different than
> the error message text suggests.  Other than the errno the symptoms
> all look quite a bit like a bad-sector problem ...

Just for the record, FileWrite returns error messages which get printed
this way for two reasons other than write(2) returning ENOSPC:

1) If FileAccess has to reopen the file, then open(2) could return an error.
I don't see how open(2) returns ENOSPC without O_CREAT (and that flag is
cleared for reopening).

2) If write(2) returns < 0 but doesn't set errno. That also seems like a
strange case that shouldn't happen, but perhaps there's some reason it can.
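
Case 2 can be illustrated with a small sketch. This is not PostgreSQL's
code: checked_write and its write_fn parameter are hypothetical, and the
fallback to ENOSPC when no errno is available is an assumption made here to
show how such a wrapper could produce a "No space left on device" message
regardless of the real cause.

```python
import errno
import os


def checked_write(fd, data, write_fn=os.write):
    """Write data to fd, raising OSError if fewer bytes were written.

    write_fn is injectable so that a short write can be simulated.
    """
    # If the underlying call fails with a real errno (EIO, ENOSPC, ...),
    # the OSError it raises propagates unchanged.
    written = write_fn(fd, data)
    if written != len(data):
        # No errno was supplied for the shortfall. Assuming ENOSPC here is
        # a guess, and the resulting message hides any other cause.
        raise OSError(errno.ENOSPC, os.strerror(errno.ENOSPC))
    return written
```

With a wrapper like this, the caller cannot tell a genuinely full device
from a write that failed for an unreported reason, so the kernel log is the
only place left to look.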



On Thu, Jul 3, 2008 at 10:57 PM, Andrew Hammond
<andrew.george.hammond@gmail.com> wrote:
> On Thu, Jul 3, 2008 at 3:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>How-To-Repeat:

>Fix:


>Release-Note:
>Audit-Trail:
>Unformatted:
 >> Have you looked into the machine's kernel log to see if there is any
 >> evidence of low-level distress (hardware or filesystem level)?  I'm
 >> wondering if ENOSPC is being reported because it is the closest
 >> available errno code, but the real problem is something different than
 >> the error message text suggests.  Other than the errno the symptoms
 >> all look quite a bit like a bad-sector problem ...
 
 da1 is the storage device where PGDATA lives.
 
 Jun 19 03:06:14 db1 kernel: mpt1: request 0xffffffff929ba560:6810
 timed out for ccb 0xffffff0000e20000 (req->ccb 0xffffff0000e20000)
 Jun 19 03:06:14 db1 kernel: mpt1: request 0xffffffff929b90c0:6811
 timed out for ccb 0xffffff0001081000 (req->ccb 0xffffff0001081000)
 Jun 19 03:06:14 db1 kernel: mpt1: request 0xffffffff929b9f88:6812
 timed out for ccb 0xffffff0000d93800 (req->ccb 0xffffff0000d93800)
 Jun 19 03:06:14 db1 kernel: mpt1: attempting to abort req
 0xffffffff929ba560:6810 function 0
 Jun 19 03:06:14 db1 kernel: mpt1: request 0xffffffff929bcc90:6813
 timed out for ccb 0xffffff03e132dc00 (req->ccb 0xffffff03e132dc00)
 Jun 19 03:06:14 db1 kernel: mpt1: completing timedout/aborted req
 0xffffffff929ba560:6810
 Jun 19 03:06:14 db1 kernel: mpt1: abort of req 0xffffffff929ba560:0 completed
 Jun 19 03:06:14 db1 kernel: mpt1: attempting to abort req
 0xffffffff929b90c0:6811 function 0
 Jun 19 03:06:14 db1 kernel: mpt1: completing timedout/aborted req
 0xffffffff929b90c0:6811
 Jun 19 03:06:14 db1 kernel: mpt1: abort of req 0xffffffff929b90c0:0 completed
 Jun 19 03:06:14 db1 kernel: mpt1: attempting to abort req
 0xffffffff929b9f88:6812 function 0
 Jun 19 03:06:14 db1 kernel: (da1:mpt1:0:0:0): WRITE(16). CDB: 8a 0 0 0
 0 1 6c 99 9 c0 0 0 0 20 0 0
 Jun 19 03:06:14 db1 kernel: (da1:mpt1:0:0:0): CAM Status: SCSI Status Error
 Jun 19 03:06:14 db1 kernel: (da1:mpt1:0:0:0): SCSI Status: Check Condition
 Jun 19 03:06:14 db1 kernel: (da1:mpt1:0:0:0): UNIT ATTENTION asc:29,0
 Jun 19 03:06:14 db1 kernel: (da1:mpt1:0:0:0): Power on, reset, or bus
 device reset occurred
 Jun 19 03:06:14 db1 kernel: (da1:mpt1:0:0:0): Retrying Command (per Sense Data)
 Jun 19 03:06:14 db1 kernel: mpt1: completing timedout/aborted req
 0xffffffff929b9f88:6812
 Jun 19 03:06:14 db1 kernel: mpt1: abort of req 0xffffffff929b9f88:0 completed
 Jun 19 03:06:14 db1 kernel: mpt1: attempting to abort req
 0xffffffff929bcc90:6813 function 0
 Jun 19 03:06:14 db1 kernel: mpt1: completing timedout/aborted req
 0xffffffff929bcc90:6813
 Jun 19 03:06:14 db1 kernel: mpt1: abort of req 0xffffffff929bcc90:0 completed
 Jun 19 03:06:14 db1 kernel: (da1:mpt1:0:0:0): WRITE(16). CDB: 8a 0 0 0
 0 1 65 1b 71 a0 0 0 0 20 0 0
 Jun 19 03:06:14 db1 kernel: (da1:mpt1:0:0:0): CAM Status: SCSI Status Error
 Jun 19 03:06:14 db1 kernel: (da1:mpt1:0:0:0): SCSI Status: Check Condition
 Jun 19 03:06:14 db1 kernel: (da1:mpt1:0:0:0): UNIT ATTENTION asc:29,0
 Jun 19 03:06:14 db1 kernel: (da1:mpt1:0:0:0): Power on, reset, or bus
 device reset occurred
 Jun 19 03:06:14 db1 kernel: (da1:mpt1:0:0:0): Retrying Command (per Sense Data)
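
The two WRITE(16) CDBs in the log above identify which device blocks the
retried writes targeted. A minimal sketch for decoding them follows; the
function name is hypothetical, and it assumes the standard WRITE(16) layout
(opcode 0x8a in byte 0, a big-endian logical block address in bytes 2-9, and
a big-endian transfer length in bytes 10-13).

```python
def decode_write16(cdb_text):
    """Decode a WRITE(16) CDB printed as space-separated hex bytes.

    Returns (logical_block_address, transfer_length_in_blocks).
    """
    cdb = bytes(int(tok, 16) for tok in cdb_text.split())
    if len(cdb) != 16 or cdb[0] != 0x8A:
        raise ValueError("not a 16-byte WRITE(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")      # bytes 2-9: starting LBA
    blocks = int.from_bytes(cdb[10:14], "big")  # bytes 10-13: block count
    return lba, blocks
```

Applied to the first CDB, "8a 0 0 0 0 1 6c 99 9 c0 0 0 0 20 0 0", this
yields LBA 0x16c9909c0 with a transfer length of 0x20 (32) blocks.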
 
 Tom Lane writes:
 
 Also, I suggest filing a bug with your kernel distributor --- ENOSPC was
 a totally misleading error code here.  Seems like EIO would be more
 appropriate.  They'll probably want to see the kernel log.
 
                        regards, tom lane
 
