From nobody@FreeBSD.org  Mon Jul 25 10:06:13 2011
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id A69CF106566C
	for <freebsd-gnats-submit@FreeBSD.org>; Mon, 25 Jul 2011 10:06:13 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from red.freebsd.org (red.freebsd.org [IPv6:2001:4f8:fff6::22])
	by mx1.freebsd.org (Postfix) with ESMTP id 7CE228FC08
	for <freebsd-gnats-submit@FreeBSD.org>; Mon, 25 Jul 2011 10:06:13 +0000 (UTC)
Received: from red.freebsd.org (localhost [127.0.0.1])
	by red.freebsd.org (8.14.4/8.14.4) with ESMTP id p6PA6Cn8019246
	for <freebsd-gnats-submit@FreeBSD.org>; Mon, 25 Jul 2011 10:06:12 GMT
	(envelope-from nobody@red.freebsd.org)
Received: (from nobody@localhost)
	by red.freebsd.org (8.14.4/8.14.4/Submit) id p6PA6CbU019245;
	Mon, 25 Jul 2011 10:06:12 GMT
	(envelope-from nobody)
Message-Id: <201107251006.p6PA6CbU019245@red.freebsd.org>
Date: Mon, 25 Jul 2011 10:06:12 GMT
From: Michael Gmelin <freebsd@grem.de>
To: freebsd-gnats-submit@FreeBSD.org
Subject: [libc] close(2) emitting ECONNRESET is not POSIX compliant
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         159179
>Category:       kern
>Synopsis:       [libc] close(2) emitting ECONNRESET is not POSIX compliant
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Jul 25 10:10:07 UTC 2011
>Closed-Date:    
>Last-Modified:  Sun Apr 13 21:20:00 UTC 2014
>Originator:     Michael Gmelin
>Release:        FreeBSD 8.2-RELEASE-p1
>Organization:
Grem Equity GmbH
>Environment:
System: FreeBSD srv06 8.2-RELEASE-p1 FreeBSD 8.2-RELEASE-p1 #0 r221593: Sat May  7 15:12:25 CEST 2011
>Description:
With the advent of FreeBSD 6.3 the close(2) call was changed to return
errno ECONNRESET under certain circumstances. The man page was changed
accordingly, but in my understanding errno = ECONNRESET is not covered
by POSIX.1-2008 (see
http://pubs.opengroup.org/onlinepubs/9699919799/functions/close.html).
Also, no other implementation of close(2) that I have seen behaves like
this, which causes real problems in practice.

In practice this means that every project ported to FreeBSD would need
to be reviewed to check whether it handles these situations gracefully,
which usually doesn't happen. Examples I'm aware of are:

Ruby:
http://redmine.ruby-lang.org/issues/3515

Ice:
http://www.zeroc.com/forums/patches/5435-patch-network-cpp-freebsd-econnreset-close-2-problem.html

The problematic change was made quite a while ago:

r164516 | sam | 2006-11-22 17:16:54 +0000 (Wed, 22 Nov 2006) | 19 lines

----
Change error codes returned by protocol operations when an inpcb is
marked INP_DROPPED or INP_TIMEWAIT:
o return ECONNRESET instead of EINVAL for close, disconnect, shutdown,
  rcvd, rcvoob, and send operations
o return ECONNABORTED instead of EINVAL for accept

These changes should reduce confusion in applications since EINVAL is
normally interpreted to mean an invalid file descriptor.  This change
does not conflict with POSIX or other standards I checked. The return
of EINVAL has always been possible but rare; it's become more common
with recent changes to the socket/inpcb handling and with finer-grained
locking and preemption.

Note: there are other instances of EINVAL for this state that were
      left unchanged; they should be reviewed.

Reviewed by:    rwatson, andre, ru
MFC after:      1 month

---

There are other open PRs out there (e.g.
http://www.freebsd.org/cgi/query-pr.cgi?pr=146845) but these don't
focus on the POSIX impact of this behavior. Also note that other calls
might be affected by this as well (as suggested by the commit message).


>How-To-Repeat:

>Fix:
Make sure that the close(2) call conforms to POSIX.1-2008 (by going
back to returning EINVAL instead of ECONNRESET).

Please note that this probably won't fix the underlying problem - we
started seeing these ECONNRESET issues quite frequently on machines with
eight or more cores (using Ice). So just replacing ECONNRESET with
EINVAL without fixing why this is happening will probably lead to more
confusion and break the workarounds that are out there right now.
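
For reference, the workarounds mentioned above tend to look roughly like
the following minimal sketch. The helper name close_ignoring_reset is
hypothetical and is not taken from Ruby or Ice. It assumes that the
descriptor has already been released when close(2) fails with ECONNRESET,
so that error is reported as a successful close instead of being
propagated.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from any of the projects named above.
 * Close fd and treat ECONNRESET as success: the peer has reset the
 * connection and (assumption) the descriptor is released either way.
 * Returns 0 on success or on ECONNRESET, -1 with errno set otherwise.
 */
int
close_ignoring_reset(int fd)
{
	if (close(fd) == 0)
		return (0);
	if (errno == ECONNRESET)
		return (0);
	return (-1);
}
```

A caller that needs to know whether a reset occurred should check for
ECONNRESET explicitly rather than use a helper like this one.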

>Release-Note:
>Audit-Trail:

From: Michael Gmelin <freebsd@grem.de>
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/159179: [libc] close(2) emitting ECONNRESET is not POSIX compliant
Date: Mon, 25 Jul 2011 20:29:00 +0200

 Thinking about this, even the previous behavior (returning EINVAL) was
 not POSIX.1 compliant (at least as far as I understand the standard).
 The author of the patch clearly states that he thinks it is compliant,
 so it would be interesting to see what his view is based on. It
 would also be good to get a better understanding of why this error is
 emitted in the first place (I have a rough understanding of how the
 PCBs come into play here) and why this seems to happen more frequently
 now (finer-grained locking, multithreading, etc.). Finally, it would be
 interesting to know whether this is connected to the rewrites that took
 place between FreeBSD 7 and 8. Ultimately, I think that whatever is
 going on behind the scenes, the high-level API calls should be POSIX
 compliant; alternatively, the documentation/man pages should clearly
 state where they are not.
 

From: Jilles Tjoelker <jilles@stack.nl>
To: bug-followup@FreeBSD.org, freebsd@grem.de
Cc:  
Subject: Re: kern/159179: [libc] close(2) emitting ECONNRESET is not POSIX
 compliant
Date: Sun, 13 Apr 2014 23:14:27 +0200

 In FreeBSD PR kern/159179, you wrote:
 > With the advent of FreeBSD 6.3 the close(2) call was changed to return
 > errno ECONNRESET under certain circumstances. The man page was changed
 > accordingly, but in my understanding errno = ECONNRESET is not covered
 > by POSIX.1-2008 (see
 > http://pubs.opengroup.org/onlinepubs/9699919799/functions/close.html).
 > Also all other implementations of close I've seen in the past do not
 > behave like this, which leads to actual problems in reality.
 
 > In practice this means that all projects ported to FreeBSD would need
 > to get reviewed if they can handle these situations gracefully, which
 > usually doesn't happen.
 
 POSIX permits additional errors. XSH 2.3 Error Numbers says:
 ] Implementations may generate error numbers listed here under
 ] circumstances other than those described, if and only if all those
 ] error conditions can always be treated identically to the error
 ] conditions as described in this volume of POSIX.1-2008.
 ] Implementations shall not generate a different error number from one
 ] required by this volume of POSIX.1-2008 for an error condition
 ] described in this volume of POSIX.1-2008, but may generate additional
 ] errors unless explicitly disallowed for a particular function.
 
 The page for close() does not exclude [ECONNRESET] or any other error.
 
 One problem with close() errors is that there may be confusion about
 whether the file descriptor is still valid. In FreeBSD (and also Linux),
 close() on a valid file descriptor always deallocates it, even if there
 is an error while closing.
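 
 Since the descriptor is gone even when close() reports an error, a
 caller on these systems should not retry the call: in a threaded
 program the same number may already have been handed out again. A
 minimal sketch of that idiom follows. The name close_once is made up
 for this example, and the behavior it relies on is the FreeBSD/Linux
 one described above, not something POSIX guarantees.
 
 ```c
 #include <errno.h>
 #include <unistd.h>

 /*
  * Illustrative helper (the name is hypothetical).  Calls close() exactly
  * once and invalidates the caller's copy of the descriptor whether or
  * not an error was reported, so it cannot be closed a second time by
  * mistake.  Returns 0 on success or the errno value set by close().
  */
 int
 close_once(int *fdp)
 {
 	int error;

 	error = (close(*fdp) == 0) ? 0 : errno;
 	*fdp = -1;
 	return (error);
 }
 ```
 
 A non-zero return value (for example ECONNRESET) can then be logged or
 acted upon without touching the descriptor again.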
 
 The problem reported in kern/146845 may cause [ECONNRESET] errors even
 when no data was lost. This may have been fixed.
 
 -- 
 Jilles Tjoelker
>Unformatted:
