From nobody@FreeBSD.org  Sun Feb  8 03:21:59 2004
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 5628616A4CE
	for <freebsd-gnats-submit@FreeBSD.org>; Sun,  8 Feb 2004 03:21:59 -0800 (PST)
Received: from www.freebsd.org (www.freebsd.org [216.136.204.117])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 52F4B43D1F
	for <freebsd-gnats-submit@FreeBSD.org>; Sun,  8 Feb 2004 03:21:59 -0800 (PST)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.12.10/8.12.10) with ESMTP id i18BLw72037904
	for <freebsd-gnats-submit@FreeBSD.org>; Sun, 8 Feb 2004 03:21:58 -0800 (PST)
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.12.10/8.12.10/Submit) id i18BLwEl037903;
	Sun, 8 Feb 2004 03:21:58 -0800 (PST)
	(envelope-from nobody)
Message-Id: <200402081121.i18BLwEl037903@www.freebsd.org>
Date: Sun, 8 Feb 2004 03:21:58 -0800 (PST)
From: "Jukka A. Ukkonen" <jau@iki.fi>
To: freebsd-gnats-submit@FreeBSD.org
Subject: SIGALRM is not delivered when res_send() hangs waiting in kevent()
X-Send-Pr-Version: www-2.0

>Number:         62524
>Category:       kern
>Synopsis:       SIGALRM is not delivered when res_send() hangs waiting in kevent()
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    freebsd-bugs
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sun Feb 08 03:30:20 PST 2004
>Closed-Date:    Sun Feb 08 08:14:50 PST 2004
>Last-Modified:  Sun Feb 08 08:14:50 PST 2004
>Originator:     Jukka A. Ukkonen
>Release:        4.9-STABLE
>Organization:
>Environment:
FreeBSD mjolnir 4.9-STABLE FreeBSD 4.9-STABLE #0: Sun Jan 25 10:05:58 EET 2004     jau@mjolnir.XXXXXXXXXXX:/usr/src/sys/compile/Mjolnir  i386
>Description:
When an interval timer should trigger a SIGALRM delivery to
a process which tries to resolve an address to an FQDN and
there is no name service available for the corresponding
in-addr.arpa. domain the signal is not delivered.

Instead the process simply hangs waiting in ...

#0  0x280b3c00 in kevent () from /usr/lib/libc.so.4
#1  0x280c63f5 in res_send () from /usr/lib/libc.so.4
#2  0x280c9799 in res_query () from /usr/lib/libc.so.4
#3  0x280d5ccf in _gethostbydnsaddr () from /usr/lib/libc.so.4
#4  0x280d45f0 in gethostbyaddr () from /usr/lib/libc.so.4

The process continues only when the resolver timeout expires.

This is a disaster for any program which uses setitimer() and
SIGALRM to drive periodic tasks at regular intervals.

It looks like kevent() is not properly interrupted by the signal.

>How-To-Repeat:
      Call setitimer() to make the timer deliver SIGALRM at, say,
5-second intervals. Any interval works as long as it is shorter
than your default resolver timeout.

Now call gethostbyaddr() with an address for which the reverse DNS
server is not available.
Before testing make sure the registered reverse DNS server is
unavailable...

# dig -x 63.85.29.224
; <<>> DiG 9.2.1 <<>> -x 63.85.29.224
;; global options:  printcmd
;; connection timed out; no servers could be reached

Now launch your SIGALRM delivery test.
You will end up waiting in kevent() until the resolver timeout
expires instead of receiving the expected SIGALRM.

>Fix:
      None known yet.
>Release-Note:
>Audit-Trail:

From: Bruce Evans <bde@zeta.org.au>
To: "Jukka A. Ukkonen" <jau@iki.fi>
Cc: freebsd-gnats-submit@freebsd.org, freebsd-bugs@freebsd.org
Subject: Re: kern/62524: SIGALRM is not delivered when res_send() hangs
 waiting in kevent()
Date: Mon, 9 Feb 2004 02:39:02 +1100 (EST)

 On Sun, 8 Feb 2004, Jukka A. Ukkonen wrote:
 
 > >Description:
 > When an interval timer should trigger a SIGALRM delivery to
 > a process which tries to resolve an address to an FQDN and
 > there is no name service available for the corresponding
 > in-addr.arpa. domain the signal is not delivered.
 >
 > Instead the process simply hangs waiting in ...
 >
 > #0  0x280b3c00 in kevent () from /usr/lib/libc.so.4
 > #1  0x280c63f5 in res_send () from /usr/lib/libc.so.4
 > #2  0x280c9799 in res_query () from /usr/lib/libc.so.4
 > #3  0x280d5ccf in _gethostbydnsaddr () from /usr/lib/libc.so.4
 > #4  0x280d45f0 in gethostbyaddr () from /usr/lib/libc.so.4
 >
 > The process continues only when the resolver timeout expires.
 >
 > This is a disaster for any program which uses setitimer() and
 > SIGALRM to drive periodic tasks at regular intervals.
 >
 > It looks like kevent() is not properly interrupted by the signal.
 
 Are you sure that it doesn't get delivered?  There is an old bug
 in the resolver library that causes it to wait forever.  The signal
 gets delivered and (at least using select() instead of kevent())
 the resolver sees EINTR, but the resolver retries forever in some
 cases.  This breaks signal handling in ping(8).  The signal is
 delivered to ping, but ping just sets a flag and waits for control
 to return to its main loop, which checks the flag; the resolver
 never returns, so the flag is never checked.
 
 BTW, correct signal handlers that just set a flag often don't actually
 work.  E.g., top's signal handling was broken years ago by changing
 its handlers to just set a flag.  The SIGINT handler is attached using
 signal(), so most syscalls don't return after a signal and top can't
 be killed by ^C when it is in such a syscall -- e.g., start top and
 type "s"; top then waits for input and if you type ^C then it keeps
 waiting for input and the ^C is not acted on until you type some input
 (or kill the program using an uncaught signal).  Signal handlers must
 be installed without SA_RESTART for the just-set-a-flag method to work,
 and then everything that does i/o, including stdio calls, needs to be
 SYSVified to expect EINTR.  Few programs or libraries get this right.
 
 Bruce
State-Changed-From-To: open->closed 
State-Changed-By: iedowse 
State-Changed-When: Sun Feb 8 08:07:35 PST 2004 
State-Changed-Why:  

Submitter confirms that it was a problem with the application, not 
kevent or the resolver: 

>Unformatted:
 >>Are you sure that there isn't something more needed for the problem 
 >>to occur? I can't reproduce this as you describe with the program 
 >>below; the SIGALRM handler keeps getting called as expected. 
 >> 
 >>Ian 
 >        You got it quite right. 
 >        I saw two separate calls to gethostbyaddr() one of which was 
 >        not properly inside the region of code which is interruptible 
 >        by SIGALRM. 
 >        Obviously enough the one outside of the interruptible region 
 >        had no signal delivered because the signal was blocked from 
 >        delivery. 
 > 
 > 
 >        Cheers, 
 >                // jau 
 
 ... 
 
 >Quoting Ian Dowse: 
 >>  
 >> Does that mean the PR can be closed? I'm not sure if it's related, 
 >> but there is another PR, bin/4696, which reports this problem in 
 >> ping(8). Ping will hang in a state where it ignores ^C and sends 
 >> no packets until a gethostbyaddr() call completes. I think it only 
 >> happens when ping gets an error reply and tries to look up the IP 
 >> that sent the error. 
 > 
 >        Right, my complaint should have never been written. The code 
 >        just happened to be big and complex enough to hide the fact 
 >        there were calls to gethostbyaddr() also inside a region that 
 >        was protected from signals. All name resolution should have 
 >        been considered "nice to have" and done only when there was 
 >        nothing more important to do. 
 ... 
 
 
 http://www.freebsd.org/cgi/query-pr.cgi?pr=62524 
