From nobody@FreeBSD.ORG  Mon Jan 10 07:57:17 2000
Return-Path: <nobody@FreeBSD.ORG>
Received: by hub.freebsd.org (Postfix, from userid 32767)
	id 0610614F77; Mon, 10 Jan 2000 07:57:16 -0800 (PST)
Message-Id: <20000110155716.0610614F77@hub.freebsd.org>
Date: Mon, 10 Jan 2000 07:57:16 -0800 (PST)
From: sue@sleepycat.com
Sender: nobody@FreeBSD.ORG
To: freebsd-gnats-submit@FreeBSD.org
Subject: Change 1.10 in libc/xdr/xdr_rec.c breaks some RPC
X-Send-Pr-Version: www-1.0

>Number:         16028
>Category:       misc
>Synopsis:       Change 1.10 in libc/xdr/xdr_rec.c breaks some RPC
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Jan 10 08:00:01 PST 2000
>Closed-Date:    Tue Jan 18 22:15:15 PST 2000
>Last-Modified:  Tue Jan 18 22:18:57 PST 2000
>Originator:     Susan LoVerso
>Release:        3.3-RELEASE and 4.0-CURRENT
>Organization:
Sleepycat Software
>Environment:
FreeBSD sue.loverso.southborough.ma.us 3.3-RELEASE FreeBSD 3.3-RELEASE #0: Sun Nov 14 19:52:38 EST 1999     root@sue.loverso.southborough.ma.us:/usr/src/sys/compile/FREEBASE  i386
FreeBSD bantha.org 4.0-CURRENT FreeBSD 4.0-CURRENT #0: Mon Dec 13 15:32:51 EST 1999     krinsky@bantha.org:/usr/src/sys/compile/BANTHA-SOUND  i386
>Description:
The change 1.10 to xdr/xdr_rec.c causes some RPC programs to fail when
they read incorrect data.  The transfer size appears to matter: if I
truncate the calls in my server program to 19947 bytes, the failure
occurs every time.  This is likely because, at that particular size, the
LAST_FRAG bytes are sent in a separate write call of their own.

ktrace shows that the server writes the last 4 bytes, which are the
LAST_FRAG, but the client "moves on" before reading them.  Then, when
the client attempts to read after issuing the next request, it reads the
(previous but unread) LAST_FRAG and dies (set_input_fragment() returns
FALSE), returning 0 to the caller.
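
For reference, a minimal sketch of how a record mark is decoded, assuming
the standard RPC record-marking layout: a 4-byte big-endian header whose
high bit flags the last fragment and whose low 31 bits give the fragment
length.  The names frag_header and decode_record_mark are hypothetical and
are not taken from xdr_rec.c.  Only LAST_FRAG is named in this report, and
the value used for it here is an assumption based on that layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value: the high bit of the 4-byte record mark. */
#define LAST_FRAG 0x80000000u

/* Hypothetical decoded form of a record mark. */
struct frag_header {
	bool last;		/* true if this fragment ends the record */
	uint32_t length;	/* number of data bytes following the mark */
};

/* Decode a 4-byte, big-endian record mark as read from the stream. */
static struct frag_header
decode_record_mark(const unsigned char buf[4])
{
	struct frag_header h;
	uint32_t raw;

	raw = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
	    ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
	h.last = (raw & LAST_FRAG) != 0;
	h.length = raw & ~LAST_FRAG;
	return (h);
}
```

Under these assumptions, a 4-byte LAST_FRAG written on its own would be the
bytes 80 00 00 00, which decode to last == true and length == 0.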
>How-To-Repeat:
A demonstration program can be retrieved from:
ftp.sleepycat.com://pub/rpcbug.tar.gz

After unpacking it (objects and executables are included; rebuild if
necessary), reproduce the problem with:
% cd file
% ./file_svc &
% ./rls localhost test/*

You will see output like:
Sent test/file1 Got 19947
Bad file test/file2

When it works (rebuild libc with 1.10's check removed) you'd see:
Sent test/file1 Got 19947
Sent test/file2 Got 19947
Sent test/file3 Got 19947
Sent test/file4 Got 19947

In the tar there is also the output of a ktrace/kdump on both
the client and server processes, in cl.kd and svc.kd, respectively.

In svc.kd, at line 587, you can see the server writing the LAST_FRAG
separately from the data it just sent.  Then on the next line it reads the
next RPC request from the client, to read test/file2.

In cl.kd, at line 485, you can see the client reading the last of the data 
from the server.  Then on 487, it writes the output to stdout from
the program itself.  On 490 it writes the next RPC request.
Finally, on 494, it reads the LAST_FRAG that the server wrote for the
previous RPC.

>Fix:
The only fix I know of (workaround really) is to remove the test added in rev 1.10
from xdr_rec.c - lines 561 and 562 (553-562 if you count the comment added
at the time also).

I discovered this problem because I was getting "random" failures on a test
RPC program.  These failures never occurred on a BSDI system.  I had
already been debugging in the XDR routines, and comparing them against
the BSDI version showed this single, key difference.

It is possible that the check/change added in 1.10 is invalid and incorrect.
However, it is also possible (perhaps likely) that the check is exposing
a different, latent XDR bug.  It appears that with or without the change,
XDR is "completing" before reading the LAST_FRAG and in the original code,
it likely just skips it the next pass through (when it would read test/file2
from the server).  Perhaps this change just indicates that there is a bug
in the coordination of rstrm->fbtbc and rstrm->last_frag that has been there
forever, but masked.  I am not sure which may be the case.
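
A toy model may help separate the two possibilities.  The sketch below is
not the libc code: toy_stream and the toy_* functions are hypothetical, the
fbtbc and last_frag fields only borrow the names used above, and
reject_empty stands in for a check that refuses a zero-length fragment.
It assumes the reply arrives as one data fragment followed by an empty
fragment carrying LAST_FRAG, and that LAST_FRAG is the high bit of a
4-byte big-endian record mark.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LAST_FRAG 0x80000000u	/* assumed: high bit of the record mark */

/* Toy input stream; fbtbc and last_frag borrow the rstrm field names. */
struct toy_stream {
	const unsigned char *buf;	/* bytes received from the peer */
	size_t len;			/* number of bytes in buf */
	size_t pos;			/* offset of the next unread byte */
	uint32_t fbtbc;			/* fragment bytes still to be consumed */
	bool last_frag;			/* current fragment ends the record */
	bool reject_empty;		/* model a check refusing length 0 */
};

/* Read the next 4-byte, big-endian record mark from the buffer. */
static bool
toy_set_input_fragment(struct toy_stream *s)
{
	const unsigned char *p;
	uint32_t header;

	if (s->len - s->pos < 4)
		return (false);
	p = s->buf + s->pos;
	header = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	    ((uint32_t)p[2] << 8) | (uint32_t)p[3];
	s->pos += 4;
	if (s->reject_empty && (header & ~LAST_FRAG) == 0)
		return (false);
	s->last_frag = (header & LAST_FRAG) != 0;
	s->fbtbc = header & ~LAST_FRAG;
	return (true);
}

/* Consume n payload bytes, reading a new mark only when more data is needed. */
static bool
toy_skip_payload(struct toy_stream *s, size_t n)
{
	size_t chunk;

	while (n > 0) {
		if (s->fbtbc == 0) {
			if (s->last_frag || !toy_set_input_fragment(s))
				return (false);
			continue;
		}
		chunk = n < s->fbtbc ? n : s->fbtbc;
		if (s->len - s->pos < chunk)
			return (false);
		s->pos += chunk;
		s->fbtbc -= (uint32_t)chunk;
		n -= chunk;
	}
	return (true);
}

/* Discard whatever remains of the current record, marks included. */
static bool
toy_skiprecord(struct toy_stream *s)
{
	while (s->fbtbc > 0 || !s->last_frag) {
		if (s->len - s->pos < s->fbtbc)
			return (false);
		s->pos += s->fbtbc;
		s->fbtbc = 0;
		if (!s->last_frag && !toy_set_input_fragment(s))
			return (false);
	}
	s->last_frag = false;
	return (true);
}
```

In this model, consuming exactly the payload leaves the 4-byte trailer
unread.  The next toy_skiprecord() call then either consumes it (with
reject_empty false) or fails on it (with reject_empty true).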

>Release-Note:
>Audit-Trail:
State-Changed-From-To: open->closed 
State-Changed-By: wpaul 
State-Changed-When: Tue Jan 18 22:15:15 PST 2000 
State-Changed-Why:  
Apparently it's legal for client RPC programs to receive zero length 
records with the LAST_FRAG marker bit set. So the test that works on 
the server side breaks on the client side. I changed the test to look 
for a header value of 0, since that is actually not legal. This fixes 
the client side while still maintaining the test for the server side. 
With this fix, the sample application works correctly. 
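
The difference between the two tests can be sketched as follows. This is 
an illustration, not the committed code: header_ok_rev110 and 
header_ok_fixed are hypothetical names, the first reflects only what the 
description above implies about the old test, and the LAST_FRAG value 
assumes the standard record-marking layout (high bit of a 32-bit header). 

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value: the high bit of the 32-bit record mark. */
#define LAST_FRAG 0x80000000u

/* Hypothetical old-style test: refuse any header whose length field is 0. */
static bool
header_ok_rev110(uint32_t header)
{
	return ((header & ~LAST_FRAG) != 0);
}

/* Hypothetical new-style test: refuse only a header that is entirely 0. */
static bool
header_ok_fixed(uint32_t header)
{
	return (header != 0);
}
```

With these definitions, a header of 0x80000000 (zero length, LAST_FRAG 
set) is refused by the first test and accepted by the second, while a 
header of 0 is refused by both. 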

I updated xdr_rec.c in both the -current and -stable branches. The fix 
will be in 4.0-RELEASE when it comes out. 

-Bill 
>Unformatted:
