Re: Possible NFS client 2.2.9 kernel bug

Trond Myklebust (trond.myklebust@fys.uio.no)
28 May 1999 02:14:46 +0200


Tom Shield <shield@aem.umn.edu> writes:

> Trond,
>
> I took your advice and did some more debugging. In summary, the problem
> goes away with rsize and wsize set to 1024, but either 4096 or 8192
> gives the infinite loop on the troublesome directory. Mounting the
> identical export on a 2.0.36/lib5 machine with any r/wsize works fine.
> Please take a look at the output below and tell me where to look next
> (assuming you think this is worth tracking down). It seems to happen only
> on a directory with particular entries, so it is hard to reproduce on
> demand, although other setups here at the U have seen the same problem.
> I do, however, have a repeatable setup on machines I can play with at
> home.
>
> and this gives:
>
> May 25 19:56:34 y2 kernel: NFS: 2 p[0] = 16777216 p[1] = 473432178
> May 25 19:56:34 y2 kernel: NFS: 2 p[0] = 16777216 p[1] = 540541042
> May 25 19:56:34 y2 kernel: NFS: 2 p[0] = 16777216 p[1] = 1060634738
> May 25 19:56:34 y2 kernel: NFS: 2 p[0] = 16777216 p[1] = 1178075250
> May 25 19:56:34 y2 kernel: NFS: 2 p[0] = 1677716 p[1] = 4046913650
> May 25 19:56:34 y2 kernel: NFS: 2 p[0] = 0 p[1] = 0
> May 25 19:56:34 y2 kernel: TWS: EOF bit NOT set
>
>
> Running the same thing with rsize and wsize set to 1024:
>
> the directory was pulled down in multiple chunks, each ending with:
>
> May 25 20:01:55 y2 kernel: NFS: 2 p[0] = 16777216 p[1] = 540541042
> May 25 20:01:55 y2 kernel: NFS: 2 p[0] = 16777216 p[1] = 1060634738
> May 25 20:01:55 y2 kernel: NFS: 2 p[0] = 0 p[1] = 0
> May 25 20:01:55 y2 kernel: TWS: EOF bit NOT set
>
> except for the last chunk:
>
> May 25 20:01:55 y2 kernel: NFS: 2 p[0] = 16777216 p[1] = 4046913650
> May 25 20:01:55 y2 kernel: NFS: 2 p[0] = 0 p[1] = 16777216
> May 25 20:01:55 y2 kernel: NFS: got EOF bit
> May 25 20:01:55 y2 kernel: TWS: EOF bit set
>
>
> Any thoughts or suggestions on which way to go next?
>
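
A note on reading the log above. The debugging printk is not shown, so this assumes it dumped the raw XDR words without ntohl() on a little-endian (x86) client. Under that assumption 16777216 (0x01000000) is simply a big-endian 1: p[0] is the "value follows" flag of each entry, p[1] is that entry's fileid, and once p[0] drops to 0 the next word, p[1], is the EOF boolean. The helper names below are hypothetical; this is a sketch of that interpretation, not kernel code.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit word. Done explicitly so the result
 * is the same on any host, unlike ntohl(). */
static uint32_t swap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Hypothetical decoder for one "p[0] = ... p[1] = ..." log line, ASSUMING
 * the values are big-endian XDR words printed as little-endian integers.
 * Returns 1 if another entry follows (p[1] is then that entry's fileid).
 * Returns 0 at the end of the entry list and stores the EOF flag in *eof. */
static int decode_log_pair(uint32_t p0, uint32_t p1, int *eof)
{
    if (swap32(p0) != 0)
        return 1;
    *eof = (swap32(p1) != 0);
    return 0;
}
```

With that reading, the trailing `p[0] = 0 p[1] = 0` pair in the 4096/8192 runs means "end of list, EOF not set", while the last 1024-byte chunk ends with `0` / `16777216`, i.e. EOF set, which matches the "got EOF bit" message.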

It looks as though readdir is overrunning the buffer limit. Hmm... I
think the problem is that we don't allow for the EOF byte when
specifying the directory buffer size to the server. RFC1094 seems to
say that the byte count sent to the server covers only the directory
entries and not the EOF, whereas in NFSv3 the count does include it.
Go figure...
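
To make the accounting concrete: under RFC 1094 each READDIR entry costs a "value follows" word, the fileid, the name length, the name padded to four bytes and the 4-byte cookie, and the list is closed by a "value follows" = 0 word plus the EOF boolean. The sketch below, with hypothetical names, models a server that counts only the entries against `count` (one reading of RFC 1094) versus one that also reserves room for the two closing words (closer to the NFSv3 rule, where the count includes all XDR overhead). The leading status word and the RPC header are deliberately ignored.

```c
#include <stddef.h>

/* XDR pads string data to a multiple of 4 bytes. */
static size_t xdr_pad4(size_t n)
{
    return (n + 3) & ~(size_t)3;
}

/* Wire size of one NFSv2 READDIR entry: value_follows + fileid +
 * name length + padded name + cookie. */
static size_t v2_entry_size(size_t namelen)
{
    return 4 + 4 + 4 + xdr_pad4(namelen) + 4;
}

/* Words that close the list: value_follows = 0, then the eof boolean. */
#define V2_TRAILER_SIZE 8

/* Hypothetical server packing loop. If count_trailer is 0, only the
 * entries are checked against count; otherwise room for the trailer is
 * reserved as well. Returns the number of entries packed and stores the
 * bytes of list data actually sent (entries + trailer) in *wire_bytes. */
static size_t v2_pack_entries(size_t count, const size_t *namelens, size_t n,
                              int count_trailer, size_t *wire_bytes)
{
    size_t reserve = count_trailer ? V2_TRAILER_SIZE : 0;
    size_t used = 0, i;

    for (i = 0; i < n; i++) {
        size_t sz = v2_entry_size(namelens[i]);
        if (used + sz + reserve > count)
            break;
        used += sz;
    }
    if (wire_bytes)
        *wire_bytes = used + V2_TRAILER_SIZE; /* the trailer is always sent */
    return i;
}
```

For example, with three 4-byte names each entry takes 20 bytes. With `count = 60` the entries-only server packs all three and then sends 68 bytes of list data, while the server that reserves the trailer stops after two entries (48 bytes).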

Please try adjusting the buffer size that is sent to the server with
the following patch, or a variant of it.

Cheers,
Trond

--- fs/nfs/nfs2xdr.c-orig	Sat Mar  6 23:21:13 1999
+++ fs/nfs/nfs2xdr.c	Fri May 28 02:10:20 1999
@@ -371,7 +371,7 @@
 
 	p = xdr_encode_fhandle(p, args->fh);
 	*p++ = htonl(args->cookie);
-	*p++ = htonl(bufsiz); /* see above */
+	*p++ = htonl(bufsiz-1); /* see above */
 	req->rq_slen = xdr_adjust_iovec(req->rq_svec, p);
 
 	/* set up reply iovec */
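
As a user-space illustration of what the patched encoder puts on the wire, the hypothetical function below writes the three NFSv2 READDIR arguments defined in RFC 1094 (a 32-byte file handle, the 4-byte cookie, and the byte count) as XDR words, sending `bufsiz - 1` as the patch does. It is a sketch, not the kernel's xdr_encode_* code; `encode_readdirargs` and `NFS2_FHSIZE` are names made up for the example.

```c
#include <arpa/inet.h>  /* htonl */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NFS2_FHSIZE 32  /* RFC 1094: fhandle is opaque[32] */

/* Hypothetical encoder: writes fhandle, cookie and count into buf as XDR
 * (big-endian) words and returns the number of bytes written. buf must
 * hold at least NFS2_FHSIZE / 4 + 2 words; bufsiz must be >= 1. */
static size_t encode_readdirargs(uint32_t *buf,
                                 const unsigned char fh[NFS2_FHSIZE],
                                 uint32_t cookie, uint32_t bufsiz)
{
    uint32_t *p = buf;

    memcpy(p, fh, NFS2_FHSIZE);       /* opaque handle, copied verbatim */
    p += NFS2_FHSIZE / 4;
    *p++ = htonl(cookie);
    *p++ = htonl(bufsiz - 1);         /* count reduced by one, as in the patch */
    return (size_t)(p - buf) * sizeof *p;
}
```

Encoding cookie 7 with `bufsiz = 1024` produces 40 bytes, the last word of which carries 1023 in network order. Since the subtraction is unsigned, a caller must make sure `bufsiz` is at least 1.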
