Re: Regression in 5.1.20: Reading long directory fails

From: Trond Myklebust
Date: Sun Sep 08 2019 - 12:47:31 EST


On Sun, 2019-09-08 at 11:48 -0400, Chuck Lever wrote:
> > On Sep 8, 2019, at 11:19 AM, Trond Myklebust <
> > trondmy@xxxxxxxxxxxxxxx> wrote:
> >
> > On Sun, 2019-09-08 at 07:39 -0400, Benjamin Coddington wrote:
> > > On 6 Sep 2019, at 16:50, Chuck Lever wrote:
> > >
> > > > > On Sep 6, 2019, at 4:47 PM, Jason L Tibbitts III <
> > > > > tibbs@xxxxxxxxxxx>
> > > > > wrote:
> > > > >
> > > > > > > > > > "JBF" == J Bruce Fields <bfields@xxxxxxxxxxxx>
> > > > > > > > > > writes:
> > > > >
> > > > > JBF> Those readdir changes were client-side, right? Based on
> > > > > that
> > > > > I'd
> > > > > JBF> been assuming a client bug, but maybe it'd be worth
> > > > > getting
> > > > > a
> > > > > full
> > > > > JBF> packet capture of the readdir reply to make sure it's
> > > > > legit.
> > > > >
> > > > > I have been working with bcodding on IRC for the past couple
> > > > > of
> > > > > days
> > > > > on
> > > > > this. Fortunately I was able to come up with way to fill up
> > > > > a
> > > > > directory
> > > > > in such a way that it will fail with certainty and as a bonus
> > > > > doesn't
> > > > > include any user data so I can feel OK about sharing packet
> > > > > captures.
> > > > > I
> > > > > have a capture alongside a kernel trace of the problematic
> > > > > operation
> > > > > in
> > > > > https://www.math.uh.edu/~tibbs/nfs/. Not that I can
> > > > > particularly
> > > > > tell
> > > > > anything useful from that, but bcodding says that it seems to
> > > > > point
> > > > > to
> > > > > some issue in sunrpc.
> > > > >
> > > > > And because I can easily reproduce this and I was able to do
> > > > > a
> > > > > bisect:
> > > > >
> > > > > 2c94b8eca1a26cd46010d6e73a23da5f2e93a19d is the first bad
> > > > > commit
> > > > > commit 2c94b8eca1a26cd46010d6e73a23da5f2e93a19d
> > > > > Author: Chuck Lever <chuck.lever@xxxxxxxxxx>
> > > > > Date: Mon Feb 11 11:25:41 2019 -0500
> > > > >
> > > > > SUNRPC: Use au_rslack when computing reply buffer size
> > > > >
> > > > > au_rslack is significantly smaller than (au_cslack << 2).
> > > > > Using
> > > > > that value results in smaller receive buffers. In some
> > > > > cases
> > > > > this
> > > > > eliminates an extra segment in Reply chunks (RPC/RDMA).
> > > > >
> > > > > Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
> > > > > Signed-off-by: Anna Schumaker <Anna.Schumaker@xxxxxxxxxx>
> > > > >
> > > > > :040000 040000 d4d1ce2fbe0035c5bd9df976b8c448df85dcb505
> > > > > 7011a792dfe72ff9cd70d66e45d353f3d7817e3e M net
> > > > >
> > > > > But of course, I can't say whether this is the actual bad
> > > > > commit
> > > > > or
> > > > > whether it just introduced a behavior change which alters
> > > > > the
> > > > > conditions
> > > > > under which the problem appears.
> > > >
> > > > The first place I'd start looking is the XDR constants at the
> > > > head
> > > > of
> > > > fs/nfs/nfs4xdr.c
> > > > having to do with READDIR.
> > > >
> > > > The report of behavior changes with the use of krb5p also makes
> > > > this
> > > > commit plausible.
> > >
> > > After sprinkling the printk's, we're coming up one word short in
> > > the
> > > receive
> > > buffer. I think we're not accounting for the xdr pad of buf-
> > > >pages
> > > for
> > > NFS4
> > > readdir -- but I need to check the RFCs. Anyone know if v4
> > > READDIR
> > > results
> > > have to be aligned?
> > >
> > > Also need to check just why krb5i is the only auth that cares..
> > >
> >
> > I'm not seeing that. If you look at commit 02ef04e432ba, you'll see
> > that Chuck did add a 'padding term' to decode_readdir_maxsz in the
> > NFSv4 case.
> > The other thing to remember is that a readdir 'dirlist4' entry is
> > always word aligned (irrespective of the length of the filename),
> > so
> > there is no padding that needs to be taken into account.
> >
> > I think we probably rather want to look at how auth->au_ralign is
> > being
> > calculated for the case of krb5i. I'm really not understanding why
> > auth->au_ralign should not take into account the presence of the
> > mic.
> > Chuck?
>
> I'm looking at gss_unwrap_resp_integ():
>
> 1971 auth->au_rslack = auth->au_verfsize + 2 + 1 +
> XDR_QUADLEN(mic.len);
> 1972 auth->au_ralign = auth->au_verfsize + 2;
>
> au_ralign now sets the alignment of the _start_ of the RPC message
> body.
> The MIC comes _after_ the RPC message body for krb5i.
>
> If Ben is off by one quad, that's not the MIC, which is typically 32
> octets,
> isn't it?
>
> Maybe some variable-length data item in the returned file attributes
> is missing
> an XDR pad.

The only two pieces of variable length data in the readdir payload are
the file name and the filehandle data. Those might present a problem
when encoding on the server side, but not when decoding on the client
side, since they are embedded in the dirlist4 (which, as I said, is
automatically aligned).

Hmm... One thing that does bother me in both gss_unwrap_resp_integ()
and gss_unwrap_resp_priv() is that if the seqno does not match, then we
return EIO. What if we had to retransmit a request, but the server
managed to squeeze off a reply to the first transmission?
Note: it should be pretty easy to catch issues such as this, since we
do have tracepoints for them. That said, it is pretty hard to imagine
this being the problem here if the bug is always reproducible (since
retransmissions typically are not).

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx