Re: Regression in 5.1.20: Reading long directory fails

From: Benjamin Coddington
Date: Thu Sep 12 2019 - 08:29:52 EST

On 11 Sep 2019, at 13:54, Chuck Lever wrote:

On Sep 11, 2019, at 1:50 PM, Benjamin Coddington <bcodding@xxxxxxxxxx> wrote:

On 11 Sep 2019, at 13:40, Benjamin Coddington wrote:

On 11 Sep 2019, at 13:29, Chuck Lever wrote:

On Sep 11, 2019, at 1:26 PM, Benjamin Coddington <bcodding@xxxxxxxxxx> wrote:

On 11 Sep 2019, at 12:39, Chuck Lever wrote:

On Sep 11, 2019, at 12:25 PM, Benjamin Coddington <bcodding@xxxxxxxxxx> wrote:

Instead, I think we want to make sure the mic falls squarely into the tail
every time.

I'm not clear how you could do that. The length of the page data is not
known to the client before it parses the reply. Are you suggesting that
gss_unwrap should do it somehow?

Is it too naive to always put the mic at the end of the tail?

The size of the page content is variable.

The only way the MIC will fall into the tail is if the page content is
exactly the largest expected size. When the page content is smaller than
that, the receive logic will place part or all of the MIC in ->pages.
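
To make the point above concrete, here is a minimal standalone sketch (not kernel code) of how many MIC bytes land in the page region. It assumes the receive buffer reserves a fixed page area, that the reply's page data is immediately followed by the MIC, and it ignores XDR padding. The function name and parameters are hypothetical.

```c
#include <stddef.h>

/*
 * Hypothetical model: the client reserved max_page_len bytes of page
 * space, the server sent data_len bytes of page content followed by a
 * MIC of mic_len bytes. Returns how many MIC bytes end up in the page
 * region; the remainder (mic_len minus the result) lands in the tail.
 */
size_t mic_bytes_in_pages(size_t max_page_len, size_t data_len, size_t mic_len)
{
	size_t room;

	if (data_len >= max_page_len)
		return 0;	/* pages are full: the whole MIC is in the tail */

	room = max_page_len - data_len;	/* unused page space after the data */
	return room < mic_len ? room : mic_len;
}
```

With a 4096-byte page area and a 28-byte MIC, the result is 0 only when the data fills all 4096 bytes. 4090 bytes of data leave 6 MIC bytes in the pages, and anything at or below 4068 bytes leaves the whole MIC there.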

Ok, right. But what I meant is that xdr_buf_read_netobj() should be renamed
and repurposed to be "move the mic from wherever it is to the end of
xdr_buf's tail".

But now I see what you mean, and I also see that it is already trying to do
that.. and we don't want to overlap the copy..

So, really, we need the tail to be larger than twice the mic.. less 1. That
means the fix is probably just increasing rslack for krb5i.

.. or we can keep the tighter tail space, and if we detect the mic straddles
the page and tail, we can move the mic into the tail with 2 copies, first
move the bit in the tail back, then move the bit in the pages.
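
A rough sketch of the two-copy idea, using flat byte arrays in place of the real xdr_buf pages and tail kvec. This is only a model under those assumptions. The helper name and its parameters are made up for illustration and are not part of net/sunrpc.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: the last in_pages bytes of the page data and the
 * first *tail_len bytes of the tail together form the MIC. Make the MIC
 * contiguous at the start of the tail.
 * Returns 0 on success, -1 if the tail has no room for the extra bytes.
 */
int mic_pull_into_tail(char *pages, size_t *page_len,
		       char *tail, size_t *tail_len, size_t tail_cap,
		       size_t in_pages)
{
	if (in_pages > *page_len || *tail_len + in_pages > tail_cap)
		return -1;

	/* Copy 1: shift the part already in the tail back. Source and
	 * destination overlap, so this has to be memmove(). */
	memmove(tail + in_pages, tail, *tail_len);

	/* Copy 2: bring the leading part over from the end of the pages. */
	memcpy(tail, pages + *page_len - in_pages, in_pages);

	*page_len -= in_pages;
	*tail_len += in_pages;
	return 0;
}
```

In this model the tail only needs room for one full MIC, at the cost of the second copy in the straddling case.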

Which is preferred: less allocation, or doing the copy twice in the rare
case this occurs?

It sounds like the bug is that the current code does not deal correctly
with the case where the MIC crosses the boundary between ->pages and
->tail? I'd like to see that addressed rather than changing rslack.

Here's what I'm about to run through my testing:

diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
index 48c93b9e525e..d6ffc9011269 100644
--- a/net/sunrpc/xdr.c
+++ b/net/sunrpc/xdr.c
@@ -1238,14 +1238,21 @@ EXPORT_SYMBOL_GPL(xdr_encode_word);

 /* If the netobj starting offset bytes from the start of xdr_buf is contained
  * entirely in the head or the tail, set object to point to it; otherwise
- * try to find space for it at the end of the tail, copy it there, and
- * set obj to point to it. */
+ * try to find space for it at the end of the tail, and copy it there. If
+ * the netobj is partly within the page data and tail, shrink the pages to
+ * move the object into the tail */
 int xdr_buf_read_netobj(struct xdr_buf *buf, struct xdr_netobj *obj, unsigned int offset)
 {
 	struct xdr_buf subbuf;
+	unsigned int page_range;

 	if (xdr_decode_word(buf, offset, &obj->len))
 		return -EFAULT;
+	page_range = buf->head->iov_len + buf->page_len - offset + 4;
+	if (page_range > 0 && page_range < obj->len)
+		xdr_shrink_pagelen(buf, page_range);
 	if (xdr_buf_subsegment(buf, &subbuf, offset + 4, obj->len))
 		return -EFAULT;
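
One thing that may be worth double-checking while testing is the boundary arithmetic. In C, `a + b - offset + 4` parses as `(a + b - offset) + 4`, and an unsigned value is never negative, so a `> 0` test cannot catch an object that starts in the tail. The following standalone sketch (a model with a hypothetical function name, not a proposed replacement for the hunk) spells the straddle test out with explicit comparisons. It assumes the object data begins 4 bytes after offset, right after the length word.

```c
#include <stddef.h>

/*
 * Hypothetical model: return how many bytes of the object lie in the
 * page region when the object straddles the pages/tail boundary, and 0
 * in every other case (entirely in head, pages, or tail).
 */
size_t straddle_bytes(size_t head_len, size_t page_len,
		      size_t offset, size_t obj_len)
{
	size_t data_start = offset + 4;		/* skip the 4-byte length word */
	size_t boundary = head_len + page_len;	/* first byte of the tail */

	if (data_start < head_len)		/* starts in the head */
		return 0;
	if (data_start >= boundary)		/* starts in the tail already */
		return 0;
	if (data_start + obj_len <= boundary)	/* ends before the tail */
		return 0;

	return boundary - data_start;		/* bytes to shift out of pages */
}
```

For example, with a 100-byte head, 4096 bytes of page data and a 28-byte object whose length word sits at offset 4180, the data starts at 4184 and the tail at 4196, so 12 bytes are in the pages.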

Is the use of xdr_shrink_pagelen() at this point in the decoding a problem for RDMA?