Re: Regression in 5.1.20: Reading long directory fails

From: Jason L Tibbitts III
Date: Tue Sep 03 2019 - 11:50:08 EST


>>>>> "JLT" == Jason L Tibbitts <tibbs@xxxxxxxxxxx> writes:

JLT> Certainly a server reboot, or maybe even just
JLT> unmounting and remounting the filesystem or copying the data to
JLT> another filesystem would tell me that. In any case, as soon as I
JLT> am able to mess with that server, I'll know more.

Rebooting the server did not make any difference, and now more users are
seeing the problem. At this point I'm in a state where NFS simply isn't
reliable at all, and I'm not sure what to do. If CentOS 8 were out,
I'd work on moving to that just so that the server was a little more
modern. (Currently the server is CentOS 7.) I guess I could try using
Fedora, or installing one of the upstream kernels, just in case this has
to do with some interaction between the client and the old RHEL7 kernel.

I do have a packet capture of a directory listing that fails with EIO,
but I'm not sure if it's safe to simply post it, and I'm not sure what
tshark options would be useful in decoding it.
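
For anyone who wants to check a directory from a client without going
through ls, something along these lines should do. It is only a minimal
sketch, assuming Python 3 is available on the client; the function name
try_listing is made up for illustration. It walks the directory and
reports the errno name if the listing fails partway through.

```python
import errno
import os


def try_listing(path):
    """List a directory and report how far we got.

    Returns (entries_read, None) on success, or
    (entries_read_before_failure, errno_name) if the listing raised OSError.
    The function name is hypothetical, chosen for this sketch only.
    """
    count = 0
    try:
        with os.scandir(path) as entries:
            for _ in entries:
                count += 1
    except OSError as exc:
        # Map the numeric errno (e.g. errno.EIO) to its symbolic name.
        return count, errno.errorcode.get(exc.errno, str(exc.errno))
    return count, None
```

Calling try_listing() on one of the affected directories would be
expected to return a partial count together with 'EIO' on a client that
shows the problem, and the full count with None on one that doesn't.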

I do know that I can rsync one of the problematic directories to a
different server (running the same kernel) and it doesn't have the same
problem. What I'll try next is rsyncing to a different filesystem on
the same server, but again I'll have to wait until people log off to do
proper testing.

- J<