Re: Regression in 5.1.20: Reading long directory fails

From: Wolfgang Walter
Date: Tue Sep 03 2019 - 14:09:48 EST


On Tuesday, 3 September 2019, 10:49:48, Jason L Tibbitts III wrote:
> >>>>> "JLT" == Jason L Tibbitts <tibbs@xxxxxxxxxxx> writes:
> JLT> Certainly a server reboot, or maybe even just
> JLT> unmounting and remounting the filesystem or copying the data to
> JLT> another filesystem would tell me that. In any case, as soon as I
> JLT> am able to mess with that server, I'll know more.
>
> Rebooting the server did not make any difference, and now more users are
> seeing the problem. At this point I'm in a state where NFS simply isn't
> reliable at all, and I'm not sure what to do. If Centos 8 were out,
> I'd work on moving to that just so that the server was a little more
> modern. (Currently the server is Centos 7.) I guess I could try using
> Fedora, or installing one of the upstream kernels, just in case this has
> to do with some interaction between the client and the old RHEL7 kernel.
>
> I do have a packet capture of a directory listing that fails with EIO,
> but I'm not sure if it's safe to simply post it, and I'm not sure what
> tshark options would be useful in decoding it.
>
> I do know that I can rsync one of the problematic directories to a
> different server (running the same kernel) and it doesn't have the same
> problem. What I'll try next is rsyncing to a different filesystem on
> the same server, but again I'll have to wait until people log off to do
> proper testing.
>
> - J<

Which filesystem do you use on the server? XFS? If so, does it use 64-bit
inodes (or has it started to use them)? Do you set an fsid when you export
the filesystem?
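
A quick way to answer the inode question is to list one of the problematic
directories locally on the server and look for inode numbers that no longer
fit in 32 bits. The following is only a rough sketch: the helper names
(find_large_inodes, exceeds_32bit) are made up for illustration, and it
assumes Python 3 is available on the server and that the script is run
against the exported filesystem directly, not over NFS. Finding such inodes
would not prove they cause the EIO; it only tells us whether that theory is
worth following.

```python
import os
import sys

# Largest inode number that still fits in an unsigned 32-bit value.
INODE_32BIT_MAX = 0xFFFFFFFF


def exceeds_32bit(ino):
    """Return True if the inode number needs more than 32 bits."""
    return ino > INODE_32BIT_MAX


def find_large_inodes(path):
    """Return (name, inode) pairs for entries in `path` with >32-bit inodes."""
    large = []
    with os.scandir(path) as entries:
        for entry in entries:
            ino = entry.inode()
            if exceeds_32bit(ino):
                large.append((entry.name, ino))
    return large


if __name__ == "__main__":
    # Directory to inspect; defaults to the current directory.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    hits = find_large_inodes(target)
    for name, ino in hits:
        print(f"{ino:>20}  {name}")
    print(f"{len(hits)} entries with inode numbers above 32 bits in {target}")
```

Running it once on a failing directory and once on the rsynced copy that
works would show whether the two differ in this respect.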

Regards,
--
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts