Re: Regression in 5.1.20: Reading long directory fails

From: Jason L Tibbitts III
Date: Wed Aug 28 2019 - 14:29:07 EST


>>>>> "BF" == J Bruce Fields <bfields@xxxxxxxxxxxx> writes:

BF> Looks like that's db531db951f950b8 upstream. (Do you know if it's
BF> reproduceable upstream as well?)

Yes, it's reproducible in the 5.3.0 RCs as well.

However, while trying to do some further bisecting I ran into an odd
problem. Now kernels which were previously working (i.e. 5.1.19 and
older) are returning errors, but at a different file count. This only
gives me more questions. And so, just to be absolutely sure that there
isn't some weird server issue involved, I'm going to try to schedule a
reboot of the relevant server.

BF> Maybe it depends on having names of the right length to place some
BF> bit of xdr on a boundary. I wonder if it'd be possible to reproduce
BF> just by varying the name lengths randomly till you hit it.

I know I can't reproduce it with loads of short names, nor with
relatively long names (using sha256sum output as the filename generator).
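
For what it's worth, the random-length idea could be tried with something
like the sketch below. It is only an illustration, not anything that was
actually run against the affected server: the function name, the default
length bounds, and the index-prefix naming scheme are all assumptions made
for the example. It fills a directory with empty files whose name lengths
are drawn at random, using a fixed seed so that a layout which triggers the
error can be recreated.

```python
import os
import random
import string

NAME_MAX = 255  # usual per-component name limit on Linux filesystems


def make_random_length_files(directory, count, min_len=8, max_len=NAME_MAX, seed=0):
    """Create `count` empty files whose name lengths vary randomly.

    Hypothetical helper: each name starts with a zero-padded index so the
    names stay unique, followed by random lowercase padding up to the
    randomly chosen length.
    """
    if min_len < 8 or max_len > NAME_MAX or min_len > max_len:
        raise ValueError("need 8 <= min_len <= max_len <= NAME_MAX")
    rng = random.Random(seed)  # fixed seed: the same layout can be rebuilt
    os.makedirs(directory, exist_ok=True)
    names = []
    for i in range(count):
        length = rng.randint(min_len, max_len)
        prefix = f"{i:07d}_"  # 8 characters for counts below ten million
        padding = "".join(
            rng.choices(string.ascii_lowercase, k=length - len(prefix))
        )
        name = prefix + padding
        # Mode "x" fails if the file already exists, so reruns into a
        # non-empty directory are noticed rather than silently reused.
        with open(os.path.join(directory, name), "x"):
            pass
        names.append(name)
    return names
```

Pointing it at a scratch directory on the NFS mount (the path is whatever
applies locally) with a large count and a few different seeds, then running
ls on the client, would show whether name length alone is enough to hit
the failure.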

BF> No clever debugging ideas off the top of my head, I'm afraid. I
BF> might start by patching the kernel or doing some tracing to figure
BF> out exactly where that EIO is being generated?

If I had any idea how to do that, I happily would. I'm certainly
willing to learn. At least I can run strace to see where ls bombs:

getdents64(5, 0x7fc13afaf040, 262144) = -1 EIO (Input/output error)
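
Since the failing file count seems to move around, a small script that
reports how many entries were read before the error may be handier than
rerunning ls under strace. The following is a rough sketch under stated
assumptions: the function name is made up for the example, and it relies on
os.scandir(), which reads the directory through the C library rather than
calling getdents64 directly, so the buffer size will not necessarily match
the strace output above.

```python
import os


def count_until_error(path):
    """Count directory entries until the listing ends or an OSError occurs.

    Returns (entries_read, errno_or_None). Hypothetical helper for
    comparing the failure point across kernels.
    """
    count = 0
    try:
        with os.scandir(path) as entries:
            for _ in entries:
                count += 1
    except OSError as exc:
        # An EIO from the kernel would surface here as exc.errno == errno.EIO.
        return count, exc.errno
    return count, None
```

Running it against the problem directory on each kernel and noting the
returned pair gives a quick record of where the listing stops and with
which error code.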

bcodding on IRC mentioned that this is a rather large count. It does make me
wonder if the server is weirding out and sending the client bogus data.
Certainly a server reboot, or maybe even just unmounting and remounting
the filesystem or copying the data to another filesystem would tell me
that. In any case, as soon as I am able to mess with that server, I'll
know more.

_ J<