Re: Regression in 5.1.20: Reading long directory fails
From: Jason L Tibbitts III
Date: Wed Aug 28 2019 - 14:29:07 EST
>>>>> "BF" == J Bruce Fields <bfields@xxxxxxxxxxxx> writes:
BF> Looks like that's db531db951f950b8 upstream. (Do you know if it's
BF> reproducible upstream as well?)
Yes, it's reproducible up in the 5.3.0 RCs as well.
However, while trying to do some further bisecting I ran into an odd
problem. Now kernels which were previously working (i.e. 5.1.19 and
older) are returning errors, but at a different file count. This only
gives me more questions. And so, just to be absolutely sure that there
isn't some weird server issue involved, I'm going to try to schedule a
reboot of the relevant server.
BF> Maybe it depends on having names of the right length to place some
BF> bit of xdr on a boundary. I wonder if it'd be possible to reproduce
BF> just by varying the name lengths randomly till you hit it.
I know I can't reproduce it with loads of short names, nor with relatively
long names (using sha256sum as the filename generator).
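For anyone who wants to try the random-name-length idea, here is a minimal
sketch in Python. The function name, the default length range, and the
index prefix are all made up for illustration; nothing here is known to
trigger the bug, it only populates a directory with names of varying
length so the listing can then be tested from the NFS client.

```python
import os
import random
import string


def make_random_names(directory, count, min_len=1, max_len=64, seed=None):
    """Create `count` empty files in `directory` with randomly varying name lengths.

    Each name starts with a hex index and a dash so that names are unique.
    A name is therefore never shorter than its prefix, even if the drawn
    length is smaller.
    """
    rng = random.Random(seed)
    alphabet = string.ascii_lowercase + string.digits
    created = []
    for i in range(count):
        length = rng.randint(min_len, max_len)
        prefix = f"{i:x}-"
        # Pad the prefix with random characters up to the drawn length.
        body_len = max(length - len(prefix), 0)
        name = prefix + "".join(rng.choice(alphabet) for _ in range(body_len))
        # Create an empty file with that name.
        with open(os.path.join(directory, name), "w"):
            pass
        created.append(name)
    return created
```

Passing a fixed seed makes a run repeatable, so a directory that does
trigger the error could be recreated on another filesystem or server.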
BF> No clever debugging ideas off the top of my head, I'm afraid. I
BF> might start by patching the kernel or doing some tracing to figure
BF> out exactly where that EIO is being generated?
If I had any idea how to do that, I happily would. I'm certainly
willing to learn. At least I can run strace to see where ls bombs:
getdents64(5, 0x7fc13afaf040, 262144) = -1 EIO (Input/output error)
bcodding on IRC mentioned that this is a rather large count. It does make me
wonder if the server is weirding out and sending the client bogus data.
Certainly a server reboot, or maybe even just unmounting and remounting
the filesystem or copying the data to another filesystem would tell me
that. In any case, as soon as I am able to mess with that server, I'll
know more.
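Since the failure shows up at a different file count on different kernels,
a small script that reports how many entries were read before the error
may be handy for comparing runs. This is only a sketch: the function name
is made up, and os.scandir goes through the C library's readdir, so the
buffer size it hands to getdents64 may or may not match the 262144 seen
in the strace output above.

```python
import errno
import os
import sys


def count_until_error(path):
    """Return (entries_read, error); error is None if the listing completed."""
    seen = 0
    try:
        with os.scandir(path) as entries:
            for _ in entries:
                seen += 1
    except OSError as exc:
        # Includes EIO from a failed directory read, but also e.g. ENOENT.
        return seen, exc
    return seen, None


if __name__ == "__main__":
    count, err = count_until_error(sys.argv[1])
    if err is None:
        print(f"read {count} entries, no error")
    else:
        code = errno.errorcode.get(err.errno, "unknown")
        print(f"read {count} entries, then {code}: {err.strerror}")
```

Running it against the same directory under each kernel gives one number
per run to compare.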
_ J<