Possible NFS client 2.2.9 kernel bug and fix

Tom Shield (shield@aem.umn.edu)
Sat, 22 May 1999 13:27:58 -0500 (CDT)

Hello,

Here is the bug I ran across:

NFS mount from a server (nfs-server-2.2beta37, kernel 2.2.6, glibc) to a
2.2.9 kernel, glibc (RH5.2+) machine. I first found that running find over
all files on an NFS mount used up all swap and caused nasty things to
happen. I tracked this down to a single directory that had an infinite
loop in its entries (readdir() keeps returning the directory entries
over and over again). I checked the following:

the directory is fine on the server (it is actually a read-only mount of an NTFS filesystem)
the directory mounted on a 2.0.36 kernel (RH 4.2+, libc) box is OK

So I figured it had to be the 2.2.9 kernel (2.2.7 and 2.2.8 show it too).

So I hacked my way into fs/nfs/dir.c and traced it to the EOF bit (bit 15)
not being set on the last entry returned by nfs_proc_readdir(). I don't
know sunrpc (not that I know NFS either ;), so it got too dense for me
to dig deeper, and I fixed it at that point. On the initial read of the
directory, before the entries are put in the cache, the attached patch
checks that the last directory entry has the EOF bit set, and sets the bit
if it does not. When the bit is not set, the cache-reading code never finds
the directory entry in the cache (it falls off the bottom of the j loop
through the entries looking for the EOF bit), so the directory is fetched
again, and again the entry is not found in the cache, giving the infinite
loop.

Someone might want to think about the consequences of hitting the break at
line 207 in dir.c, which AFAIK should never be hit. There also seems to be
an inconsistency in the code: two different methods are used to signal the
end of the directory, the size on the initial read and the EOF bit when
reading from the cache. When find used up all swap in this loop, lots of
bad things happened: processes died, etc.

This is a hard (reproducible) error, so I can test any better fix that
comes along. Either way, my patch might be worth adding as an NFS_PARANOIA
check, since the consequences are rather severe. But I don't think I've
found the original cause of this behavior.

thanks for your help,

Tom Shield
Aerospace Engineering and Mechanics
University of Minnesota
(612) 626-7793
http://www.aem.umn.edu/people/faculty/shield/

Attachment: nfs_dir_loop-2.2.9.patch

diff -C 2 -P /usr/tmp/nfs/dir.c fs/nfs/dir.c
*** /usr/tmp/nfs/dir.c	Wed May 12 17:10:34 1999
--- fs/nfs/dir.c	Thu May 20 19:38:05 1999
***************
*** 246,249 ****
--- 246,267 ----
  		cache->valid = 1;
  		entry = cache->entry + (index = 0);
+ 		
+ 		/* make sure the last one has EOF set 
+ 		   this avoids the possibility of a nasty inf loop -- shield@aem.umn.edu */
+ 		
+ 		{
+ 			__u32 *this_ent = cache->entry + 3*cache->size - 3;
+ 		
+ 			if (*(this_ent+2) & (1 << 15))
+ 			{
+ 				dfprintk(VFS, "TWS: found EOF \n");
+ 			}
+ 			else
+ 			{
+ 				*(this_ent+2) = *(this_ent+2) | (1 << 15);
+ 				dfprintk(VFS, "TWS: SET EOF bit  \n");
+ 			}
+ 		}
+ 		
  	}
  	cache->mtime = inode->i_mtime;

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/
