On Thu, 2006-07-06 at 18:15 +0300, Razvan Gavril wrote:
For now I have only tested it like this:
I have an NFS server (kernel-server) which I use as a boot server for several other machines on the network. Starting with 2.6.16, I noticed that when more than one of the clients does a lot of I/O on its mounted NFS shares, at least one of them starts to have problems when writing files (I don't know about reading). For example, dpkg writes strange things into the /var/lib/dpkg/status file, even though it worked perfectly before the kernel upgrade.
Every time a diskless computer fails to write correctly to the NFS filesystem, I get these messages on the NFS server (dmesg):
RPC: bad TCP reclen 0x3c390000 (large)
RPC: bad TCP reclen 0x31006261 (non-terminal)
RPC: bad TCP reclen 0x73752070 (non-terminal)
RPC: bad TCP reclen 0x52610100 (non-terminal)
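For anyone trying to read the values above: RPC over TCP uses record marking (RFC 5531), where each fragment starts with a 4-byte big-endian header whose top bit flags the last fragment and whose low 31 bits give the fragment length. The following is only a rough sketch for decoding the logged numbers by hand; the helper name decode_record_marker is made up for illustration and is not kernel code, and the value printed for the "(large)" case may already have had its top bit masked off before logging.

```python
import struct

def decode_record_marker(value):
    """Split a 32-bit RPC record marker into (last_fragment, length, ascii_view).

    Hypothetical helper for illustration only, following the RFC 5531
    record-marking layout: top bit = last fragment, low 31 bits = length.
    """
    last_fragment = bool(value & 0x80000000)
    length = value & 0x7FFFFFFF
    # Show the same four bytes as they would appear on the wire (big-endian),
    # replacing non-printable bytes with '.'.
    raw = struct.pack(">I", value)
    ascii_view = "".join(chr(b) if 32 <= b < 127 else "." for b in raw)
    return last_fragment, length, ascii_view

# The four values reported in the server's dmesg.
LOGGED = [0x3C390000, 0x31006261, 0x73752070, 0x52610100]

if __name__ == "__main__":
    for v in LOGGED:
        last, length, text = decode_record_marker(v)
        print(f"0x{v:08x} last_fragment={last} length={length} bytes={text!r}")
```

Several of the values decode to mostly printable bytes (for example "su p" and "1.ba") with implausibly large lengths. That might mean the server was reading payload data where it expected a record marker, but this is only a guess from the numbers, not a diagnosis.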
It is very simple to spot this behaviour (one write error on a client per RPC message in the server's dmesg), because apt-get always gives an error message when the /var/lib/dpkg/status file contains something that it shouldn't. It is also very easy to reproduce.
I tested with 2.6.17 and got the same error, whereas with 2.6.15 I didn't get any errors and the clients worked perfectly. Since I'm more or less forced to use a kernel version > 2.6.15, I really need to get this bug solved. I would be glad to fix it myself, but I don't have the knowledge to do so. If anybody can help, I can offer all the information I have, and also access to a system so that the problem can be tracked down.
--
Razvan Gavril
Did the problem start when you upgraded the clients or the server?
Cheers,
Trond