Serious NFS problems, server hangs, 2.0.36

Steve Benson (steve@orion.nye.net)
Fri, 15 Jan 1999 14:15:00 -0500 (EST)


Hello. I seem to be experiencing some problems related to NFS which cause
NFS clients to become unusable (see details below). I've read in the list
archives about this problem happening with 2.0.33 and earlier; however, my
NFS server is running 2.0.34 and the client is running 2.0.36. The NFS
server is running nfsd 2.2 beta 37.

Basically this is what happens. Every so often (at least once a week), with
no apparent trigger, one of our servers will start printing messages like
this to the syslog and console:

Jan 15 04:00:15 server kernel: RPC: rpc_send sending evil packet:
Jan 15 04:00:15 server kernel: e703045b 01000000 00000000 00000000 00000000 00000000 00000000 7a000801
Jan 15 04:00:20 server kernel: RPC: rpc_doio sending evil packet:
Jan 15 04:00:20 server kernel: e703045b 01000000 00000000 00000000 00000000 00000000 00000000 7a000801
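
For anyone trying to correlate these messages with other events, here is a
minimal sketch of how the "sending evil packet" lines could be pulled out of
a syslog file and counted per reporting function. It assumes the syslog line
format shown above; the names EVIL_RE, find_evil_packets and
count_by_function are made up for illustration and are not part of any
existing tool.

```python
import re
from collections import Counter

# Matches syslog lines of the form shown above, for example:
#   Jan 15 04:00:15 server kernel: RPC: rpc_send sending evil packet:
# (Assumption: timestamp, hostname, then "kernel: RPC: <function> ...".)
EVIL_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+kernel:\s+"
    r"RPC:\s+(?P<func>\w+)\s+sending evil packet:"
)


def find_evil_packets(lines):
    """Return a list of (timestamp, host, function) for each matching line."""
    hits = []
    for line in lines:
        m = EVIL_RE.match(line)
        if m:
            hits.append((m.group("stamp"), m.group("host"), m.group("func")))
    return hits


def count_by_function(hits):
    """Count how many messages each kernel function reported."""
    return Counter(func for _, _, func in hits)


# Sample input taken from the log excerpt above.
sample = [
    "Jan 15 04:00:15 server kernel: RPC: rpc_send sending evil packet:",
    "Jan 15 04:00:15 server kernel: e703045b 01000000 00000000 00000000",
    "Jan 15 04:00:20 server kernel: RPC: rpc_doio sending evil packet:",
    "Jan 15 04:00:20 server kernel: e703045b 01000000 00000000 00000000",
]

hits = find_evil_packets(sample)
for stamp, host, func in hits:
    print(stamp, host, func)
print(dict(count_by_function(hits)))
```

To run it against a real log, pass an open file object instead of the sample
list, e.g. find_evil_packets(open(path)), where path is wherever your syslog
writes kernel messages. The timestamps can then be compared against the times
of the fork() failures and SYN flood reports.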

Once this starts happening, the filesystems NFS-mounted from the server are
unusable. Shortly after these messages start appearing, resource
unavailability errors start occurring as well: users are unable to log in
and running processes start dying. Errors such as "fork() failed: resource
temporarily unavailable" and "Unable to load interpreter" start appearing.
No one can log in, and basically no new processes can be created. From the
old posts I've found, other users who have had these problems have seen
these resource allocation errors as well, so apparently they're related
(to the NFS problem).

Also, I get reports that the machine is receiving SYN floods from various
obviously spoofed IP addresses at some point during this "sending evil
packet" stuff. I'm not sure if they're related (and I hope they're not the
cause), but the SYN floods usually occur after the RPC problems have been
going on for a while.

This NFS client machine is a very busy multi-user server. It is running
RH5.2, kernel 2.0.36. The server machine is running 2.0.34, Red Hat 5.1 I
believe, and Universal NFS Server 2.2 beta 37. NFS serving is the only thing
the machine does. The server has been up and running fine for months, no
error messages, no problems. Sometimes, if I can get into the client machine
to unmount the NFS filesystem, I can remount it and that'll fix the problems
for a while. The server also has less busy client systems attached to it
which never have these problems. As far as I can tell the server isn't the
problem, which is why I haven't bothered to upgrade the kernel (and I need
to avoid the nfs server going down if possible).

I haven't seen any solutions to this problem in the archives yet. If anyone
needs any additional information, or has any ideas as to what's causing this
or how to fix it, input would be greatly appreciated :).

Thanks in advance

-Steve
