how to diagnose persistent hangs in rpc_recv?

Nelson Minar (nelson@media.mit.edu)
27 Jan 1997 20:09:05 -0500


For the past few months my Linux box has had a recurring problem:
processes occasionally hang in rpc_recv. Most of my files are local,
but my home directory is NFS mounted (via amd). I don't expect an easy
answer, but tips on how to debug it further would be very valuable.

I'm running kernel 2.0.26 on a P90 with 64 megs of RAM and an
ethernet card that uses the 3c509 driver. My amd is version
920824upl102-2 from Red Hat (the system itself is a Red Hat 3.0.4
system with hand upgrades to Red Hat 4.0).

I notice the hangs most with Netscape and emacs, probably because
those are the two programs I run the most. The symptom is that I'll be
zipping along and the process will just hang for about a minute.
Looking with ps, I see that the process is sleeping in rpc_recv. It
always comes back eventually, but it's obnoxious. I'm not sure, but
the problem seems to occur in bursts: it won't show up for five or six
hours, then it will happen four times in one hour.
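
To pin down when the bursts happen, one option would be to poll ps and
log whatever is sleeping in rpc_recv. Here is a minimal Python sketch
of the filtering step. It assumes ps output in three columns (pid,
wait channel, command) under a header line; the function names and the
sample text are made up for illustration, and the exact ps flags that
produce such columns depend on the ps in use.

```python
import time

# Hypothetical sample of "pid wchan command" output; the real column
# layout depends on the ps version and the flags given to it.
SAMPLE = """PID WCHAN COMMAND
 412 rpc_recv emacs
 498 do_select xterm
 733 rpc_recv netscape
"""

def find_blocked(ps_text, wchan="rpc_recv"):
    """Return (pid, command) pairs whose wait channel equals wchan."""
    blocked = []
    for line in ps_text.splitlines()[1:]:      # skip the header line
        fields = line.split(None, 2)           # pid, wchan, rest of line
        if len(fields) == 3 and fields[0].isdigit() and fields[1] == wchan:
            blocked.append((int(fields[0]), fields[2].strip()))
    return blocked

def log_entry(blocked, now=None):
    """Format one timestamped log line for a list of blocked processes."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    names = ", ".join("%s(%d)" % (cmd, pid) for pid, cmd in blocked)
    return "%s %d blocked: %s" % (stamp, len(blocked), names)

if __name__ == "__main__":
    print(log_entry(find_blocked(SAMPLE)))
```

Run from a loop every few seconds against live ps output, this would
give a record of which processes hang and when, which could then be
compared against whatever the server's administrators can see.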

And that's all I know. It's very frustrating. How can I find out more
about what's going on? I assume that since the hang is in rpc_recv,
it's some sort of NFS problem. But I've tried hard to minimize my NFS
usage: my emacs autosave files and Netscape cache are on local disk,
for instance. I can't even tell whether the problem is with reads or
writes, much less who's at fault.

I don't know much about the NFS server. It's some sort of serious NFS
server, used throughout the building.

Hints on what to look at to find the problem are welcome. Here's my
typical /etc/mtab:

/dev/sda2 / ext2 rw 1 1
/dev/sda1 /dos msdos rw 0 0
/dev/sda4 /home ext2 rw 1 2
/dev/proc /proc proc rw 0 0
mas-disk:/mas /mas nfs rw,nosuid,rsize=8192,wsize=8192,addr=18.85.13.106 0 0
hattrick:(pid168) /mas/disks auto intr,rw,port=1023,timeo=8,retrans=110,indirect,map=/usr/local/etc/amd.disks 0 0
mc:/ag2/agents /a/mc/ag2/agents nfs rw,grpid,nosuid,utimeout=600,rsize=8192,wsize=8192 0 0

The only thing that looks funny here is that retrans (110) is awfully
high. I gather that's an amd default.
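
For comparing the timeout settings across mounts, here is a rough
Python sketch that splits mtab lines into fields and pulls out timeo
and retrans. The helper names are hypothetical. It assumes timeo is
given in tenths of a second, as the nfs(5) man page documents for NFS
mounts; whether the amd entry's values actually govern the hangs is
not something this can tell.

```python
# The mtab contents quoted above, copied verbatim.
MTAB = """/dev/sda2 / ext2 rw 1 1
/dev/sda1 /dos msdos rw 0 0
/dev/sda4 /home ext2 rw 1 2
/dev/proc /proc proc rw 0 0
mas-disk:/mas /mas nfs rw,nosuid,rsize=8192,wsize=8192,addr=18.85.13.106 0 0
hattrick:(pid168) /mas/disks auto intr,rw,port=1023,timeo=8,retrans=110,indirect,map=/usr/local/etc/amd.disks 0 0
mc:/ag2/agents /a/mc/ag2/agents nfs rw,grpid,nosuid,utimeout=600,rsize=8192,wsize=8192 0 0
"""

def parse_mtab(text):
    """Parse mtab-style lines into dicts with device, dir, type, opts."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue                            # skip blank or short lines
        opts = {}
        for item in fields[3].split(","):
            key, _, value = item.partition("=")  # flags get an empty value
            opts[key] = value
        entries.append({"device": fields[0], "dir": fields[1],
                        "type": fields[2], "opts": opts})
    return entries

def rpc_settings(entry):
    """Return (initial timeout in seconds, retrans), None where unset.

    Assumes timeo is in tenths of a second.
    """
    opts = entry["opts"]
    timeo = int(opts["timeo"]) / 10 if opts.get("timeo") else None
    retrans = int(opts["retrans"]) if opts.get("retrans") else None
    return timeo, retrans

if __name__ == "__main__":
    for entry in parse_mtab(MTAB):
        if entry["type"] in ("nfs", "auto"):
            print(entry["dir"], rpc_settings(entry))
```

On the mtab above, only the amd entry sets timeo and retrans
explicitly; the two plain nfs mounts leave them at whatever the
defaults are.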