nfsd: terminating on error 104 problem

From: Johan van den Dorpe
Date: Wed Feb 25 2004 - 11:57:44 EST


Hi all

We are currently running quite a number of HP DL380 servers within our company on the 2.4.25 kernel. These are primarily used for heavy NFS access, so we keep a large number of nfsd processes running concurrently. Over time, however, we have noticed that nfsd processes periodically die. The system logs show numerous entries like these:

Feb 22 12:25:24 ps29 kernel: nfsd: recvfrom returned errno 104
Feb 22 12:25:24 ps29 kernel: nfsd: terminating on error 104
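
For reference, on Linux/x86 errno 104 is ECONNRESET ("Connection reset by peer"). A minimal user-space sketch for decoding the number from the log follows. The helper names are made up for illustration and are not kernel functions, and the numeric value differs on some other architectures.

```c
#include <errno.h>
#include <string.h>

/* Convert a kernel-style return value (negative errno, e.g. -104)
 * into the positive number that appears in the log line. */
int log_errno_from_ret(int ret)
{
    return ret < 0 ? -ret : ret;
}

/* Return 1 if the logged number is ECONNRESET on this platform. */
int is_conn_reset(int logged)
{
    return logged == ECONNRESET;
}

/* Human-readable text for the logged number, via strerror(). */
const char *describe_logged_errno(int logged)
{
    return strerror(logged);
}
```

Calling describe_logged_errno(log_errno_from_ret(-104)) on one of these servers should give the text for the reset condition.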

At the moment we run a script from cron that counts the number of nfsds and restarts rpc.nfsd if the count drops below a threshold. Although this is a working solution, it's not ideal and we would really like to get this problem patched up properly.
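
The counting part of such a watchdog can be sketched roughly as below. This is not the actual script, only an illustration in C that assumes thread names can be read from /proc/<pid>/stat. The function names and the threshold policy are hypothetical, and the restart command itself is left out because it depends on the distribution.

```c
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Extract the command name from a /proc/<pid>/stat line such as
 * "1234 (nfsd) S 1 ...". Returns 0 on success, -1 on a malformed line. */
int parse_comm(const char *stat_line, char *out, size_t outlen)
{
    const char *lp = strchr(stat_line, '(');
    const char *rp = strrchr(stat_line, ')');
    size_t len;

    if (!lp || !rp || rp <= lp || outlen == 0)
        return -1;
    len = (size_t)(rp - lp - 1);
    if (len >= outlen)
        len = outlen - 1;           /* truncate to fit the buffer */
    memcpy(out, lp + 1, len);
    out[len] = '\0';
    return 0;
}

/* Count processes under procdir (normally "/proc") whose command name
 * equals name. Returns -1 if procdir cannot be opened. */
int count_procs_named(const char *procdir, const char *name)
{
    DIR *dir = opendir(procdir);
    struct dirent *ent;
    int count = 0;

    if (!dir)
        return -1;
    while ((ent = readdir(dir)) != NULL) {
        char path[512], line[512], comm[64];
        FILE *fp;

        if (!isdigit((unsigned char)ent->d_name[0]))
            continue;               /* skip non-PID entries */
        snprintf(path, sizeof(path), "%s/%s/stat", procdir, ent->d_name);
        fp = fopen(path, "r");
        if (!fp)
            continue;               /* process may have exited meanwhile */
        if (fgets(line, sizeof(line), fp) &&
            parse_comm(line, comm, sizeof(comm)) == 0 &&
            strcmp(comm, name) == 0)
            count++;
        fclose(fp);
    }
    closedir(dir);
    return count;
}

/* Policy: restart only when the count is known and below the threshold. */
int needs_restart(int running, int threshold)
{
    return running >= 0 && running < threshold;
}
```

A cron job would call needs_restart(count_procs_named("/proc", "nfsd"), threshold) and only then run the restart command. An unreadable /proc is deliberately not treated as "zero nfsds", so that a transient failure does not trigger a restart.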

From my limited knowledge of the kernel source, I can see that "terminating on error 104" corresponds to line 221 of
/usr/src/linux-2.4.25/fs/nfsd/nfssvc.c, so the svc_recv call on line 191 must be returning -104.

I've noticed that in the 2.6.0 kernel there are quite a few changes to nfssvc.c, and I wondered if they dealt with this situation.

In the meantime, are there any quick hacks I could add to nfssvc.c to make it tolerate error -104? Could I safely alter the main request loop to simply continue execution if svc_recv returns this code?
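
To make the question concrete, here is a user-space model of the loop policy being asked about. It is not code from nfssvc.c: classify_recv and run_loop are hypothetical names, the scripted results stand in for svc_recv return values, and treating -ECONNRESET as retryable is exactly the assumption whose safety is in question.

```c
#include <errno.h>

enum loop_action { LOOP_PROCESS, LOOP_RETRY, LOOP_EXIT };

/* Decide what the request loop does with a receive result.
 * ret >= 0    : a request was received, process it
 * -EAGAIN     : nothing to do yet, retry (in this model)
 * -ECONNRESET : peer reset the connection; treated as retryable here,
 *               which is the assumption under question
 * other < 0   : give up and let the thread exit */
enum loop_action classify_recv(int ret)
{
    if (ret >= 0)
        return LOOP_PROCESS;
    if (ret == -EAGAIN || ret == -ECONNRESET)
        return LOOP_RETRY;
    return LOOP_EXIT;
}

/* Run the loop over a scripted sequence of results and return how many
 * requests were processed before the loop exited. */
int run_loop(const int *results, int n)
{
    int processed = 0;
    int i;

    for (i = 0; i < n; i++) {
        enum loop_action action = classify_recv(results[i]);

        if (action == LOOP_EXIT)
            break;
        if (action == LOOP_PROCESS)
            processed++;
    }
    return processed;
}
```

With this policy a reset connection no longer ends the thread, while any other error still does. Whether the real loop can skip the error without leaving a socket or request in a bad state is a separate matter that the model does not address.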

Any help would be much appreciated.

Many thanks

--
Johan van den Dorpe