Unfortunately I'm seeing stability problems with knfsd-980922 when a large
number of client systems are in use. This is a cluster of 40 diskless
machines; after about 17 of them boot, the NFS server stops responding. Each
client needs two mounts. I was originally running 5 nfsds and increased that
to 16, but it didn't help.
The server kernel is 2.1.122ac2 + knfsd-980922 (both patches applied). I
have nfsd compiled into the kernel rather than built as a module. Clients are
running 2.0.35 and 2.1.117. I didn't see the problem with knfsd-0.4.22 +
2.1.11x or knfsd-980915 + 2.1.122 (I didn't try 980920).
In the server kernel log:
Sep 24 10:18:03 lord kernel: Socket destroy delayed (r=65184 w=0)
Sep 24 10:18:35 lord last message repeated 10 times
Sep 24 10:19:37 lord last message repeated 19 times
Sep 24 10:19:57 lord last message repeated 6 times
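For anyone who wants to gauge how often the message fires, here is a rough sketch that expands syslog's "last message repeated N times" lines into a total. The function name count_messages and the embedded LOG sample are only illustrative; the sample is copied from the lines above, and the script assumes the usual syslogd repeat wording.

```python
import re

# syslogd collapses duplicates into "last message repeated N times"
REPEAT = re.compile(r"last message repeated (\d+) times?")

def count_messages(lines, needle):
    """Count lines containing needle, expanding syslog repeat markers."""
    total = 0
    previous_matched = False  # did the last real message contain needle?
    for line in lines:
        m = REPEAT.search(line)
        if m:
            # A repeat marker refers to the most recent real message.
            if previous_matched:
                total += int(m.group(1))
            continue
        previous_matched = needle in line
        if previous_matched:
            total += 1
    return total

# Sample copied from the server kernel log quoted above.
LOG = """\
Sep 24 10:18:03 lord kernel: Socket destroy delayed (r=65184 w=0)
Sep 24 10:18:35 lord last message repeated 10 times
Sep 24 10:19:37 lord last message repeated 19 times
Sep 24 10:19:57 lord last message repeated 6 times
"""

if __name__ == "__main__":
    print(count_messages(LOG.splitlines(), "Socket destroy delayed"))
```

On the excerpt above this gives 36 occurrences between 10:18:03 and 10:19:57.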
On clients already booted:
RPC: task timed out
nfs: server 139.88.154.100 not responding, still trying
On clients attempting to boot:
Root-NFS: Got file handle for /droot/page08 via RPC
NFS server 139.88.154.100 not responding, still trying.
/etc/exports:
/raid\
*.as.lerc.nasa.gov(rw,async,insecure,no_root_squash)\
page*.lerc.nasa.gov(rw,async,insecure,no_root_squash)\
aeroshark.lerc.nasa.gov(rw,async,insecure,no_root_squash)\
hpcc.lerc.nasa.gov(rw,async,insecure,no_root_squash)\
anduin.lerc.nasa.gov(rw,async,insecure,no_root_squash)
/droot\
*.as.lerc.nasa.gov(rw,async,insecure,no_root_squash)\
page*.lerc.nasa.gov(rw,async,insecure,no_root_squash)
Output of kexportfs:
/droot grunt26.as.lerc.nasa.gov
/droot grunt25.as.lerc.nasa.gov
/droot grunt08.as.lerc.nasa.gov
/droot grunt07.as.lerc.nasa.gov
/droot grunt24.as.lerc.nasa.gov
/droot grunt06.as.lerc.nasa.gov
/droot grunt09.as.lerc.nasa.gov
/droot page05.lerc.nasa.gov
/droot page04.lerc.nasa.gov
/droot page03.lerc.nasa.gov
/droot grunt05.as.lerc.nasa.gov
/droot grunt04.as.lerc.nasa.gov
/droot grunt23.as.lerc.nasa.gov
/droot grunt22.as.lerc.nasa.gov
/droot page02.lerc.nasa.gov
/droot grunt03.as.lerc.nasa.gov
/droot grunt02.as.lerc.nasa.gov
/droot grunt21.as.lerc.nasa.gov
/droot page01.lerc.nasa.gov
/droot grunt01.as.lerc.nasa.gov
/droot page08.lerc.nasa.gov
/droot page07.lerc.nasa.gov
/droot page06.lerc.nasa.gov
/raid grunt26.as.lerc.nasa.gov
/raid grunt25.as.lerc.nasa.gov
/raid grunt08.as.lerc.nasa.gov
/raid grunt07.as.lerc.nasa.gov
/raid grunt24.as.lerc.nasa.gov
/raid grunt06.as.lerc.nasa.gov
/raid grunt05.as.lerc.nasa.gov
/raid grunt09.as.lerc.nasa.gov
/raid page04.lerc.nasa.gov
/raid grunt04.as.lerc.nasa.gov
/raid page03.lerc.nasa.gov
/raid grunt23.as.lerc.nasa.gov
/raid grunt22.as.lerc.nasa.gov
/raid page02.lerc.nasa.gov
/raid grunt03.as.lerc.nasa.gov
/raid grunt02.as.lerc.nasa.gov
/raid grunt21.as.lerc.nasa.gov
/raid page01.lerc.nasa.gov
/raid grunt01.as.lerc.nasa.gov
/raid anduin.lerc.nasa.gov
/raid hpcc.lerc.nasa.gov
/raid aeroshark.lerc.nasa.gov
/droot page*.lerc.nasa.gov
/droot *.as.lerc.nasa.gov
/raid page*.lerc.nasa.gov
/raid *.as.lerc.nasa.gov
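Since each client needs two mounts, it may help to see which hosts appear under one export but not the other. The following is a minimal sketch that parses "path client" lines like the listing above; summarize, only_in, and SAMPLE are hypothetical names, SAMPLE is just a few lines taken from the listing, and no kexportfs options are assumed (the script only reads text).

```python
from collections import defaultdict

def summarize(lines):
    """Split 'path client' lines into concrete hosts and wildcard patterns."""
    hosts = defaultdict(set)
    patterns = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip anything that is not a 'path client' pair
        path, client = parts
        target = patterns if "*" in client else hosts
        target[path].add(client)
    return hosts, patterns

def only_in(hosts, path_a, path_b):
    """Hosts listed under path_a that have no entry under path_b."""
    return sorted(hosts[path_a] - hosts[path_b])

# A few lines taken from the listing above, for illustration only.
SAMPLE = """\
/droot grunt26.as.lerc.nasa.gov
/droot page05.lerc.nasa.gov
/droot page08.lerc.nasa.gov
/raid grunt26.as.lerc.nasa.gov
/raid anduin.lerc.nasa.gov
/droot page*.lerc.nasa.gov
/raid *.as.lerc.nasa.gov
"""

if __name__ == "__main__":
    hosts, patterns = summarize(SAMPLE.splitlines())
    for path in sorted(hosts):
        print(path, len(hosts[path]), "hosts,", len(patterns[path]), "patterns")
    print("root only:", only_in(hosts, "/droot", "/raid"))
```

Fed the full listing above, this should report 23 hosts under /droot and 22 under /raid, with page05 through page08 present under /droot only. That fits page08 being stuck during boot, though it does not by itself say why the server stopped answering.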
--
Thaddeus J. Kollar
Sterling Software, Scientific Systems Division
NASA Lewis Research Center