2.6.31.6, unresponsiveness and something with nfs

From: Jesper Krogh
Date: Mon Nov 30 2009 - 12:08:03 EST


Hi.

I have a system running 2.6.31.6 that becomes "unresponsive" while a
particular process is running. I cannot really tell what the cause is,
but the effect is that logins as ordinary users hang when the user's
home directory is on a remote NFS server.

So from root, "su - localuser" works fine, but
"su - user-with-home-on-nfs" doesn't.

It is not as if NIS/NFS doesn't work, since I can get a directory
listing from the NFS share as root without problems.

Here are the last lines from "strace -f su - user-with-home-on-nfs".
It gets into an uninterruptible hang:

[pid 24599] close(3) = 0
[pid 24599] open("/etc/localtime", O_RDONLY) = 3
[pid 24599] fstat(3, {st_mode=S_IFREG|0644, st_size=2134, ...}) = 0
[pid 24599] fstat(3, {st_mode=S_IFREG|0644, st_size=2134, ...}) = 0
[pid 24599] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6b0c5b2000
[pid 24599] read(3, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\6\0\0\0\6\0\0"..., 4096) = 2134
[pid 24599] lseek(3, -1368, SEEK_CUR) = 766
[pid 24599] read(3, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\10\0\0\0\10\0"..., 4096) = 1368
[pid 24599] close(3) = 0
[pid 24599] munmap(0x7f6b0c5b2000, 4096) = 0
[pid 24599] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2134, ...}) = 0
[pid 24599] fstat(1,

^C^C^C^C


Or at least it is not truly uninterruptible: I have a process merging
20 presorted files of 1.5GB each using "sort -m" from GNU coreutils on
an ext4 volume. A few seconds after I kill -9 the sorting process, all
hanging logins continue, and the process above continues as well (the
system returns to its "normal state"):

{st_mode=S_IFREG|0664, st_size=246138, ...}) = 0
[pid 24599] --- SIGINT (Interrupt) @ 0 (0) ---
Process 24542 resumed
Process 24599 detached
[pid 24542] <... wait4 resumed> 0x7fffa656c5a4, 0, NULL) = ? ERESTARTSYS (To be restarted)
[pid 24542] --- SIGINT (Interrupt) @ 0 (0) ---

The merging process runs on an ext4 volume 8TB in size. An strace of
the sorting process shows that it progresses nicely.
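
For anyone who wants to generate a similar load at a smaller scale, the
merge step can be approximated as below. This is only a sketch: it is not
what "sort -m" does internally, just a streaming k-way merge over
presorted text files using Python's heapq.merge. The helper names, file
names, and sizes are made up for illustration and would need to be scaled
up a lot to come anywhere near 20 x 1.5GB.

```python
import heapq
import os
import tempfile


def write_presorted(path, start, step, count):
    """Write `count` zero-padded integers, already in sorted order, one per line."""
    with open(path, "w") as f:
        for i in range(count):
            f.write(f"{start + i * step:012d}\n")


def merge_files(inputs, output):
    """Stream-merge presorted text files line by line, comparing lines as plain strings."""
    handles = [open(p) for p in inputs]
    try:
        with open(output, "w") as out:
            # heapq.merge reads lazily, so only one line per input is held at a time.
            out.writelines(heapq.merge(*handles))
    finally:
        for h in handles:
            h.close()


if __name__ == "__main__":
    # Hypothetical small-scale run: 20 inputs of 1000 lines each.
    workdir = tempfile.mkdtemp()
    inputs = []
    for n in range(20):
        path = os.path.join(workdir, f"part{n:02d}.txt")
        write_presorted(path, start=n, step=20, count=1000)
        inputs.append(path)
    merged = os.path.join(workdir, "merged.txt")
    merge_files(inputs, merged)
    print("merged", len(inputs), "files into", merged)
```

The I/O pattern is the relevant part: many sequential readers feeding one
sequential writer on the same volume.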

The system is running 2.6.31.6 with
59a252ff8c0f2fa32c896f69d56ae33e641ce7ad reverted, as suggested by J.
Bruce Fields. To me it seems unrelated.
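
One way to see what the hanging logins are waiting on would be to list
the processes in state D (uninterruptible sleep) together with their
wait channel while the hang is in progress. The following is a minimal
sketch, assuming a Linux /proc with the usual per-process "stat" and
"wchan" files; the function names are hypothetical and the output was
not collected on the system described above.

```python
import os


def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    lparen = stat_text.index("(")
    rparen = stat_text.rindex(")")
    pid = int(stat_text[:lparen])
    comm = stat_text[lparen + 1:rparen]
    state = stat_text[rparen + 1:].split()[0]
    return pid, comm, state


def blocked_processes(proc_root="/proc"):
    """List (pid, comm, wchan) for every process currently in state D."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # process exited while we were looking
        if state != "D":
            continue
        try:
            with open(os.path.join(proc_root, entry, "wchan")) as f:
                wchan = f.read().strip()
        except OSError:
            wchan = "?"  # wchan may be unreadable or unavailable
        found.append((pid, comm, wchan))
    return found


if __name__ == "__main__":
    for pid, comm, wchan in blocked_processes():
        print(pid, comm, wchan)
```

Running this as root in a loop while the merge is active and a login is
stuck would show whether the su process is blocked, and in which kernel
function.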

Jesper