nfs deadlock problem in kernel 2.0.27 ?

Michel LESPINASSE (walken@via.ecp.fr)
Sat, 11 Jan 1997 08:07:04 +0100 (MET)


Sorry, this is a repost. I mailed this two days ago and apparently it
didn't make it to the list.

This is about a process that is stuck under the 2.0.27 "stable" kernel
and cannot be killed.

PS : the process is still running, so I can collect more debugging
information if you want me to.

----

I have hit a problem twice with the 2.0.27 kernel; I think it is a bug in
the nfs filesystem code.

When reading from a remote filesystem, the reading process sometimes
hangs. I think it is stuck inside the kernel, because kill -9 has no
effect on it.

It happened both times while I was using the update command of the Debian
dselect package. I have done tons of other accesses on this volume, and
none of them triggered the bug (I cannot explain this).

The system is otherwise still running, but that console is locked and I
cannot unmount the partition because of the open file. Here is what I see
from another console :

8:31 [root:2] Studio:~# lsof /mnt/debian
COMMAND  PID USER   FD   TYPE DEVICE SIZE/OFF     INODE NAME
zcat    2685 root    0r   REG   0, 2   158525 556423252 /mnt/debian (bouddha:/debian)

[The inode number seems much too big. Is this the cause of the bug or a
consequence of it ?]

8:31 [root:2] Studio:~# lsof -p 2685
COMMAND  PID USER   FD   TYPE     DEVICE SIZE/OFF     INODE NAME
zcat    2685 root  cwd    DIR       3, 1     1024     32774 / (/dev/hda1)
zcat    2685 root  rtd    DIR       3, 1     1024         2 /
zcat    2685 root  mem    REG       3, 1    45280    149545 / (/dev/hda1)
zcat    2685 root  mem    REG       3, 1    21448     20500 / (/dev/hda1)
zcat    2685 root  mem    REG       3, 1   565296     20522 / (/dev/hda1)
zcat    2685 root    0r   REG       0, 2   158525 556423252 /mnt/debian/rex-fixed/binary-i386/Packages.gz
zcat    2685 root    1w   REG       3, 1        0     33058 / (/dev/hda1)
zcat    2685 root    2u   CHR       4, 1      0t0    125293 /dev/tty1
zcat    2685 root    3u  unix 0x0035d414      0t0
zcat    2685 root    4uw  REG       3, 1        0    167965 / (/dev/hda1)

This nfs partition is mounted at boot-time with this entry in my fstab :
bouddha:/debian /mnt/debian nfs defaults,nodev,noexec,nosuid,rsize=8192,ro 0 2

8:55 [root:2] Studio:~# kill -9 2685
8:55 [root:2] Studio:~#

The zcat process is still running, and on my locked console I still see
the dselect message :
Uncompressing /mnt/debian/stable/binary-i386/Packages.gz ...

I find this strange: none of my accesses to this partition with the usual
shell tools ever hung, yet dselect hung twice, both times at the same
place. Still, I believe (correct me if I'm wrong) that the problem cannot
be in user space, because then kill -9 would work.
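
One way to check the "stuck inside the kernel" guess would be to look at
the state letter in /proc/<pid>/stat. This is only a minimal sketch: the
function names state_from_stat and proc_state are made up for
illustration, and I am assuming the usual "pid (comm) state ..." layout of
that file. A "D" there would mean uninterruptible sleep, where signals
stay pending until the process wakes up, which would be consistent with a
kill -9 that has no visible effect.

```sh
#!/bin/sh
# Hypothetical helpers (not from any package): read a process state letter.

# Extract the state letter from one line in /proc/<pid>/stat format:
#   pid (comm) state ppid ...
# comm may itself contain spaces or ")", so cut after the LAST ") ".
state_from_stat() {
    rest=${1##*") "}
    printf '%s\n' "${rest%% *}"
}

# Print the state letter of a live process, or fail if it does not exist.
# R = running, S = sleeping, D = uninterruptible sleep, Z = zombie.
proc_state() {
    read -r line < "/proc/$1/stat" || return 1
    state_from_stat "$line"
}

# Example (PID taken from the lsof output above):
#   proc_state 2685
```

The same letter should also show up in the STAT column of ps, so this is
just a scriptable way of reading it.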

I am running the 2.0.27 kernel and bouddha is running Linux 2.0.22.
If you want any more details, mail me. The process is still running.

Michel "Walken" LESPINASSE - Student at Ecole Centrale Paris (France)
www Email : walken@via.ecp.fr
(o o) VideoLan project : http://videolan.via.ecp.fr/
------oOO--(_)--OOo-------------------------------------------------------
Yow ! 1135 KB/s remote host TCP bandwidth over 10Mb/s ethernet. Beat that!