NFS wait_on_page problem

John E. Davis (davis@space.mit.edu)
29 Jan 1997 23:44:56 GMT


Here is a typescript that shows the NFS wait_on_page problem.
Essentially, the process (pid = 2959) locked up, and I was unable to
kill it until some time later. At the very least,
shouldn't wait_on_page be modified so that it does not perform a busy
loop?

Script started on Wed Jan 29 18:35:07 1997
[6:35pm] /tmp>ps -el
FLAGS UID PID PPID PRI NI SIZE RSS WCHAN STA TTY TIME COMMAND
100 644 2498 1 0 0 1320 788 sigsuspend S 1 0:00 -tcsh HOM
40 644 2806 1 0 0 820 232 sigsuspend S p0 0:00 /nfs/wiwa
100000 644 2782 2498 0 0 1148 552 wait4 S 1 0:00 sh /usr/b
0 644 2783 2782 0 0 1392 316 wait4 S 1 0:00 xinit /nf
0 644 2786 2783 0 0 1148 560 wait4 S 1 0:00 sh /nfs/w
100000 644 2790 2786 2 0 2088 1400 do_select S 1 0:02 rxvt -slC
100000 644 2792 2786 0 0 3152 1412 do_select S 1 0:00 xload -ju
100000 644 2793 2786 0 0 1960 800 do_select S 1 0:00 xclock -g
0 644 2795 2786 0 0 1904 644 do_select S 1 0:00 /usr/bin/
0 644 2797 2790 3 0 1336 780 sigsuspend S p1 0:00 -csh COLO
100 644 2796 2791 0 0 1340 784 sigsuspend S p0 0:00 -tcsh COL
100 644 2838 2796 0 0 852 416 do_signal T p0 0:00 rlogin wi
100000 644 2909 2796 0 0 1900 1208 do_signal T p0 0:00 jed HOSTN
100000 644 2959 2796 0 0 872 284 wait_on_pag D p0 0:01 rgrep -ri
0 644 2936 2797 0 0 6380 4096 do_select S p1 0:06 netscapeC
100000 644 2967 2797 0 0 1552 828 do_signal T p1 0:00 jed -f rm
100000 644 2971 2797 4 0 840 208 read_chan S p1 0:00 script CO
40 644 2839 2838 0 0 852 420 do_signal T p0 0:00 rlogin wi
100040 644 2972 2971 5 0 852 228 read_chan S p1 0:00 script CO
0 644 2973 2972 7 0 1324 736 sigsuspend S p2 0:00 -csh -i C
100000 644 2975 2973 12 0 904 456 R p2 0:00 ps -el CO
[6:35pm] /tmp>kill -9 2959
[6:35pm] /tmp>ps -el
FLAGS UID PID PPID PRI NI SIZE RSS WCHAN STA TTY TIME COMMAND
100 644 2498 1 0 0 1320 788 sigsuspend S 1 0:00 -tcsh HOM
40 644 2806 1 0 0 820 232 sigsuspend S p0 0:00 /nfs/wiwa
100000 644 2782 2498 0 0 1148 552 wait4 S 1 0:00 sh /usr/b
0 644 2783 2782 0 0 1392 316 wait4 S 1 0:00 xinit /nf
0 644 2786 2783 0 0 1148 560 wait4 S 1 0:00 sh /nfs/w
100000 644 2790 2786 10 0 2088 1400 do_select S 1 0:03 rxvt -slC
100000 644 2792 2786 0 0 3152 1412 do_select S 1 0:00 xload -ju
100000 644 2793 2786 0 0 1960 800 do_select S 1 0:00 xclock -g
0 644 2795 2786 0 0 1904 644 do_select S 1 0:00 /usr/bin/
0 644 2797 2790 3 0 1336 780 sigsuspend S p1 0:00 -csh COLO
100 644 2796 2791 0 0 1340 784 sigsuspend S p0 0:00 -tcsh COL
100 644 2838 2796 0 0 852 416 do_signal T p0 0:00 rlogin wi
100000 644 2909 2796 0 0 1900 1208 do_signal T p0 0:00 jed HOSTN
100000 644 2959 2796 0 0 872 284 wait_on_pag D p0 0:01 rgrep -ri
0 644 2936 2797 0 0 6380 4096 do_select S p1 0:06 netscapeC
100000 644 2967 2797 0 0 1552 828 do_signal T p1 0:00 jed -f rm
100000 644 2971 2797 4 0 840 208 read_chan S p1 0:00 script CO
40 644 2839 2838 0 0 852 420 do_signal T p0 0:00 rlogin wi
100040 644 2972 2971 5 0 852 228 read_chan S p1 0:00 script CO
0 644 2973 2972 7 0 1324 736 sigsuspend S p2 0:00 -csh -i C
100000 644 2976 2973 16 0 904 456 R p2 0:00 ps -el CO
[6:35pm] /tmp>kill -9 2959
[6:35pm] /tmp>kill -9 2959
[6:35pm] /tmp>ps -xl
FLAGS UID PID PPID PRI NI SIZE RSS WCHAN STA TTY TIME COMMAND
100 644 2498 1 0 0 1320 788 sigsuspend S 1 0:00 -tcsh
40 644 2806 1 0 0 820 232 sigsuspend S p0 0:00 /nfs/wiwa
100000 644 2782 2498 0 0 1148 552 wait4 S 1 0:00 sh /usr/b
0 644 2783 2782 0 0 1392 316 wait4 S 1 0:00 xinit /nf
0 644 2786 2783 0 0 1148 560 wait4 S 1 0:00 sh /nfs/w
100000 644 2790 2786 10 0 2088 1400 do_select S 1 0:03 rxvt -sl
100000 644 2792 2786 0 0 3152 1412 do_select S 1 0:00 xload -ju
100000 644 2793 2786 0 0 1960 800 do_select S 1 0:00 xclock -g
0 644 2795 2786 0 0 1904 644 do_select S 1 0:00 /usr/bin/
0 644 2797 2790 1 0 1336 780 sigsuspend S p1 0:00 -csh
100 644 2796 2791 0 0 1340 784 sigsuspend S p0 0:00 -tcsh
100 644 2838 2796 0 0 852 416 do_signal T p0 0:00 rlogin wi
100000 644 2909 2796 0 0 1900 1208 do_signal T p0 0:00 jed
100000 644 2959 2796 0 0 872 284 wait_on_pag D p0 0:01 rgrep -ri
0 644 2936 2797 0 0 6380 4096 do_select S p1 0:06 netscape
100000 644 2967 2797 0 0 1552 828 do_signal T p1 0:00 jed -f rm
100000 644 2971 2797 2 0 840 208 read_chan S p1 0:00 script
40 644 2839 2838 0 0 852 420 do_signal T p0 0:00 rlogin wi
100040 644 2972 2971 3 0 852 228 read_chan S p1 0:00 script
0 644 2973 2972 10 0 1324 736 sigsuspend S p2 0:00 -csh -i
100000 644 2977 2973 17 0 884 428 R p2 0:00 ps -xl
[6:35pm] /tmp>kill -9 2959
[6:36pm] /tmp>kill -9 2959
[6:36pm] /tmp>kill -9 2959
[6:36pm] /tmp>kill -9 2959
[6:36pm] /tmp>kill -9 2959
[6:36pm] /tmp>kill -9 2959
[6:36pm] /tmp>kill -9 2959
[6:36pm] /tmp>kill -9 2959
[6:36pm] /tmp>kill -HUP 2959
[6:36pm] /tmp>kill -HUP 2959
[6:36pm] /tmp>kill -HUP 2959
[6:36pm] /tmp>kill -HUP 2959
2959: No such process
[6:36pm] /tmp>exit
exit

Script done on Wed Jan 29 18:36:36 1997
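
The D in the STA column for pid 2959 marks uninterruptible sleep. A
process in that state generally does not act on signals until the wait
ends, which is consistent with the repeated kill -9 attempts above having
no visible effect. What follows is only a minimal sketch, assuming a Linux
/proc filesystem, of how one might check a pid's state without rerunning
ps. The helper names parse_stat_state and proc_state are hypothetical and
are not part of any existing tool.

```python
def parse_stat_state(stat_line):
    """Return the one-letter state field from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so take everything after the LAST ')'.
    after_comm = stat_line[stat_line.rindex(")") + 1:]
    # The first field after the command name is the process state.
    return after_comm.split()[0]


def proc_state(pid):
    """Return the state letter for pid, or None if it cannot be read.

    Assumption: /proc is mounted and follows the Linux layout.
    """
    try:
        with open("/proc/%d/stat" % pid) as f:
            return parse_stat_state(f.read())
    except (FileNotFoundError, ProcessLookupError):
        # The process has exited (or /proc is not available).
        return None
```

Calling proc_state(2959) in a loop while sending signals would be expected
to return "D" for as long as the process is stuck in the wait, and None
once it is gone, matching the final "No such process" message.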

-- 
John E. Davis                   Center for Space Research/AXAF Science Center
617-258-8119                    MIT 37-662c, Cambridge, MA 02139
http://space.mit.edu/~davis