Re: Linux 2.4.33-rc2
From: Willy Tarreau
Date: Thu Jul 06 2006 - 04:24:08 EST
Hi Grant,
On Thu, Jul 06, 2006 at 05:42:17PM +1000, Grant Coady wrote:
> On Wed, 5 Jul 2006 22:51:37 +0200, Willy Tarreau <w@xxxxxx> wrote:
(...)
> >after a full day of stress-test of multiple parallel tar xUf, and ffsb at
> >full CPU load, I could not reproduce the problem on the exact same kernel
> >I first saw it on. So I think I had bad luck and the problem is not related
> >to the vfs_unlink() patch, so unless anyone else reports a problem or tells
> >us why it is right or wrong, it would seem reasonable to keep it as it is
> >in -rc2.
>
> Hi Willy,
>
> Got this with unpatched -rc2, tosh is NFS server, niner is client:
>
> grant@niner:/home/nfstest$ ls -l
> total 228474
> drwxr-xr-x 19 grant wheel 680 2006-03-20 16:53 linux-2.6.16/
> -rw-r--r-- 1 grant wheel 233953280 2006-07-05 18:27 linux-2.6.16.tar
> drwxr-xr-x 19 grant wheel 680 2006-03-20 16:53 linux-2.6.16b/
> grant@niner:/home/nfstest$ x=0; while [ ! $(diff -rq linux-2.6.16 linux-2.6.16b) ]; do ((x++)); echo "trial $x"; rm -rf linux-2.6.16b; mv linux-2.6.16 linux-2.6.16b; tar xf linux-2.6.16.tar; done
> trial 1
> ...
> trial 29
> rm: cannot remove directory `linux-2.6.16b/drivers/cdrom': Directory not empty
> -bash: [: too many arguments
> grant@niner:/home/nfstest$ ls -l
> total 228474
> drwxr-xr-x 19 grant wheel 680 2006-03-20 16:53 linux-2.6.16/
> -rw-r--r-- 1 grant wheel 233953280 2006-07-05 18:27 linux-2.6.16.tar
> drwxr-xr-x 4 grant wheel 104 2006-07-06 11:01 linux-2.6.16b/
> grant@niner:/home/nfstest$ rm -rf linux-2.6.16b/
>
> The 'rm -rf linux-2.6.16b' completed okay, a mystery?
You might have had a '.nfs0000*' file in the directory which prevented rmdir
from working, but it was finally removed by the rm -rf.
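As a side note, the "[: too many arguments" message comes from the unquoted
$(diff ...) expanding to several words inside [ ... ]. A minimal sketch of the
same loop that tests exit statuses instead is below. The function name
run_trials, its arguments and the optional trial limit are assumptions made
for illustration, not something from the original test. It stops at the first
failed rm and lists what is left behind, hidden entries included, so a stray
'.nfs*' file would be visible.

```shell
#!/bin/bash
# Sketch only: run_trials and its arguments are hypothetical.
#   $1 = tree name (must match the tarball's top-level directory)
#   $2 = tarball
#   $3 = optional maximum number of trials (0 or unset = no limit)
run_trials() {
    local tree=$1 tarball=$2 max=${3:-0} x=0
    # diff exits 0 only when both trees are identical.
    while diff -rq "$tree" "${tree}b" >/dev/null; do
        x=$((x + 1))
        echo "trial $x"
        # Stop at the first failed removal and show the leftovers,
        # including hidden entries.
        if ! rm -rf "${tree}b"; then
            ls -laR "${tree}b"
            return 1
        fi
        mv "$tree" "${tree}b" || return 1
        tar xf "$tarball" || return 1
        if [ "$max" -gt 0 ] && [ "$x" -ge "$max" ]; then
            break
        fi
    done
    echo "stopped after $x trials"
}
```

For the directory shown above this would be called as
"run_trials linux-2.6.16 linux-2.6.16.tar".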
> This is with two slow (500MHz) boxen with -rc2.
> Only idea I get from logs is during the test:
>
> Jul 5 19:01:19 niner kernel: nfs: server tosh not responding, still trying
> Jul 5 19:01:19 niner kernel: nfs: server tosh OK
>
> ... about one pair each 2 to 5 mins
>
> Jul 6 11:16:08 niner kernel: nfs: server tosh not responding, still trying
> Jul 6 11:16:08 niner kernel: nfs: server tosh OK
> Jul 6 11:26:57 niner -- MARK --
> Jul 6 11:46:57 niner -- MARK --
I get these messages when the server spends too much time writing data back
to the disks. Doing this on the server fixed the problem for me:
# echo 50 25000 0 0 100 100 60 45 0 >/proc/sys/vm/bdflush
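Since the echo overwrites the running values, it may be worth saving them
first so they can be put back after the test. A minimal sketch follows. The
helper names set_bdflush and restore_bdflush, the BDFLUSH variable and the
backup path are all assumptions made for illustration. The
/proc/sys/vm/bdflush file only exists on 2.4 kernels, and writing to it
requires root.

```shell
#!/bin/sh
# Sketch only: BDFLUSH and the helper names are hypothetical.
BDFLUSH=${BDFLUSH:-/proc/sys/vm/bdflush}

# set_bdflush BACKUP_FILE VALUE...
# Save the current settings to BACKUP_FILE, then write the new values.
set_bdflush() {
    backup=$1
    shift
    cat "$BDFLUSH" > "$backup" || return 1
    echo "$*" > "$BDFLUSH"
}

# restore_bdflush BACKUP_FILE
# Write the saved settings back.
restore_bdflush() {
    cat "$1" > "$BDFLUSH"
}
```

With these, the change above becomes
"set_bdflush /root/bdflush.orig 50 25000 0 0 100 100 60 45 0", and
"restore_bdflush /root/bdflush.orig" undoes it.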
> Other pair of boxen with patched -rc2 completed 146 trials overnight along
> with compiling 2.4 kernel over NFS as well since morning, 64 completed.
> No 'server not responding messages' logged.
Was it against the same server, and at the same time as the other clients saw
the server disappear?
> I'll change the two running boxen to straight -rc2 and see if catch
> anything.
OK, similarly, it might be interesting to apply your patch to niner to see
if the rmdir error happens again.
> Grant.
Thanks for your tests,
Willy