Re: 1352 NUL bytes at the end of a page? (was Re: Assertion `s && s->tree' failed: The saga continues.)

From: Steven Cole
Date: Sat May 15 2004 - 20:26:25 EST


On Friday 14 May 2004 09:41 pm, Andrew Morton wrote:
> Lincoln Dale <ltd@xxxxxxxxx> wrote:
> >
> > At 02:53 AM 15/05/2004, Andy Isaacson wrote:
> > >That corruption size really does make me think of network packets, so
> > >I'm tempted to blame it on PPP. Can you find out the MTU of your PPP
> > >link? "ifconfig ppp0" or something like that.
> >
> > 1352 bytes could be remarkably close to the TCP MSS . . .
> > perhaps there is some interaction with ppp where there is an overrun / lost
> > packet and the TCP window is mistakenly advanced?
>
> Steve, if it's a memory stomp then perhaps CONFIG_DEBUG_PAGEALLOC and
> CONFIG_DEBUG_SLAB might pick it up.
>
> It seems awfully deterministic though.
>
>
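For anyone checking the MSS theory against a real MTU, here is a rough sketch of the arithmetic. It assumes IPv4 with a 20-byte IP header and a 20-byte TCP header, plus an optional 12 bytes when TCP timestamps are in use. The function name mtu_for_payload is made up for illustration, and nothing here says these are the header sizes actually seen on the PPP link.

```sh
#!/bin/sh
# Hypothetical helper: print the link MTU that would produce a given TCP
# payload size, assuming IPv4 (20-byte header, no IP options) and TCP
# (20-byte header) plus an optional number of TCP option bytes.
mtu_for_payload() {
    # $1 = TCP payload size in bytes
    # $2 = bytes of TCP options (defaults to 0)
    echo $(( $1 + 20 + 20 + ${2:-0} ))
}

mtu_for_payload 1352       # no TCP options
mtu_for_payload 1352 12    # assuming the 12-byte padded timestamps option
```

If "ifconfig ppp0" reports an MTU matching either figure, the network-packet explanation becomes more plausible; if not, it counts against it.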
Second reply with some interesting developments.

I ran Andy's bk exerciser script on a vendor-supplied kernel (2.6.3-4mdk) for
36 iterations, with no failures at all.

I then stopped that test, updated the 2.6-tree to the state at 15:00 MDT today,
compiled with the above two DEBUG options, and rebooted with that new
kernel.

I ran Andy's script again, and it caused a failure right away. Here is a
snipped version of the log:

renumber: can't read SCCS info in "SCCS/s.ChangeSet".
include/asm-x86_64/SCCS/s.i387.h
[list of files snipped]
net/ipv4/SCCS/s.ip_output.c
Your repository should be back to where it was before undo started
We are running a consistency check to verify this.
check passed
Undo failed, repository left locked.
WARNING: deleting orphan file /home/steven/tmp/bk_clone2_0dH5v6
Entire repository is locked by:
RESYNC directory.
ERROR-Unable to lock repository for update.
1 renumber: can't read SCCS info in "RESYNC/SCCS/s.ChangeSet".
bk: takepatch.c:1343: applyCsetPatch: Assertion `s && s->tree' failed.
2 renumber: can't read SCCS info in "RESYNC/SCCS/s.ChangeSet".
bk: takepatch.c:1343: applyCsetPatch: Assertion `s && s->tree' failed.
3

The digits in column 1 above are an iteration count from the testing script.

I control-c'ed out at that point.
For reference, here is Andy's script again:
#!/bin/sh
x=0
while true; do
	bk clone -qlr40514130hBbvgP4CvwEVEu27oxm46w testing-2.6 foo
	(cd foo; bk pull -q)
	rm -rf foo
	x=`expr $x + 1`
	echo -n "$x "
done

The RESYNC directory in 'foo' does not contain an SCCS directory.
There were no unusual messages in dmesg.
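
When an s.ChangeSet file does survive a failed run, a quick way to look for the NUL-byte corruption from the subject line is to count the NUL bytes in it. This is only a sketch: it assumes the s-file is stored as uncompressed text, so that it should normally contain no NUL bytes at all, and the function name count_nul_bytes is made up for illustration.

```sh
#!/bin/sh
# Hypothetical helper: print the number of NUL bytes in a file.
# Assumption: the file is uncompressed text, so any nonzero count is suspect.
count_nul_bytes() {
    # tr -cd '\000' deletes every byte except NUL; wc -c counts what is left.
    # The final tr strips the padding some wc implementations add.
    tr -cd '\000' < "$1" | wc -c | tr -d ' '
}
```

It would be run as, for example, count_nul_bytes SCCS/s.ChangeSet. A result of 1352 would match the corruption size discussed earlier in this thread, though the count alone does not show where in the file the NULs sit.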

In the spirit of 'rounding up the usual suspects', I'll unset CONFIG_PREEMPT
and try again.

Steven