Re: 1352 NUL bytes at the end of a page? (was Re: Assertion `s &&s->tree' failed: The saga continues.)

From: Chris Mason
Date: Mon May 17 2004 - 10:23:55 EST


On Mon, 2004-05-17 at 11:02, Linus Torvalds wrote:
> On Mon, 17 May 2004, Larry McVoy wrote:
> >
> > > And at some point earlier in the process you did an fflush(), or somebody
> > > else had written a header of n*PAGE_SIZE + 0x4ff bytes, or something like
> > > that. Since this was the ChangeSet file, I suspect that the "header" is
> > > the checkin-comment section at the beginning, and the "second phase" is
> > > the actual key list thing. You know how you write the ChangeSet file
> > > better than I do.
> >
> > I don't think we flush along the way but let me look. Whoops, you're right,
> > we do. Right where you thought too. But that doesn't explain there being
> > 3 blocks of nulls (there should NEVER be a null in the s.ChangeSet file, we
> > don't compress that, it's always ascii).
>
> No, no, I'm not claiming that _you_ are writing the NUL bytes. I'm
> claiming the kernel has a bug that triggers with non-page-aligned starting
> offsets of writes (because we clear the bytes after "i_size", and we don't
> synchronize those clears sufficiently), and concurrent flushes. There's
> probably some other trigger needed too (CONFIG_PREEMPT being just the
> thing that uncovers the race).
>
> > But the bigger problem is that you are missing the point that I mentioned
> > elsewhere, we are writing to a tmp file, the tmp file is NOT mmapped.
>
> No, the mmap thing was Andrew's theory. My theory is that regular
> "write()" calls can trigger it through the "commit_write()" function.
>
> Of course, my theory also depended on a page unlock happening in a place
> where it didn't actually happen, so the exact details of my theory are
> crap. I'll need to re-think that part.

You've described it correctly for reiserfs though, we unlock the page
too soon. I'll fix the page locking for reiserfs_file_write. Steven,
we need to figure out why you're seeing this on ext3.

The two filesystems don't share much code for the normal write path, and
I don't see how you can trigger this on ext3 without truncate jumping
into the fun.

-chris


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/