Re: 2.6.4-mm2

From: Chris Mason
Date: Tue Mar 16 2004 - 16:57:16 EST


On Tue, 2004-03-16 at 13:32, Daniel McNeil wrote:
> Andrew,
>
> I re-ran six copies of the direct_read_under test on an 8-proc
> machine last night. All six tests saw uninitialized data.

It is possible to trigger mpage_writepages twice at the same time,
right? Say once from sync_sb_inodes and once from filemap_fdatawrite?
I'm assuming Daniel is hitting the same bug he reported before, a race
between ll_rw_block from ext3 data=ordered and synchronous writeback from
fsync or O_DIRECT.

Picture one proc in mpage_writepages with wbc->sync_mode ==
WB_SYNC_NONE, and a second proc with sync_mode == WB_SYNC_ALL. The file
in question has one dirty page and that one page is being written by
kjournald in a data=ordered flush.

The sync none proc gets to a page with a locked buffer being written by
ll_rw_block. It locks the page, calls test_clear_page_dirty, and then
calls writepage.

The sync all proc now calls pagevec_lookup_tag(PAGECACHE_TAG_DIRTY), gets
no pages back, and returns.

The sync none proc gets to the buffer_locked check in
__block_write_full_page and properly retags the page with
PAGECACHE_TAG_DIRTY, but it's too late. The sync all proc has already
skipped the page.
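
To make the ordering concrete, here is a toy, single-threaded replay of the
interleaving described above. It is only a sketch: struct toy_page and every
toy_* function are hypothetical stand-ins invented for illustration, not
kernel code, and the real paths involve page locks and the radix tree that
are not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the single dirty page in the file. */
struct toy_page {
    bool dirty_tag;      /* models PAGECACHE_TAG_DIRTY being set */
    bool buffer_locked;  /* models a buffer locked by ll_rw_block */
};

/* Models test_clear_page_dirty: clear the tag, return the old value. */
static bool toy_test_clear_dirty(struct toy_page *p)
{
    assert(p != NULL);
    bool was_dirty = p->dirty_tag;
    p->dirty_tag = false;
    return was_dirty;
}

/* Models a dirty-tag lookup: how many dirty pages the walker finds. */
static int toy_lookup_dirty(const struct toy_page *p)
{
    assert(p != NULL);
    return p->dirty_tag ? 1 : 0;
}

/*
 * Models the buffer_locked check for the non-blocking writer: if the
 * buffer is locked, retag the page dirty and skip the write.
 * Returns true when the page was retagged.
 */
static bool toy_writepage_sync_none(struct toy_page *p)
{
    assert(p != NULL);
    if (p->buffer_locked) {
        p->dirty_tag = true;
        return true;
    }
    return false;
}

/*
 * Replays the three steps in the order the theory requires.
 * Returns true if the sync all walker missed a page that ends up dirty.
 */
bool replay_race(void)
{
    /* One dirty page whose buffer is being written by ll_rw_block. */
    struct toy_page page = { .dirty_tag = true, .buffer_locked = true };

    /* 1. sync none proc: clears the dirty tag before calling writepage. */
    toy_test_clear_dirty(&page);

    /* 2. sync all proc: looks up dirty pages inside the window. */
    int found = toy_lookup_dirty(&page);

    /* 3. sync none proc: sees the locked buffer and retags the page. */
    bool retagged = toy_writepage_sync_none(&page);

    return found == 0 && retagged && page.dirty_tag;
}
```

Calling replay_race() reports whether the sync all walker returned without
seeing a page that is still dirty afterwards. If step 2 is moved after
step 3, the lookup finds the page again, which is why the window between
clearing and retagging is the interesting part.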

That's my theory anyway...

-chris
