Re: [PATCH 2.6.2-rc3-mm1] DIO read race fix

From: Andrew Morton
Date: Wed Feb 04 2004 - 21:09:32 EST


Daniel McNeil <daniel@xxxxxxxx> wrote:
>
> I have found (finally) the problem causing DIO reads racing with
> buffered writes to see uninitialized data on ext3 file systems
> (which is what I have been testing on).

What kernel? If -mm, is this the only remaining buffered-vs-direct
problem?

> The problem is caused by the changes to __block_write_page_full()
> and a race with journaling:
>
> journal_commit_transaction() -> ll_rw_block() -> submit_bh()
>
> ll_rw_block() locks the buffer, clears buffer dirty and calls
> submit_bh()
>
> A racing __block_write_full_page() (from ext3_ordered_writepage())
>
> would see that buffer_dirty() is not set because the i/o
> is still in flight, so it would not do a bh_submit()
>
> It would SetPageWriteback() and unlock_page() and then
> see that no i/o was submitted and call end_page_writeback()
> (with the i/o still in flight).
>
> This would allow the DIO code to issue the DIO read while buffer writes
> are still in flight. The i/o can be reordered by i/o scheduling and
> the DIO can complete BEFORE the writebacks complete. Thus the DIO
> sees the old uninitialized data.

ow. How'd you work this out?

> Here is a quick hack that fixes it, but I am not sure if this the
> proper long term fix.

The problem is that ext3 and the VFS are using different paradigms. VFS is
all page-based, but ext3 is all block-based. One or the other needs to do
something nasty.

One approach would be to change the JBD write_out_data_locked loop to use
block_write_full_page(bh->b_page) instead of buffer_head operations. But
that could get hairy with blocksize < PAGE_SIZE.

Thanks for working this out. Let me ponder it for a bit.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/