[PATCH 2.6.2-rc3-mm1] DIO read race fix

From: Daniel McNeil
Date: Wed Feb 04 2004 - 20:42:29 EST


I have found (finally) the problem causing DIO reads racing with
buffered writes to see uninitialized data on ext3 file systems
(which is what I have been testing on).

The problem is caused by the changes to __block_write_page_full()
and a race with journaling:

journal_commit_transaction() -> ll_rw_block() -> submit_bh()

ll_rw_block() locks the buffer, clears buffer dirty and calls
submit_bh()

A racing __block_write_full_page() (from ext3_ordered_writepage())

would see that buffer_dirty() is not set because the i/o
is still in flight, so it would not do a bh_submit()

It would SetPageWriteback() and unlock_page() and then
see that no i/o was submitted and call end_page_writeback()
(with the i/o still in flight).

This would allow the DIO code to issue the DIO read while buffer writes
are still in flight. The i/o can be reordered by i/o scheduling and
the DIO can complete BEFORE the writebacks complete. Thus the DIO
sees the old uninitialized data.

Here is a quick hack that fixes it, but I am not sure if this the
proper long term fix.

Thoughts?

Daniel


--- linux-2.6.2-rc3-mm1/fs/buffer.c 2004-02-04 17:12:43.823525259 -0800
+++ linux-2.6.2-rc3-mm1.patch/fs/buffer.c 2004-02-04 17:16:43.033252068 -0800
@@ -1810,6 +1810,7 @@ static int __block_write_full_page(struc

do {
get_bh(bh);
+ wait_on_buffer(bh);
if (buffer_mapped(bh) && buffer_dirty(bh)) {
if (wbc->sync_mode != WB_SYNC_NONE) {
lock_buffer(bh);