Re: [PATCH RESEND] Minimal fix for private_list handling races

From: Nick Piggin
Date: Tue Jan 22 2008 - 20:00:47 EST


On Wednesday 23 January 2008 04:10, Jan Kara wrote:
> Hi,
>
> as I got no answer for a week, I'm resending this fix for races in
> private_list handling. Andrew, do you like them more than the previous
> version?

FWIW, I reviewed this, and it looks OK, although I think some
comments would be in order.

What would be really nice is to avoid the use of b_assoc_buffers
completely in this function, as I've attempted below (untested). I
don't know if you'd actually call that an improvement...?

A couple of things I noticed while looking at this code:

- What is osync_buffers_list supposed to do? I couldn't actually
  work it out. Why do we care about waiting here for buffers that
  were added while we were waiting for writeout of other buffers to
  finish? Can we just remove it completely? I must be missing
  something. (I've quoted a sketch of it after this list, for
  reference.)

- What are the get_bh(bh) things supposed to do? Protect the
  lifetime of a given bh while "lock" is dropped? That's nice,
  except that we brelse(bh) *before* taking the lock again... and
  isn't every single other buffer whose reference we _haven't_
  elevated exposed to exactly the same lifetime problem? IOW, either
  it is not required at all, or it is required for _all_ buffers.
  (My patch should fix this; see the pinning sketch below.)

Hmm, now I remember why I rewrote this file :P

Index: linux-2.6/fs/buffer.c
===================================================================
--- linux-2.6.orig/fs/buffer.c
+++ linux-2.6/fs/buffer.c
@@ -792,47 +792,53 @@ EXPORT_SYMBOL(__set_page_dirty_buffers);
*/
static int fsync_buffers_list(spinlock_t *lock, struct list_head *list)
{
+ struct buffer_head *batch[16];
+ int i, idx, done;
struct buffer_head *bh;
- struct list_head tmp;
int err = 0, err2;

- INIT_LIST_HEAD(&tmp);
-
+again:
spin_lock(lock);
+ idx = 0;
while (!list_empty(list)) {
bh = BH_ENTRY(list->next);
__remove_assoc_queue(bh);
if (buffer_dirty(bh) || buffer_locked(bh)) {
- list_add(&bh->b_assoc_buffers, &tmp);
- if (buffer_dirty(bh)) {
- get_bh(bh);
- spin_unlock(lock);
- /*
- * Ensure any pending I/O completes so that
- * ll_rw_block() actually writes the current
- * contents - it is a noop if I/O is still in
- * flight on potentially older contents.
- */
- ll_rw_block(SWRITE, 1, &bh);
- brelse(bh);
- spin_lock(lock);
- }
+ batch[idx++] = bh;
+ get_bh(bh);
}
+
+ if (idx == 16)
+ break;
}
+ done = list_empty(list);
+ spin_unlock(lock);

- while (!list_empty(&tmp)) {
- bh = BH_ENTRY(tmp.prev);
- list_del_init(&bh->b_assoc_buffers);
- get_bh(bh);
- spin_unlock(lock);
+ for (i = 0; i < idx; i++) {
+ bh = batch[i];
+ if (buffer_dirty(bh)) {
+ /*
+ * Ensure any pending I/O completes so
+ * that ll_rw_block() actually writes
+ * the current contents - it is a noop
+ * if I/O is still in flight on
+ * potentially older contents.
+ */
+ ll_rw_block(SWRITE, 1, &bh);
+ }
+ }
+ for (i = 0; i < idx; i++) {
+ bh = batch[i];
wait_on_buffer(bh);
if (!buffer_uptodate(bh))
err = -EIO;
brelse(bh);
- spin_lock(lock);
}
+
+ idx = 0;
+ if (!done)
+ goto again;

- spin_unlock(lock);
err2 = osync_buffers_list(lock, list);
if (err)
return err;