Re: Your rename change and loopback

Alexander Viro (viro@math.psu.edu)
Sat, 22 May 1999 13:16:12 -0400 (EDT)


On Sat, 22 May 1999, H.J. Lu wrote:

> > There is one case when I'm *sure* that buffer.c contains a bug able to
> > cause fs corruption. It can be triggered by consequent set_blocksize()
> > calls racing with bdflush. It is fixed in 2.3.2 and patch is fairly
> > trivial. It *might* give the effectes you are describing (especially with
> > fast dirtify/umount/mount/dirtify sequences), but to say whether it's the
>
> My script does have VERY FAST dirtify/umount/mount/dirtify sequences.

Aha. Could you try the patch I've sent in the previous posting?
Nature of the bug: if you have a dirty buffer at moment of set_blocksize()
call it will be flushed to disk, unhashed and left on the dirty list.
Assume that next set_blocksize() will happen soon, followed by bread() on
the same block. New in-core copy will be created and hashed. Furthermore,
assume that you've dirtified it. Now the bdflush wakes up and finds the
old block on dirty list. Block has the right size, is clean, so it must be
moved to appropriate list. Unfortunately it rehashes the thing. Now we
have two copies of the same block in hash, old copy ahead of the new
(modified) one. The rest is obvious... Patch replaces unhashing with
moving to free list.

> I have run many tests. The results are
>
> 1. 2.2.5 is ok.
> 2. 2.2.6 has the bug. buffer.c was not changed between 2.2.5 and 2.2.6.
> 3. 2.2.5 + some rename patch from 2.2.6 has the bug.
> 4. 2.2.5 + your rename-patch-11, which I believe is similar to the
> rename patch from 2.2.6, has the bug.
>
> Everything points to the rename change in 2.2.6. It may not be
> the real cause of the bug. But at least it causes the bug to show up.

Very interesting. rename-11 does finer locking than the old variant, so it
might speed the things up enough to trigger the race. Could you look if
the patch to buffer.c will change the situation? I've hit the same problem
on VFAT testsuite - there it definitely was the aforementioned bug and
patch above had fixed the things.

In case you don't have the old posting at hands:
--- linux-2.2.6/fs/buffer.c Sun Mar 28 14:54:47 1999
+++ linux-bird.misc/fs/buffer.c Sat Apr 10 00:30:01 1999
@@ -672,7 +672,9 @@
clear_bit(BH_Req, &bh->b_state);
bh->b_flushtime = 0;
}
- remove_from_hash_queue(bh);
+ remove_from_queues(bh);
+ bh->b_dev=B_FREE;
+ insert_into_queues(bh);
}
}
}

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/