[replace-alexv-buffer.c-patch] Re: [PATCH] Several bad bugs in fs/*

Andrea Arcangeli (andrea@e-mind.com)
Thu, 22 Apr 1999 21:16:31 +0200 (CEST)


On Thu, 22 Apr 1999, Alexander Viro wrote:

>* race in buffer cache. Scenario:
> 1. we umount filesystem. Suppose that it restores the blocksize upon
> exit (FAT-derived ones and sysvfs do it). set_blocksize() is called.
> Notice that we are likely to have a lot of dirty buffers at that
> moment. set_blocksize() will wait on them, unhash them and leave them
> on the dirty list.
> 2. Suppose that we remount the fs immediately. set_blocksize() is
> called again and buffers are untouched, since they have the right
> size. They remain unhashed and on the dirty list.
> 3. Suppose that the same blocks are bread() again. We got two copies -
> new one (hashed) and old (unhashed).
> 4. Only now bdflush() wakes up. Finds clean buffer on the dirty list.
> Does refile_buffer() on it. Which places it on the clean list and
> rehashes the sucker. Happy, happy, joy, joy. Especially since the next
> bread() will find the older copy (it's placed into the head of the
> hash chain).
> I've actually hit this race testing new VFAT code (Gordon's testsuite
> does a lot of fast dirtify/umount/mount/dirtify passes), but the same
> thing happens with the old implementation of FAT-derived filesystems
> and with sysvfs. Fix: do remove_from_queues() instead of
> remove_from_hash_queue().

I think there's another subtle bug in set_blocksize() and
invalidate_buffers(). The problem is that buffers can be moved across
lists and can be moved across the same list while we are sleeping on a
locked buffer. This patch address the problem. I never run it on a clean
2.2.6, but I am doing these things in my sligthly rewritten buffer.c. It
address also the problem described by Alexander in a safer way because
performances after set_blocksize run, are really not an issue, so we can
be lazy.

My patch looks at least safe to me, but probably is better that somebody
will try out it before inclusion (maybe a simple pre-patch will be ok).

Ah, and it fix also the buffer-dirty-in-freelist leakage.

Index: buffer.c
===================================================================
RCS file: /var/cvs/linux/fs/buffer.c,v
retrieving revision 1.1.1.8
diff -u -r1.1.1.8 buffer.c
--- buffer.c 1999/03/29 21:38:51 1.1.1.8
+++ linux/fs/buffer.c 1999/04/22 19:12:49
@@ -402,11 +402,14 @@
struct buffer_head * bh;

for(nlist = 0; nlist < NR_LIST; nlist++) {
+ refiled:
bh = lru_list[nlist];
for (i = nr_buffers_type[nlist]*2 ; --i > 0 ; bh = bh->b_next_free) {
if (bh->b_dev != dev)
continue;
wait_on_buffer(bh);
+ if (bh->b_list != nlist)
+ goto refiled;
if (bh->b_dev != dev)
continue;
if (bh->b_count)
@@ -486,33 +489,6 @@
remove_from_lru_list(bh);
}

-static inline void put_last_lru(struct buffer_head * bh)
-{
- if (bh) {
- struct buffer_head **bhp = &lru_list[bh->b_list];
-
- if (bh == *bhp) {
- *bhp = bh->b_next_free;
- return;
- }
-
- if(bh->b_dev == B_FREE)
- panic("Wrong block for lru list");
-
- /* Add to back of free list. */
- remove_from_lru_list(bh);
- if(!*bhp) {
- *bhp = bh;
- (*bhp)->b_prev_free = bh;
- }
-
- bh->b_next_free = *bhp;
- bh->b_prev_free = (*bhp)->b_prev_free;
- (*bhp)->b_prev_free->b_next_free = bh;
- (*bhp)->b_prev_free = bh;
- }
-}
-
static inline void put_last_free(struct buffer_head * bh)
{
if (bh) {
@@ -653,10 +629,13 @@
* around on the free list, and we can get in a loop if we are not careful.
*/
for(nlist = 0; nlist < NR_LIST; nlist++) {
+ refiled:
bh = lru_list[nlist];
for (i = nr_buffers_type[nlist]*2 ; --i > 0 ; bh = bhnext) {
if(!bh)
break;
+ if (bh->b_list != nlist)
+ goto refiled;

bhnext = bh->b_next_free;
if (bh->b_dev != dev)
@@ -672,7 +651,6 @@
clear_bit(BH_Req, &bh->b_state);
bh->b_flushtime = 0;
}
- remove_from_hash_queue(bh);
}
}
}
@@ -726,8 +704,6 @@
bh = get_hash_table(dev, block, size);
if (bh) {
if (!buffer_dirty(bh)) {
- if (buffer_uptodate(bh))
- put_last_lru(bh);
bh->b_flushtime = 0;
}
return bh;
@@ -854,6 +830,7 @@
return;
}
buf->b_count = 0;
+ buf->b_state = 0;
remove_from_queues(buf);
put_last_free(buf);
}

Comments (Alexander?).

Andrea Arcangeli

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/