Re: linux-2.6.20-rc4-mm1 Reiser4 filesystem freeze and corruption

From: Edward Shishkin
Date: Thu Feb 01 2007 - 10:55:32 EST


Zan Lynx wrote:

On Sat, 2007-01-20 at 03:34 +0300, Vladimir V. Saveliev wrote:


Hello

On Friday 19 January 2007 20:58, Zan Lynx wrote:


I have been running 2.6.20-rc2-mm1 without problems, but both rc3-mm1
and rc4-mm1 have been giving me these freezes. They were happening
inside X, and without an external console it was impossible to capture
anything. I was also reluctant to test it, since the freeze sometimes
requires a full fsck.reiser4 --build-fs to recover the filesystem.

But I finally got some output in a console session. I wasn't able to
get it all, but I made some notes of what I think the problem is. I may
try again later once I get netconsole working (netconsole fails as a
built-in; I'll try it as a module next).


[snip]


Yes, please provide more information. Full kernel output at the time of the freeze would be very helpful.



Here is a full-sized bug report, as best I can do. This is kernel
2.6.20-rc6-mm3 instead of rc4-mm1; it still has the problem.



Thanks for the dump.

[ 3138.456588] [<ffffffff8033f5de>] current_atom_finish_all_fq+0x12e/0x280
[ 3138.456661] [<ffffffff80296510>] autoremove_wake_function+0x0/0x30
[ 3138.456674] [<ffffffff803350ac>] submit_wb_list+0x11c/0x130
[ 3138.456690] [<ffffffff80335409>] reiser4_txn_end+0x349/0x530
[ 3138.456710] [<ffffffff803355f9>] reiser4_txn_restart+0x9/0x20
[ 3138.456781] [<ffffffff80335680>] force_commit_atom+0x50/0x60
[ 3138.456793] [<ffffffff8034cfb1>] writepages_unix_file+0x671/0x780
[ 3138.456824] [<ffffffff802590b3>] do_writepages+0x43/0x80
[ 3138.456838] [<ffffffff8024dbf8>] __filemap_fdatawrite_range+0x58/0x70
[ 3138.456914] [<ffffffff8024e19d>] do_fsync+0x3d/0xe0
[ 3138.456930] [<ffffffff802c2473>] sys_msync+0x143/0x1f0
[ 3138.456945] [<ffffffff8025c11e>] system_call+0x7e/0x83



This task is waiting for I/O completion that never arrives because of the
new plugging policy introduced by the block layer folks. The attached patch
should help. Andrew, please apply.
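
The control-flow part of the wander.c change can be illustrated in plain
user-space C. This is only a sketch of the pattern, not the kernel code:
flush_pending_io() and submit_one() are hypothetical stand-ins (for
blk_replug_current_nested() and the bio submission respectively), and the
failure injection via fail_at exists purely for demonstration. The point it
shows is that error paths use "break" instead of "return", so the single
exit point always runs the flush step.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for blk_replug_current_nested(): counts calls. */
static int flush_calls;

static void flush_pending_io(void)
{
	flush_calls++;
}

/* Hypothetical stand-in for one submission; fails at index fail_at. */
static int submit_one(int i, int fail_at)
{
	return (i == fail_at) ? -ENOMEM : 0;
}

/* Submit nr requests and flush exactly once before returning,
 * whether or not a submission failed part-way through. */
static int submit_all(int nr, int fail_at)
{
	int ret = 0;
	int i;

	assert(nr >= 0);
	for (i = 0; i < nr; i++) {
		ret = submit_one(i, fail_at);
		if (ret)
			break;	/* not "return": fall through to the flush */
	}
	flush_pending_io();
	return ret;
}
```

With an early "return" inside the loop, requests submitted before the
failure would be left without the flush call; with the single exit the
flush happens on every path.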

Thanks,
Edward.

Signed-off-by: Edward Shishkin <edward@xxxxxxxxxxx>
---
linux-2.6.20-rc6-mm3/fs/reiser4/status_flags.c | 2 ++
linux-2.6.20-rc6-mm3/fs/reiser4/wander.c | 18 +++++++++++-------
2 files changed, 13 insertions(+), 7 deletions(-)

--- linux-2.6.20-rc6-mm3/fs/reiser4/status_flags.c.orig
+++ linux-2.6.20-rc6-mm3/fs/reiser4/status_flags.c
@@ -63,6 +63,7 @@
}
lock_page(page);
submit_bio(READ, bio);
+ blk_replug_current_nested();
wait_on_page_locked(page);
if (!PageUptodate(page)) {
warning("green-2007",
@@ -157,6 +158,7 @@
lock_page(get_super_private(sb)->status_page); // Safe as nobody should touch our page.
/* We can block now, but we have no other choice anyway */
submit_bio(WRITE, bio);
+ blk_replug_current_nested();
return 0; // We do not wait for io to finish.
}

--- linux-2.6.20-rc6-mm3/fs/reiser4/wander.c.orig
+++ linux-2.6.20-rc6-mm3/fs/reiser4/wander.c
@@ -718,6 +718,7 @@
jnode *first, int nr, const reiser4_block_nr *block_p,
flush_queue_t *fq, int flags)
{
+ int ret = 0;
struct super_block *super = reiser4_get_current_sb();
int write_op = ( flags & WRITEOUT_BARRIER ) ? WRITE_BARRIER : WRITE;
int max_blocks;
@@ -738,9 +739,10 @@
int nr_used;

bio = bio_alloc(GFP_NOIO, nr_blocks);
- if (!bio)
- return RETERR(-ENOMEM);
-
+ if (!bio) {
+ ret = RETERR(-ENOMEM);
+ break;
+ }
bio->bi_bdev = super->s_bdev;
bio->bi_sector = block * (super->s_blocksize >> 9);
for (nr_used = 0, i = 0; i < nr_blocks; i++) {
@@ -843,8 +845,10 @@
reiser4_submit_bio(write_op, bio);
not_supported = bio_flagged(bio, BIO_EOPNOTSUPP);
bio_put(bio);
- if (not_supported)
- return -EOPNOTSUPP;
+ if (not_supported) {
+ ret = -EOPNOTSUPP;
+ break;
+ }
}

block += nr_used - 1;
@@ -855,8 +859,8 @@
}
nr -= nr_used;
}
-
- return 0;
+ blk_replug_current_nested();
+ return ret;
}

/* This is a procedure which recovers a contiguous sequences of disk block