Re: [PATCH 0/15] Per-bdi writeback flusher threads v10

From: Jens Axboe
Date: Tue Jun 16 2009 - 04:00:54 EST


On Tue, Jun 16 2009, Zhang, Yanmin wrote:
> On Fri, 2009-06-12 at 14:54 +0200, Jens Axboe wrote:
> > Hi,
> >
> > Here's the 10th version of the writeback patches. Changes since v9:
> >
> > - Fix bdi task exit race leaving work on the list, flush it after we
> > know we cannot be found anymore.
> > - Rename flusher tasks from bdi-foo to flush-foo. Should make it more
> > clear to the casual observer.
> > - Fix a problem with the btrfs bdi register patch that would spew
> > warnings for > 1 mounted btrfs file system.
> > - Rebase to current -git, there were some conflicts with the latest work
> > from viro/hch.
> > - Fix a block layer core problem were stacked devices would overwrite
> > the bdi state, causing problems and warning spew.
> > - In bdi_writeback_all(), in the race occurence of a work allocation
> > failure, restart scanning from the beginning. Then we can drop the
> > bdi_lock mutex before diving into bdi specific writeback.
> > - Convert bdi_lock to a spinlock.
> > - Use spin_trylock() in bdi_writeback_all(), if this isn't a data
> > integrity writeback. Debatable, I kind of like it...
> > - Get rid of BDI_CAP_FLUSH_FORKER, just check for match with the
> > default_backing_dev_info.
> > - Fix race in list checking in bdi_forker_task().
> >
> >
> > For ease of patching, I've put the full diff here:
> >
> > http://kernel.dk/writeback-v10.patch
> Jens,
>
> I applied the patch to 2.6.30 and got a confliction. The attachment is
> the patch I ported to 2.6.30. Did I miss anything?
>
>
> With the patch, kernel reports below messages on 2 machines.
>
> INFO: task sync:29984 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> sync D ffff88002805e300 6168 29984 24581
> ffff88022f84b780 0000000000000082 7fffffffffffffff ffff880133dbfe70
> 0000000000000000 ffff88022e2b4c50 ffff88022e2b4fd8 00000001000c7bb8
> ffff88022f513fd0 ffff880133dbfde8 ffff880133dbfec8 ffff88022d5d13c8
> Call Trace:
> [<ffffffff802b69e4>] ? bdi_sched_wait+0x0/0xd
> [<ffffffff80780fde>] ? schedule+0x9/0x1d
> [<ffffffff802b69ed>] ? bdi_sched_wait+0x9/0xd
> [<ffffffff8078158d>] ? __wait_on_bit+0x40/0x6f
> [<ffffffff802b69e4>] ? bdi_sched_wait+0x0/0xd
> [<ffffffff80781628>] ? out_of_line_wait_on_bit+0x6c/0x78
> [<ffffffff8024a426>] ? wake_bit_function+0x0/0x23
> [<ffffffff802b67ac>] ? bdi_writeback_all+0x12a/0x152
> [<ffffffff802b6805>] ? generic_sync_sb_inodes+0x31/0xde
> [<ffffffff802b6935>] ? sync_inodes_sb+0x83/0x88
> [<ffffffff802b6980>] ? __sync_inodes+0x46/0x8f
> [<ffffffff802b94f2>] ? do_sync+0x36/0x5a
> [<ffffffff802b9538>] ? sys_sync+0xe/0x12
> [<ffffffff8020b9ab>] ? system_call_fastpath+0x16/0x1b

I don't think it is your backport, for some reason the v10 missed a
change that I think could solve this race. If not, there's another in
there that I need to look at.

So against your current base, could you try with the below added as
well? The printk() is just so we can see if this triggers for you or
not.

diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index b3e80c5..a065da5 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -384,6 +384,15 @@ static int bdi_start_fn(void *ptr)
*/
synchronize_srcu(&bdi->srcu);

+ /*
+ * Flush any pending work. No more can be added, since
+ * the bdi is no longer discoverable.
+ */
+ if (!list_empty(&bdi->work_list)) {
+ printk("bdi: flushing racy work\n");
+ wb_do_writeback(wb);
+ }
+
bdi_put_wb(bdi, wb);
return ret;
}

--
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/