Re: Bug in kernel 2.6.31, Slow wb_kupdate writeout

From: Wu Fengguang
Date: Wed Jul 29 2009 - 07:44:01 EST


On Wed, Jul 29, 2009 at 12:15:48AM -0700, Martin Bligh wrote:
> On Tue, Jul 28, 2009 at 2:49 PM, Martin Bligh<mbligh@xxxxxxxxxx> wrote:
> >> An interesting recent-ish change is "writeback: speed up writeback of
> >> big dirty files." ÂWhen I revert the change to __sync_single_inode the
> >> problem appears to go away and background writeout proceeds at disk
> >> speed. ÂInterestingly, that code is in the git commit [2], but not in
> >> the post to LKML. [3] ÂThis is may not be the fix, but it makes this
> >> test behave better.
> >
> > I'm fairly sure this is not fixing the root cause - but putting it at the head
> > rather than the tail of the queue causes the error not to starve wb_kupdate
> > for nearly so long - as long as we keep the queue full, the bug is hidden.
>
> OK, it seems this is the root cause - I wasn't clear why all the pages weren't
> being written back, and thought there was another bug. What happens is
> we go into write_cache_pages, and stuff the disk queue with as much as
> we can put into it, and then inevitably hit the congestion limit.
>
> Then we back out to __sync_single_inode, who says "huh, you didn't manage
> to write your whole slice", and penalizes the poor blameless inode in question
> by putting it back into the penalty box for 30s.
>
> This results in very lumpy I/O writeback at 5s intervals, and very
> poor throughput.

You are right, so let's fix the congestion case. Your analysis would
be perfect changelog :)

> Patch below is inline and probably text munged, but is for RFC only.
> I'll test it
> more thoroughly tomorrow. As for the comment about starving other writes,
> I believe requeue_io moves it from s_io to s_more_io which should at least
> allow some progress of other files.
>
> --- linux-2.6.30/fs/fs-writeback.c.old 2009-07-29 00:08:29.000000000 -0700
> +++ linux-2.6.30/fs/fs-writeback.c 2009-07-29 00:11:28.000000000 -0700
> @@ -322,46 +322,11 @@ __sync_single_inode(struct inode *inode,
> /*
> * We didn't write back all the pages. nfs_writepages()
> * sometimes bales out without doing anything. Redirty
[snip]
> - if (wbc->nr_to_write <= 0) {
> - /*
> - * slice used up: queue for next turn
> - */
> - requeue_io(inode);
> - } else {
> - /*
> - * somehow blocked: retry later
> - */
> - redirty_tail(inode);

Removing this line can be dangerous - we'll probably go into buzy
waiting (I have tried that long long ago).

Chad, can you try this small patch? Thank you.

Thanks,
Fengguang
---
fs/fs-writeback.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

--- mm.orig/fs/fs-writeback.c
+++ mm/fs/fs-writeback.c
@@ -325,7 +325,8 @@ __sync_single_inode(struct inode *inode,
* soon as the queue becomes uncongested.
*/
inode->i_state |= I_DIRTY_PAGES;
- if (wbc->nr_to_write <= 0) {
+ if (wbc->nr_to_write <= 0 ||
+ wbc->encountered_congestion) {
/*
* slice used up: queue for next turn
*/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/