Re: [PATCH 2/5] writeback: dirty position control

From: Wu Fengguang
Date: Wed Aug 10 2011 - 22:30:22 EST


On Thu, Aug 11, 2011 at 06:34:27AM +0800, Jan Kara wrote:
> On Tue 09-08-11 19:20:27, Peter Zijlstra wrote:
> > On Tue, 2011-08-09 at 12:32 +0200, Peter Zijlstra wrote:
> > > >                     origin - dirty
> > > >         pos_ratio = --------------
> > > >                     origin - goal
> > >
> > > > which comes from the below [*] control line, so that when (dirty == goal),
> > > > pos_ratio == 1.0:
> > >
> > > OK, so basically you want a linear function for which:
> > >
> > > f(goal) = 1 and has a root somewhere > goal.
> > >
> > > (that one line is much more informative than all your graphs put
> > > together, one can start from there and derive your function)
> > >
> > > That does indeed get you the above function, now what does it mean?
> >
> > So going by:
> >
> >                                          write_bw
> >   ref_bw = dirty_ratelimit * pos_ratio * --------
> >                                          dirty_bw
>
> Actually, thinking about these formulas, why do we even bother with
> computing all these factors like write_bw, dirty_bw, pos_ratio, ...
> Couldn't we just have a feedback loop (probably similar to the one
> computing pos_ratio) which will maintain single value - ratelimit? When we
> are getting close to dirty limit, we will scale ratelimit down, when we
> will be getting significantly below dirty limit, we will scale the
> ratelimit up. Because looking at the formulas it seems to me that the net
> effect is the same - pos_ratio basically overrules everything...
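
For reference, the two formulas quoted above can be written out as a
minimal sketch. This is Python for illustration, not the kernel C from the
patch; the function names, the floating-point arithmetic and the sample
numbers are assumptions of this sketch.

```python
def pos_ratio(dirty, goal, origin):
    """Linear control line: 1.0 when dirty == goal, 0.0 when dirty == origin.

    Assumes origin > goal, i.e. the root of the line lies above the setpoint.
    """
    if origin <= goal:
        raise ValueError("origin must lie above goal")
    return (origin - dirty) / (origin - goal)


def ref_bw(dirty_ratelimit, ratio, write_bw, dirty_bw):
    """ref_bw = dirty_ratelimit * pos_ratio * write_bw / dirty_bw."""
    return dirty_ratelimit * ratio * write_bw / dirty_bw


# Illustrative numbers only: dirty pages sit exactly at the goal, while
# tasks dirty twice as fast as the device can write back.
ratio = pos_ratio(dirty=1000, goal=1000, origin=2000)
balanced = ref_bw(dirty_ratelimit=10.0, ratio=ratio,
                  write_bw=100.0, dirty_bw=200.0)
```

With dirty == goal the ratio is 1.0, so in this example the write_bw /
dirty_bw factor alone halves the ratelimit.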

Good question. That is actually one of the early approaches I tried.
It worked to some extent, however the resulting ratelimit is not only
slow to respond, but also oscillates all the time.

This is due to the following imperfections:

1) pos_ratio at best only provides a "direction" for adjusting the
ratelimit. There are only vague clues that if pos_ratio is small,
the errors in ratelimit should be small.

2) Due to time-lag, the assumptions in (1) about "direction" and
"error size" can be wrong. The ratelimit may already be
over-adjusted when the dirty pages take time to approach the
setpoint. The larger the memory, the longer the time lag and the
easier it is to overshoot and oscillate.

3) dirty pages are constantly fluctuating around the setpoint,
so is pos_ratio.

With (1) and (2), it's a control system very susceptible to disturbances.
With (3) we get constant disturbances. Well, I had a very hard time and
played dirty tricks (which you may never want to know ;-) trying to
trade off response time against stability.
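
To illustrate (1) and (2), here is a toy model of such a ratelimit-only
feedback loop. It is a sketch under stated assumptions, not the code I
actually tried: the multiplicative update rule, the gain, the unit time
step and all numbers are made up for illustration. The dirty page count
integrates the rate error, and the ratelimit is in turn steered only by
the position error, so it keeps being adjusted until the dirty pages are
back at the setpoint, by which time it has already overshot.

```python
def simulate_naive_loop(steps=1000, write_bw=100.0, goal=1000.0,
                        origin=2000.0, gain=0.1, ratelimit=50.0):
    """Toy loop: steer ratelimit from pos_ratio alone (hypothetical rule)."""
    dirty = goal
    history = []
    for _ in range(steps):
        # Dirty pages accumulate the gap between dirtying and writeback.
        dirty = max(0.0, dirty + ratelimit - write_bw)
        # Linear control line: 1.0 at the goal, 0.0 at the origin.
        ratio = (origin - dirty) / (origin - goal)
        # Scale ratelimit up below the goal and down above it.
        ratelimit *= 1.0 + gain * (ratio - 1.0)
        history.append((dirty, ratelimit))
    return history


def count_setpoint_crossings(history, goal=1000.0):
    """Count how often the dirty page count crosses the setpoint."""
    errors = [dirty - goal for dirty, _ in history]
    return sum(1 for a, b in zip(errors, errors[1:]) if a * b < 0)


history = simulate_naive_loop()
crossings = count_setpoint_crossings(history)
late_swing = max(abs(dirty - 1000.0) for dirty, _ in history[-200:])
```

In this model the dirty page count keeps crossing the setpoint and is
still swinging widely at the end of the run. A smaller gain only makes the
swings slower, which is the response time versus stability tradeoff
mentioned above.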

Thanks,
Fengguang