Re: [PATCH 0/5] IO-less dirty throttling v8

From: Vivek Goyal
Date: Mon Aug 08 2011 - 22:02:19 EST

On Sat, Aug 06, 2011 at 04:44:47PM +0800, Wu Fengguang wrote:
> Hi all,
> The _core_ bits of the IO-less balance_dirty_pages().
> Heavily simplified and re-commented to make it easier to review.
> git:// dirty-throttling-v8
> Only the bare minimal algorithms are presented, so you will find some rough
> edges in the graphs below. But it's usable :)
> And an introduction to the (more complete) algorithms:
> Questions and reviews are highly appreciated!

Hi Wu,

I am going through slide 39, where you describe this approach as
future proof and usable for IO control purposes. You have listed the
following merits of this approach.

* per-bdi nature, works on NFS and Software RAID
* no delayed response (working at the right layer)
* no page tracking, hence decoupled from memcg
* no interactions with FS and CFQ
* get proportional IO controller for free
* reuse/inherit all the base facilities/functions

I would say it is also a good idea to list the demerits of this
approach in its current form: it only controls buffered write IO and
nothing else. So on the same block device, direct writes might be going
on from the same group, and in this scheme the user will not have any
control over them. Another disadvantage is that throttling at the page
cache level does not take care of IO spikes at the device level.

Now I think one could probably come up with a more sophisticated scheme
where throttling is done at the bdi level but is also accounted at the
device level in the IO controller. (I did something similar in the
past, but Dave Chinner did not like it.)

Anyway, keeping track of the per-cgroup rate and throttling accordingly
can definitely help implement an algorithm for per-cgroup IO control.
We probably just need to find a reasonable way to account all this
IO to the end device so that we have control over all kinds of IO of a
cgroup.
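
To make the "track per-cgroup rate and throttle accordingly" idea
concrete, here is a minimal user-space sketch. It is not code from the
patchset: the class name, the method names and the pages-per-second unit
are all assumptions made for illustration. It keeps a running count of
dirtied pages per cgroup and works out how long the dirtier would have
to sleep to stay at or below its rate limit.

```python
class CgroupDirtyThrottle:
    """Hypothetical per-cgroup dirty rate tracker (illustration only)."""

    def __init__(self, rate_limit):
        if rate_limit <= 0:
            raise ValueError("rate_limit must be positive")
        self.rate_limit = rate_limit  # allowed dirty rate, pages/second
        self.dirtied = 0              # pages dirtied so far
        self.elapsed = 0.0            # seconds already spent

    def account(self, pages, elapsed):
        """Record that `pages` were dirtied over `elapsed` seconds."""
        self.dirtied += pages
        self.elapsed += elapsed

    def pause(self):
        """Seconds to sleep so that dirtied / total time <= rate_limit."""
        # Time the work should have taken at exactly the allowed rate.
        target = self.dirtied / self.rate_limit
        # Sleep only for the part of that time not already spent.
        return max(0.0, target - self.elapsed)
```

For example, a cgroup limited to 1000 pages/s that dirtied 500 pages in
0.2s would be asked to sleep for about 0.3s. Note that this only sees
buffered writes, which is exactly the limitation described above.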

How do you implement proportional control here? Do you take the overall
bdi bandwidth and vary the per-cgroup bandwidth regularly based on
cgroup weight? Again, the issue here is that it controls only buffered
WRITES and nothing else, and in this case coordinating with CFQ will
probably be hard. So I guess usage of proportional IO just for buffered
WRITES will have limited

> shortlog:
> Wu Fengguang (5):
> writeback: account per-bdi accumulated dirtied pages
> writeback: dirty position control
> writeback: dirty rate control
> writeback: per task dirty rate limit
> writeback: IO-less balance_dirty_pages()
> The last 4 patches are one single logical change, but splitted here to
> make it easier to review the different parts of the algorithm.
> diffstat:
> include/linux/backing-dev.h | 8 +
> include/linux/sched.h | 7 +
> include/trace/events/writeback.h | 24 --
> mm/backing-dev.c | 3 +
> mm/memory_hotplug.c | 3 -
> mm/page-writeback.c | 459 ++++++++++++++++++++++----------------
> 6 files changed, 290 insertions(+), 214 deletions(-)
> Thanks,
> Fengguang