Re: [RFC][PATCH] mm: stop balance_dirty_pages doing too much work

From: Peter Zijlstra
Date: Fri Aug 07 2009 - 08:20:34 EST


On Wed, 2009-06-24 at 11:38 +0100, Richard Kennedy wrote:
>
> Signed-off-by: Richard Kennedy <richard@xxxxxxxxxxxxxxx>
> ---
> balance_dirty_pages can overreact and move all of the dirty pages to
> writeback unnecessarily.
>
> balance_dirty_pages makes its decision to throttle based on the number
> of dirty plus writeback pages that are over the calculated limit, so it
> will continue to move pages even when there are plenty of pages in
> writeback and fewer than the threshold still dirty.
>
> This allows it to overshoot its limits and move all the dirty pages to
> writeback while waiting for the drives to catch up and empty the
> writeback list.
>
> A simple fio test easily demonstrates this problem.
>
> fio --name=f1 --directory=/disk1 --size=2G --rw=write \
> --name=f2 --directory=/disk2 --size=1G --rw=write --startdelay=10
>
> The attached graph before.png shows how all pages are moved to writeback
> as the second write starts and the throttling kicks in.
>
> after.png is the same test with the patch applied, which clearly shows
> that it keeps dirty_background_ratio dirty pages in the buffer.
> The values and timings of the graphs are only approximate but are good
> enough to show the behaviour.
>
> This is the simplest fix I could find, but I'm not entirely sure that it
> alone will be enough for all cases. But it certainly is an improvement
> on my desktop machine writing to 2 disks.
>
> Do we need something more for machines with large arrays where
> bdi_threshold * number_of_drives is greater than the dirty_ratio?
>

> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 7b0dcea..7687879 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -541,8 +541,11 @@ static void balance_dirty_pages(struct address_space *mapping)
> * filesystems (i.e. NFS) in which data may have been
> * written to the server's write cache, but has not yet
> * been flushed to permanent storage.
> + * Only move pages to writeback if this bdi is over its
> + * threshold otherwise wait until the disk writes catch
> + * up.
> */
> - if (bdi_nr_reclaimable) {
> + if (bdi_nr_reclaimable > bdi_thresh) {
> writeback_inodes(&wbc);
> pages_written += write_chunk - wbc.nr_to_write;
> get_dirty_limits(&background_thresh, &dirty_thresh,

OK, so Chris ran into this bit yesterday, complaining that he'd get only
a few write requests and couldn't saturate his IO channel.

Now, writing out everything as soon as there's something to do sucks for
Richard, but only writing out when we're over the limit sucks for Chris
(since we can only ever be over the limit by a little). The best thing
would be to write out only when we're over the background limit: since
that is the low watermark we use for throttling, it makes sense to try
to write out whenever we're above it.

However, there is no bdi_background_thresh, and I don't think
introducing one just for this is really justified. How about the
below, which uses half of bdi_thresh as a rough stand-in for one?

Chris, how did this work for you? Richard, does this make things suck
for you again?

---
mm/page-writeback.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 81627eb..92f42d6 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -545,7 +545,7 @@ static void balance_dirty_pages(struct address_space *mapping)
* threshold otherwise wait until the disk writes catch
* up.
*/
- if (bdi_nr_reclaimable > bdi_thresh) {
+ if (bdi_nr_reclaimable > bdi_thresh/2) {
writeback_inodes(&wbc);
pages_written += write_chunk - wbc.nr_to_write;
get_dirty_limits(&background_thresh, &dirty_thresh,

