Re: [PATCH 7/7] fs-writeback: only allow one inflight and pending full flush

From: Christoph Hellwig
Date: Thu Sep 21 2017 - 11:05:17 EST


On Wed, Sep 20, 2017 at 09:33:02AM -0600, Jens Axboe wrote:
> When someone calls wakeup_flusher_threads() or
> wakeup_flusher_threads_bdi(), they schedule writeback of all dirty
> pages in the system (or on that bdi). If we are tight on memory, we
> can get tons of these queued from kswapd/vmscan. This causes (at
> least) two problems:
>
> 1) We consume a ton of memory just allocating writeback work items.
> 2) We spend so much time processing these work items, that we
> introduce a softlockup in writeback processing.
>
> Fix this by adding a 'start_all' bit to the writeback structure, and
> set that when someone attempts to flush all dirty page. The bit is
> cleared when we start writeback on that work item. If the bit is
> already set when we attempt to queue !nr_pages writeback, then we
> simply ignore it.
>
> This provides us one full flush in flight, with one pending as well,
> and makes for more efficient handling of this type of writeback.
>
> Acked-by: Johannes Weiner <hannes@xxxxxxxxxxx>
> Tested-by: Chris Mason <clm@xxxxxx>
> Reviewed-by: Jan Kara <jack@xxxxxxx>
> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
> ---
> fs/fs-writeback.c | 24 ++++++++++++++++++++++++
> include/linux/backing-dev-defs.h | 1 +
> 2 files changed, 25 insertions(+)
>
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 3916ea2484ae..6205319d0c24 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -53,6 +53,7 @@ struct wb_writeback_work {
> unsigned int for_background:1;
> unsigned int for_sync:1; /* sync(2) WB_SYNC_ALL writeback */
> unsigned int auto_free:1; /* free on completion */
> + unsigned int start_all:1; /* nr_pages == 0 (all) writeback */
> enum wb_reason reason; /* why was writeback initiated? */
>
> struct list_head list; /* pending work list */
> @@ -953,12 +954,26 @@ static void wb_start_writeback(struct bdi_writeback *wb, bool range_cyclic,
> return;
>
> /*
> + * All callers of this function want to start writeback of all
> + * dirty pages. Places like vmscan can call this at a very
> + * high frequency, causing pointless allocations of tons of
> + * work items and keeping the flusher threads busy retrieving
> + * that work. Ensure that we only allow one of them pending and
> + * inflight at the time
> + */
> + if (test_bit(WB_start_all, &wb->state))
> + return;
> +
> + set_bit(WB_start_all, &wb->state);

This should be test_and_set_bit here..

But more importantly once we are not guaranteed that we only have
a single global wb_writeback_work per bdi_writeback we should just
embedd that into struct bdi_writeback instead of dynamically
allocating it.