Re: [PATCH 0/6] More graceful flusher thread memory reclaim wakeup

From: Johannes Weiner
Date: Wed Sep 20 2017 - 19:11:58 EST


[ Fixed up CC list. John, you're sending email with
From: John Stoffel <john@xxxxxxxxxxxxxxxxx> ]

On Wed, Sep 20, 2017 at 01:32:25PM -0600, Jens Axboe wrote:
> On 09/20/2017 01:29 PM, John Stoffel wrote:
> > On Tue, Sep 19, 2017 at 01:53:01PM -0600, Jens Axboe wrote:
> >> We've had some issues with writeback in presence of memory reclaim
> >> at Facebook, and this patch set attempts to fix it up. The real
> >> functional change is the last patch in the series, the first 5 are
> >> prep and cleanup patches.
> >>
> >> The basic idea is that we have callers that call
> >> wakeup_flusher_threads() with nr_pages == 0. This means 'writeback
> >> everything'. For memory reclaim situations, we can end up queuing
> >> a TON of these kinds of writeback units. This can cause softlockups
> >> and further memory issues, since we allocate a huge number of
> >> struct wb_writeback_work items to handle this writeback. Handle this
> >> situation more gracefully.
> >
> > This looks nice, but do you have any numbers to show how this improves
> > things? I read the patches, but I'm not strong enough to comment on
> > them at all. But I am interested in how this improves writeback under
> > pressure, if at all.
>
> Writeback throughput should be about the same; the series is mostly
> about preventing softlockups and excessive memory usage under
> conditions where we are actively trying to reclaim/clean memory. It
> was bad enough that writeback work processing caused softlockups
> while the list of pending writeback work units grew to insane lengths.

To put numbers on it: we have seen 600 million writeback work items
queued up by reclaim under memory pressure. At roughly 60 bytes apiece,
that's 35G worth of work descriptors, and the machine was struggling to
remain responsive due to a lack of memory.

Once writeback against all outstanding dirty pages has been requested,
there really isn't a need to queue even a second work item; the job is
already being performed. We can queue the next one when it completes.
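To make the idea concrete, here is a minimal userspace sketch of that
dedup scheme. This is not the actual patch; flush_ctx, flush_work,
request_full_flush and friends are made-up names for illustration.
Callers asking for "write back everything" only allocate and queue a
work item if none is pending, and the flag is re-armed when the flusher
finishes that item:

/*
 * Illustrative userspace sketch, not the kernel patch: at most one
 * "write back everything" work item may be pending at a time.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

struct flush_work {
	unsigned long nr_pages;		/* 0 == "write back everything" */
};

struct flush_ctx {
	atomic_bool full_flush_pending;	/* full flush queued or in flight */
	struct flush_work *queued;	/* stand-in for the flusher's work list */
};

/* Stand-in for handing a work item to the flusher thread. */
static void queue_flush_work(struct flush_ctx *ctx, struct flush_work *work)
{
	ctx->queued = work;
	printf("queued flush work (nr_pages=%lu)\n", work->nr_pages);
}

/*
 * Caller side (e.g. reclaim): only the first "flush everything" request
 * allocates and queues work; later ones are no-ops, since the item that
 * is already queued will write back their pages too.
 */
static void request_full_flush(struct flush_ctx *ctx)
{
	bool expected = false;

	if (!atomic_compare_exchange_strong(&ctx->full_flush_pending,
					    &expected, true))
		return;		/* a full flush is already pending */

	struct flush_work *work = calloc(1, sizeof(*work));
	if (!work) {
		atomic_store(&ctx->full_flush_pending, false);
		return;
	}
	work->nr_pages = 0;	/* everything */
	queue_flush_work(ctx, work);
}

/* Flusher side: once the work item completes, allow the next one. */
static void full_flush_done(struct flush_ctx *ctx, struct flush_work *work)
{
	free(work);
	ctx->queued = NULL;
	atomic_store(&ctx->full_flush_pending, false);
}

int main(void)
{
	struct flush_ctx ctx = { 0 };

	/* Reclaim hammering the flusher: only one work item is queued. */
	for (int i = 0; i < 1000; i++)
		request_full_flush(&ctx);

	/* The flusher finishes; the next request may queue work again. */
	full_flush_done(&ctx, ctx.queued);
	request_full_flush(&ctx);

	return 0;
}

The real kernel code obviously has more to deal with (per-device
writeback state, locking, writeback reasons), but the core of the idea
is the same: once one "write back everything" item is outstanding,
further wakeups don't need to allocate anything.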