Re: [dm-devel] [RFC PATCH 2/2] mm, mempool: do not throttle PF_LESS_THROTTLE tasks

From: Michal Hocko
Date: Wed Jul 27 2016 - 14:40:28 EST


On Wed 27-07-16 10:28:40, Mikulas Patocka wrote:
>
>
> On Wed, 27 Jul 2016, NeilBrown wrote:
>
> > On Tue, Jul 26 2016, Mikulas Patocka wrote:
> >
> > > On Sat, 23 Jul 2016, NeilBrown wrote:
> > >
> > >> "dirtying ... from the reclaim context" ??? What does that mean?
> > >> According to
> > >> Commit: 26eecbf3543b ("[PATCH] vm: pageout throttling")
> > >> From the history tree, the purpose of throttle_vm_writeout() is to
> > >> limit the amount of memory that is concurrently under I/O.
> > >> That seems strange to me because I thought it was the responsibility of
> > >> each backing device to impose a limit - a maximum queue size of some
> > >> sort.
> > >
> > > Device mapper doesn't impose any limit for in-flight bios.
> >
> > I would suggest that it probably should. At least it should
> > "set_wb_congested()" when the number of in-flight bios reaches some
> > arbitrary threshold.
>
> If we set the device mapper device as congested, it can again trigger that
> mempool alloc throttling bug.
>
> I.e. suppose that we swap to a dm-crypt device. The dm-crypt device
> becomes clogged and sets its state as congested. The underlying block
> device is not congested.
>
> The mempool_alloc function in the dm-crypt workqueue sets the
> PF_LESS_THROTTLE flag, and tries to allocate memory, but according to
> Michal's patches, processes with PF_LESS_THROTTLE may still get throttled.
>
> So if we set the dm-crypt device as congested, it can incorrectly throttle
> the dm-crypt workqueue that does allocations of temporary pages and
> encryption.
>
> I think that approach with PF_LESS_THROTTLE in mempool_alloc is incorrect
> and that mempool allocations should never be throttled.

I'm not really sure this is the right approach. If a particular mempool
user must never be throttled by the page allocator, then it should
allocate with GFP_NOWAIT. Even mempool allocations shouldn't allow
reclaim to scan pages too quickly when the LRU lists are full of dirty
pages. But, as I've said, using GFP_NOWAIT would reduce the success rate
even under light page cache load. Throttling in wait_iff_congested
should be quite rare.
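
To illustrate the GFP_NOWAIT trade-off, here is a minimal userspace
sketch, not kernel code. Every name in it (model_pool, model_pool_alloc,
MODEL_NOWAIT, would_sleep) is hypothetical and exists only for this
example. It assumes a pool that first tries the underlying allocator,
then falls back to its reserve, and only then decides between failing
and waiting. That is roughly how mempool_alloc() behaves when the gfp
mask does not allow direct reclaim, but the real code differs in detail.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag for this model: the caller must never sleep. */
#define MODEL_NOWAIT 0x1u

/* Toy stand-in for a mempool; not the kernel's mempool_t. */
struct model_pool {
	void *fresh;              /* what the underlying allocator would return, or NULL */
	void *reserve[2];         /* preallocated reserve elements */
	size_t nr_reserved;       /* number of reserve elements still available */
	unsigned int would_sleep; /* how often a caller would have had to wait */
};

static void *model_pool_alloc(struct model_pool *pool, unsigned int flags)
{
	void *obj;

	assert(pool != NULL);

	/* 1. Try the underlying allocator first. */
	if (pool->fresh) {
		obj = pool->fresh;
		pool->fresh = NULL;
		return obj;
	}

	/* 2. Fall back to the preallocated reserve. */
	if (pool->nr_reserved > 0)
		return pool->reserve[--pool->nr_reserved];

	/* 3. A non-waiting caller fails here instead of being throttled. */
	if (flags & MODEL_NOWAIT)
		return NULL;

	/*
	 * 4. A waiting caller would sleep until an element is returned to
	 *    the pool and then retry. The model only records that fact.
	 */
	pool->would_sleep++;
	return NULL;
}
```

In this model the non-waiting caller is never throttled, but it gets
NULL as soon as both the allocator and the reserve are exhausted, so it
needs its own fallback path. That is the lower success rate mentioned
above.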

Anyway, do you see excessive throttling with the patch posted at
http://lkml.kernel.org/r/20160725192344.GD2166@xxxxxxxxxxxxxx ? Or, from
the other side, do you see an excessive number of dirty/writeback pages
relative to the dirty threshold, or any other undesirable side effects?
--
Michal Hocko
SUSE Labs