Re: [PATCH] mm: fix condition for throttle_direct_reclaim
From: Shakeel Butt
Date: Mon Mar 13 2017 - 11:07:43 EST
On Mon, Mar 13, 2017 at 2:02 AM, Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> On Fri 10-03-17 11:46:20, Shakeel Butt wrote:
>> Recently kswapd has been modified to give up after MAX_RECLAIM_RETRIES
>> number of unsucessful iterations. Before going to sleep, kswapd thread
>> will unconditionally wakeup all threads sleeping on pfmemalloc_wait.
>> However the awoken threads will recheck the watermarks and wake the
>> kswapd thread and sleep again on pfmemalloc_wait. There is a chance
>> of continuous back and forth between kswapd and direct reclaiming
>> threads if the kswapd keep failing and thus defeat the purpose of
>> adding backoff mechanism to kswapd. So, add kswapd_failures check
>> on the throttle_direct_reclaim condition.
>
> I have to say I really do not like this. kswapd_failures shouldn't
> really be checked outside of the kswapd context. The
> pfmemalloc_watermark_ok/throttle_direct_reclaim is quite complex even
> without putting another variable into it. I wish we rather replace this
> throttling by something else. Johannes had an idea to throttle by the
> number of reclaimers.
>
Do you suspect race in accessing kswapd_failures in non-kswapd
context? Please do let me know more about replacing this throttling.
> Anyway, I am wondering whether we can hit this issue in
> practice? Have you seen it happening or is this a result of the code
> review? I would assume that that !zone_reclaimable_pages check in
> pfmemalloc_watermark_ok should help to some degree.
>
Yes, I have seen this issue going on for more than one hour on my
test. It was a simple test where the number of processes, in the
presence of swap, try to allocate memory more than RAM. The number of
processes are equal to the number of cores and are pinned to each
individual core. I am suspecting that !zone_reclaimable_pages() check
did not help.
>> Signed-off-by: Shakeel Butt <shakeelb@xxxxxxxxxx>
>> ---
>> mm/vmscan.c | 12 +++++++++---
>> 1 file changed, 9 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index bae698484e8e..b2d24cc7a161 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -2819,6 +2819,12 @@ static bool pfmemalloc_watermark_ok(pg_data_t *pgdat)
>> return wmark_ok;
>> }
>>
>> +static bool should_throttle_direct_reclaim(pg_data_t *pgdat)
>> +{
>> + return (pgdat->kswapd_failures < MAX_RECLAIM_RETRIES &&
>> + !pfmemalloc_watermark_ok(pgdat));
>> +}
>> +
>> /*
>> * Throttle direct reclaimers if backing storage is backed by the network
>> * and the PFMEMALLOC reserve for the preferred node is getting dangerously
>> @@ -2873,7 +2879,7 @@ static bool throttle_direct_reclaim(gfp_t gfp_mask, struct zonelist *zonelist,
>>
>> /* Throttle based on the first usable node */
>> pgdat = zone->zone_pgdat;
>> - if (pfmemalloc_watermark_ok(pgdat))
>> + if (!should_throttle_direct_reclaim(pgdat))
>> goto out;
>> break;
>> }
>> @@ -2895,14 +2901,14 @@ static bool throttle_direct_reclaim(gfp_t gfp_mask, struct zonelist *zonelist,
>> */
>> if (!(gfp_mask & __GFP_FS)) {
>> wait_event_interruptible_timeout(pgdat->pfmemalloc_wait,
>> - pfmemalloc_watermark_ok(pgdat), HZ);
>> + !should_throttle_direct_reclaim(pgdat), HZ);
>>
>> goto check_pending;
>> }
>>
>> /* Throttle until kswapd wakes the process */
>> wait_event_killable(zone->zone_pgdat->pfmemalloc_wait,
>> - pfmemalloc_watermark_ok(pgdat));
>> + !should_throttle_direct_reclaim(pgdat));
>>
>> check_pending:
>> if (fatal_signal_pending(current))
>> --
>> 2.12.0.246.ga2ecc84866-goog
>>
>
> --
> Michal Hocko
> SUSE Labs