Re: [PATCH 1/3] mm, reclaim: make should_continue_reclaim perform dryrun detection

From: Vlastimil Babka
Date: Mon Aug 05 2019 - 06:57:58 EST

On 8/5/19 10:42 AM, Vlastimil Babka wrote:
> On 8/3/19 12:39 AM, Mike Kravetz wrote:
>> From: Hillf Danton <hdanton@xxxxxxxx>
>> Address the issue of should_continue_reclaim continuing true too often
>> for __GFP_RETRY_MAYFAIL attempts when !nr_reclaimed and nr_scanned.
>> This could happen during hugetlb page allocation causing stalls for
>> minutes or hours.
>> We can stop reclaiming pages if compaction reports it can make a progress.
>> A code reshuffle is needed to do that.
>> And it has side-effects, however,
>> with allocation latencies in other cases but that would come at the cost
>> of potential premature reclaim which has consequences of itself.
> Based on Mel's longer explanation, can we clarify the wording here? e.g.:
> There might be a side effect for other high-order allocations that would
> potentially benefit from more reclaim before compaction, making them
> faster and less likely to stall, but the consequences of
> premature/over-reclaim are considered worse.
>> We can also bail out of reclaiming pages if we know that there are not
>> enough inactive lru pages left to satisfy the costly allocation.
>> We can give up reclaiming pages too if we see dryrun occur, with the
>> certainty of plenty of inactive pages. IOW with dryrun detected, we are
>> sure we have reclaimed as many pages as we could.
>> Cc: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
>> Cc: Mel Gorman <mgorman@xxxxxxx>
>> Cc: Michal Hocko <mhocko@xxxxxxxxxx>
>> Cc: Vlastimil Babka <vbabka@xxxxxxx>
>> Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
>> Signed-off-by: Hillf Danton <hdanton@xxxxxxxx>
>> Tested-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
>> Acked-by: Mel Gorman <mgorman@xxxxxxx>
> Acked-by: Vlastimil Babka <vbabka@xxxxxxx>
> I will send some followup cleanup.
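
For readers following along, the three bail-out conditions described in the
quoted changelog can be modelled roughly as below. This is only a standalone
userspace sketch of the decision order, not the kernel function and not the
cleanup itself: struct reclaim_model, the COMPACT_MODEL_* values and
model_should_continue_reclaim() are hypothetical names, and the 2UL << order
gap is an assumption standing in for whatever threshold the kernel uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the compaction suitability verdict. */
enum compact_model {
	COMPACT_MODEL_SKIPPED,       /* compaction cannot run yet */
	COMPACT_MODEL_CAN_PROGRESS,  /* compaction reports it can make progress */
};

/* Hypothetical stand-in for the state the real code reads from scan_control. */
struct reclaim_model {
	unsigned long nr_reclaimed;  /* pages reclaimed by the last pass */
	unsigned long inactive_lru;  /* inactive LRU pages still available */
	unsigned int order;          /* order of the costly allocation */
	enum compact_model compaction;
};

/* Returns true if another reclaim pass is worthwhile in this model. */
static bool model_should_continue_reclaim(const struct reclaim_model *m)
{
	unsigned long pages_for_compaction;

	assert(m != NULL);

	/* Dryrun: the last pass reclaimed nothing, so more passes will not help. */
	if (m->nr_reclaimed == 0)
		return false;

	/* Compaction can already make progress, so stop reclaiming. */
	if (m->compaction == COMPACT_MODEL_CAN_PROGRESS)
		return false;

	/* Assumed gap: twice the allocation size, in pages. */
	pages_for_compaction = 2UL << m->order;

	/* Continue only if enough inactive pages remain to be worth scanning. */
	return m->inactive_lru > pages_for_compaction;
}
```

In this model an order-9 request keeps reclaiming only while the last pass
made progress, compaction is still not viable, and more than 1024 inactive
pages remain. Any one of the three conditions failing ends the loop.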

How about this?