Re: [RFC] mm/vmscan.c: avoid possible long latency caused by too_many_isolated()

From: Yu Zhao
Date: Tue Apr 27 2021 - 17:53:47 EST


On Sat, Apr 24, 2021 at 6:48 PM Huang, Ying <ying.huang@xxxxxxxxx> wrote:
>
> Yu Zhao <yuzhao@xxxxxxxxxx> writes:
> [snip]
>
> > @@ -2966,13 +2938,20 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
> > /* need some check for avoid more shrink_zone() */
> > }
> >
> > - /* See comment about same check for global reclaim above */
> > - if (zone->zone_pgdat == last_pgdat)
> > - continue;
> > - last_pgdat = zone->zone_pgdat;
> > shrink_node(zone->zone_pgdat, sc);
> > }
> >
> > + if (last_pgdat)
> > + atomic_dec(&last_pgdat->nr_reclaimers);
> > + else if (should_retry) {
> > + /* wait a bit for the reclaimer. */
> > + if (!schedule_timeout_killable(HZ / 10))
>
> Once we reach here, even accidentally, the caller has to sleep for at
> least 100ms. How about using a semaphore for pgdat->nr_reclaimers? Then
> the sleeper can be woken up as soon as the resource is considered sufficient.

Yeah, that sounds good to me. I guess we will have to wait and see the
test results from Zhengjun.

Thanks.