Re: [PATCH v2] mm: terminate shrink_slab loop if signal is pending

From: Michal Hocko
Date: Fri Dec 08 2017 - 06:48:16 EST


On Fri 08-12-17 20:36:16, Tetsuo Handa wrote:
> On 2017/12/08 17:22, Michal Hocko wrote:
> > On Thu 07-12-17 17:23:05, Suren Baghdasaryan wrote:
> >> Slab shrinkers can be quite time consuming and when signal
> >> is pending they can delay handling of the signal. If fatal
> >> signal is pending there is no point in shrinking that process
> >> since it will be killed anyway.
> >
> > The thing is that we are _not_ shrinking _that_ process. We are
> > shrinking globally shared objects and the fact that the memory pressure
> > is so large that the kswapd doesn't keep pace with it means that we have
> > to throttle all allocation sites by doing this direct reclaim. I agree
> > that expediting killed task is a good thing in general because such a
> > process should free at least some memory.
>
> But doesn't doing direct reclaim mean that allocation request of already
> fatal_signal_pending() threads will not succeed unless some memory is
> reclaimed (or selected as an OOM victim)? Won't it just spin the "too
> small to fail" retry loop at full speed in the worst case?

Well, normally kswapd would do the work in the background. But this
would have to be carefully evaluated. That is why I've said "expedite"
rather than skip.

> >> This change checks for pending
> >> fatal signals inside shrink_slab loop and if one is detected
> >> terminates this loop early.
> >
> > This changelog doesn't really address my previous review feedback, I am
> > afraid. You should mention more details about problems you are seeing
> > and what causes them. If we have a shrinker which takes considerable
> > amount of time then we should be addressing that. If that is not
> > possible then it should be documented at least.
>
> Unfortunately, it is possible to get blocked inside shrink_slab() for a long
> time, as in the example from http://lkml.kernel.org/r/1512705038.7843.6.camel@xxxxxxxxx .

As I've said, any excessive shrinker should definitely be evaluated.
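
For reference, the check described in the quoted changelog (test for a
pending fatal signal inside the shrink_slab loop and stop early) amounts to
something like the userspace sketch below. This is not the patch itself and
not kernel code: shrinker_sketch, shrink_slab_sketch and the demo_* helpers
are hypothetical names, and the callback only stands in for what
fatal_signal_pending(current) would do in the kernel.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a registered shrinker. */
struct shrinker_sketch {
	unsigned long (*scan)(void *ctx);	/* returns number of objects freed */
	void *ctx;
};

/* Hypothetical stand-in for fatal_signal_pending(current). */
typedef bool (*fatal_pending_fn)(void *task);

/*
 * Walk all shrinkers, but bail out before invoking the next one once a
 * fatal signal is pending. Objects freed so far are still reported.
 */
unsigned long shrink_slab_sketch(const struct shrinker_sketch *shrinkers,
				 size_t nr, fatal_pending_fn fatal_pending,
				 void *task)
{
	unsigned long freed = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		if (fatal_pending(task))
			break;
		freed += shrinkers[i].scan(shrinkers[i].ctx);
	}
	return freed;
}

/* Demo shrinker: "frees" 10 objects; a non-NULL ctx marks the task killed. */
unsigned long demo_scan(void *ctx)
{
	if (ctx)
		*(bool *)ctx = true;
	return 10;
}

/* Demo signal check: the task is just a bool flag here. */
bool demo_fatal_pending(void *task)
{
	return *(bool *)task;
}
```

Note that the check only runs between shrinkers, so a single shrinker that
blocks for a long time is not helped by it. That case still has to be fixed
in the shrinker itself.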
--
Michal Hocko
SUSE Labs