Re: [PATCH] mm: fix hanging shrinker management on long do_shrink_slab

From: Dave Chinner
Date: Mon Dec 09 2019 - 20:20:54 EST

On Fri, Dec 06, 2019 at 09:11:25AM -0800, Shakeel Butt wrote:
> On Thu, Dec 5, 2019 at 6:10 PM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > If a shrinker is blocking for a long time, then we need to
> > work to fix the shrinker implementation because blocking is a much
> > bigger problem than just register/unregister.
> >
> Yes, we should be fixing the implementations of all shrinkers and yes
> it is bigger issue but we can also fix register/unregister isolation
> issue in parallel. Fixing all shrinkers would a tedious and long task
> and we should not block fixing isolation issue on it.

"fixing all shrinkers" is a bit of hyperbole - you've identified
only one instance where blocking is causing you problems. Indeed,
most shrinkers are already non-blocking and won't cause you any
problems at all.

> > IOWs, we already know that cycling a global rwsem on every
> > individual shrinker invocation is going to cause noticable
> > scalability problems. Hence I don't think that this sort of "cycle
> > the global rwsem faster to reduce [un]register latency" solution is
> > going to fly because of the runtime performance regressions it will
> > introduce....
> >
> I agree with your scalability concern (though others would argue to
> first demonstrate the issue before adding more sophisticated scalable
> code).

Look at the git history. We *know* this is a problem, so anyone
arguing that we have to prove it can go take a long walk of a short

> Most memory reclaim code is written without the performance or
> scalability concern, maybe we should switch our thinking.

I think there's a lot of core mm and other developers that would
disagree with you there. With respect to shrinkers, we've been
directly concerned about performance and scalability of the
individual instances as well as the infrastructure for at least the
last decade....


Dave Chinner