Re: dm bufio: Reduce dm_bufio_lock contention

From: Michal Hocko
Date: Tue Sep 04 2018 - 12:08:55 EST


On Tue 04-09-18 11:18:44, Mike Snitzer wrote:
> On Tue, Sep 04 2018 at 3:08am -0400,
> Michal Hocko <mhocko@xxxxxxxxxx> wrote:
>
> > On Mon 03-09-18 18:23:17, Mikulas Patocka wrote:
> > >
> > >
> > > On Wed, 1 Aug 2018, jing xia wrote:
> > >
> > > > We reproduced this issue again and found out the root cause.
> > > > dm_bufio_prefetch() with dm_bufio_lock enters the direct reclaim and
> > > > takes a long time to do the soft_limit_reclaim, because of the huge
> > > > number of memory excess of the memcg.
> > > > Then, all the task who do shrink_slab() wait for dm_bufio_lock.
> > > >
> > > > Any suggestions for this? Thanks.
> > >
> > > There's hardly any solution because Michal Hocko refuses to change
> > > __GFP_NORETRY behavior.
> > >
> > > The patches 41c73a49df31151f4ff868f28fe4f129f113fa2c and
> > > d12067f428c037b4575aaeb2be00847fc214c24a could reduce the lock contention
> > > on the dm-bufio lock - the patches don't fix the high CPU consumption
> > > inside the memory allocation, but the kernel code should wait less on the
> > > bufio lock.
> >
> > If you actually looked at the bottom line of the problem then you would
> > quickly find out that dm-bufio lock is the least of the problem with the
> > soft limit reclaim. This is a misfeature which has been merged and we
> > have to live with it. All we can do is to discourage people from using
> > it and use much more saner low limit instead.
> >
> > So please stop this stupid blaming, try to understand the reasoning
> > behind my arguments.
>
> Yes, this bickering isn't productive. Michal, your responses are pretty
> hard to follow. I'm just trying to follow along on what it is you're
> saying should be done. It isn't clear to me.
>
> PLEASE, restate what we should be doing differently. Or what changes
> need to happen outside of DM, etc.

For this particular case I can only recommend not using the memcg soft
limit. This is guaranteed to stall and there is no way around it because
those are the semantics of the soft limit. Sad, I know.

Regarding other workloads: AFAIR the problem was due to the
wait_iff_congested in the direct reclaim. And I've been arguing that
special casing __GFP_NORETRY is not a proper way to handle that case.
We have PF_LESS_THROTTLE to handle cases where the caller cannot be
really throttled because it is a part of the congestion control. I dunno
what happened in that regard since then, though.
--
Michal Hocko
SUSE Labs