Re: [RFC PATCH] cgroup: introduce dynamic protection for memcg

From: Zhaoyang Huang
Date: Thu Apr 07 2022 - 05:01:47 EST


On Thu, Apr 7, 2022 at 3:40 PM Michal Hocko <mhocko@xxxxxxxx> wrote:
>
> On Wed 06-04-22 10:11:19, Zhaoyang Huang wrote:
> > On Tue, Apr 5, 2022 at 8:08 PM Michal Hocko <mhocko@xxxxxxxx> wrote:
> > >
> > > On Mon 04-04-22 21:14:40, Zhaoyang Huang wrote:
> > > [...]
> > > > Please be noticed that this patch DOES protect the memcg when external
> > > > pressure is 1GB as fixed low does.
> > >
> > > This is getting more and more confusing (at least to me). Could you
> > > describe the behavior of the reclaim for the following setups/situations?
> > >
> > > a) mostly reclaiming a clean page cache - via kswapd
> > > b) same as above but the direct reclaim is necessary but very
> > > lightweight
> > > c) direct reclaim makes fwd progress but not enough to satisfy the
> > > allocation request (so the reclaim has to be retried)
> > > d) direct reclaim not making progress and low limit protection is
> > > ignored.
> > >
> > > Say we have several memcgs and only some have low memory protection
> > > configured. What is the user observable state of the protected group and
> > > when and how much the protection can be updated?
> > I am not sure if I understand you right. Do you have suspicions on the
> > test result as you think protected memcg has no chance to update the
> > protection or the global reclaim should have been satisfied with the
> > reclaiming(step d is hard to reach?). Let me try to answer it under my
> > understanding, please give me feedback if you need more info. The
> > protection is updated while mem_cgroup_calculate_protection is called
> > during either kswapd or direct reclaim for each round of the priority
> > reclaiming and then the memcg's lruvec will be reached in step d.
>
> This means that limits are altered even if there is memory to be
> reclaimed from other memcgs. Why? How does this line up with the
> basic property of the low limit to act as a protection from the reclaim?
OK, I partially understand. To be clear, this patch changes the original
definition of low: the calculated low only provides protection while the
psi value is below the configured setting, and allows reclaim once psi
exceeds it. This can also be seen in the latest test results below (same
as the previous test but without mlock), which show that the memcg with
a fixed low pushes reclaim back to the global LRU while psi stays high.
Please note that low is updated when usage rises above it, which means
protection resumes once the memcg becomes active.

                psi(global=1GB)  max    stable  psi(global=2GB)  max    stable
Low=400MB       some=18 full=11  700MB  600MB   some=20 full=16  400MB  400MB
Low=500MB       some=18 full=13  680MB  540MB   some=27 full=17  500MB  500MB
patch setting1  some=19 full=13  863MB  740MB   some=15 full=10  500MB  500MB
patch setting1  some=14 full=11  640MB  470MB   some=20 full=12  360MB  320MB
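
A rough sketch of the semantics described above, for illustration only.
This is not the patch code: the struct, its field names and the function
are hypothetical, and dropping protection entirely once psi exceeds the
setting is a simplifying assumption (the real patch may scale low rather
than zero it).

```c
#include <stdint.h>

/* Hypothetical model of one memcg; names are illustrative only. */
struct memcg_model {
	uint64_t low_bytes;     /* currently calculated low value */
	uint64_t usage_bytes;   /* current memory usage of the memcg */
	unsigned int psi_limit; /* configured psi setting for this memcg */
};

/*
 * Protection assumed to be in effect for one reclaim round.
 * Below or at the psi setting, the memcg is protected up to low
 * (capped by its usage). Above the setting, reclaim is allowed.
 */
static uint64_t effective_protection(const struct memcg_model *cg,
				     unsigned int psi_now)
{
	if (psi_now > cg->psi_limit)
		return 0; /* assumption: protection dropped entirely */

	return cg->usage_bytes < cg->low_bytes ? cg->usage_bytes
					       : cg->low_bytes;
}
```

With low=500 and usage=700, this model protects 500 while psi is at or
below the setting and 0 once psi goes above it.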

>
> > > I think it would be also helpful to describe the high level semantic of
> > > this feature.
>
> Please focus on this part. Without a high level semantic explained we
> will not move forward.
> --
> Michal Hocko
> SUSE Labs