Re: [RFC PATCH] cgroup: introduce dynamic protection for memcg
From: Zhaoyang Huang
Date: Thu Apr 07 2022 - 08:37:08 EST
On Thu, Apr 7, 2022 at 5:44 PM Michal Hocko <mhocko@xxxxxxxx> wrote:
>
> [...]
> On Thu 07-04-22 16:59:50, Zhaoyang Huang wrote:
> > > This means that limits are altered even if there is memory to be
> > > reclaimed from other memcgs. Why? How does this line up with the
> > > basic property of the low limit to act as a protection from the reclaim?
> > OK, I partially understand. What I mean is that this patch changes
> > the original definition of low: the calculated low only provides
> > protection while the PSI value is below the configured setting, and
> > allows reclaim once the PSI value exceeds it.
>
> OK, I guess I finally get to understand what you are trying to say. So
> effectively your new semantic defines the low limit as an initial
> protection that is very soft and only preserved under a light global
> memory pressure[1]. If the reclaim pressure is higher the user provided
> protection is decreased. The new semantic is planned to be a global
> opt-in.
>
> Correct?
Right. But I don't think the protection is soft: the test results show
that the memcg stays protected within a certain range of pressure, and
that breaking the low limit beyond that range helps relieve the
system.
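
To make the semantic summarized above concrete, here is a minimal
sketch. It is not the code from the patch: the helper name
effective_low(), the percentage-based psi and threshold arguments, and
the linear scale-down are all assumptions chosen for illustration only.

```c
#include <stdint.h>

/*
 * Hypothetical helper, NOT taken from the patch.
 *
 * low:       user-provided low protection, in bytes
 * psi:       global memory pressure reading, as a percentage (0..100)
 * threshold: configured pressure setting, as a percentage (0..100)
 *
 * At or below the threshold the full protection is kept. Above it,
 * the protection is assumed to shrink linearly, reaching zero at 100.
 */
static uint64_t effective_low(uint64_t low, unsigned int psi,
			      unsigned int threshold)
{
	uint64_t span, rem;

	if (psi > 100)
		psi = 100;

	/* Light pressure: the user-provided protection applies as is. */
	if (psi <= threshold)
		return low;

	/* psi > threshold implies threshold < 100, so span is non-zero. */
	span = 100 - threshold;
	rem = 100 - psi;

	/* floor(low * rem / span), split up to avoid 64-bit overflow. */
	return (low / span) * rem + (low % span) * rem / span;
}
```

With a threshold of 20, a protection of 1000 stays at 1000 for psi = 10,
drops to 500 for psi = 60, and reaches 0 for psi = 100.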
>
> Now that I (believe I) have a slightly better understanding, I have to
> say I really dislike the idea.
> First of all the new semantic would have to be memcg reclaim aware. That
> means that the scaling logic would need to be aware where the memory
> pressure comes from.
I don't follow. Do you mean that the protected memcg should distinguish
between global pressure and pressure from other memcgs? I don't see why.
> More importantly I really dislike the idea that the user provided input
> is updated by the kernel without userspace knowledge about that. How is
> the userspace supposed to know that the value has been changed?
Actually, the purpose of this patch is to relieve userspace of tuning
at runtime: the parameters need to be set up properly once, and then
the scheme decides the rest.
> I can see how the effective values could be scaled down but this still
> sounds dubious as the userspace would have a hard time understanding what
> is going on under the cover or even worse tune the value based on the
> specific observed behavior for a specific kernel which would make this a
> maintenance burden going forward.
This kind of memcg is meant for users who are aware of the scheme and
want it to behave as designed.
>
> All that being said I have a hard time making sense of a protection which
> is unpredictably decreasing based on a global metric without any
> userspace view into why and how this is done. So I am afraid I have to
> NACK this and I really do recommend you to start a discussion about your
> specific usecase and try to find a different solution.
As I have mentioned before, the EAS scheduler is also a self-adjusting
scheme, based on load tracking and energy calculation. There, too, it
is hard for the user to know when a scheduling entity will be placed
on a big core. The admin can turn it off if they dislike it.
I would like to push this patch forward and get more feedback from
real scenarios.
>
> Best regards
>
>
> [1] this is based on the global PSI metric.
> --
> Michal Hocko
> SUSE Labs