Re: [PATCH v2 1/1] cgroup: make per-cgroup pressure stall tracking configurable
From: Suren Baghdasaryan
Date: Tue May 18 2021 - 14:23:21 EST
On Tue, May 18, 2021 at 11:08 AM Shakeel Butt <shakeelb@xxxxxxxxxx> wrote:
> On Mon, May 17, 2021 at 7:02 PM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
> > PSI accounts stalls for each cgroup separately and aggregates them at each
> > level of the hierarchy. This causes additional overhead, with psi_avgs_work
> > being called for each cgroup in the hierarchy. psi_avgs_work has been
> > highly optimized; however, on systems with a large number of cgroups the
> > overhead becomes noticeable.
> > Systems which use PSI only at the system level could avoid this overhead
> > if PSI can be configured to skip per-cgroup stall accounting.
> > Add a "cgroup_disable=pressure" kernel command-line option to allow
> > requesting system-wide-only pressure stall accounting. When set, it
> > keeps system-wide accounting under /proc/pressure/ but skips accounting
> > for individual cgroups and does not expose PSI nodes in the cgroup hierarchy.
> > Signed-off-by: Suren Baghdasaryan <surenb@xxxxxxxxxx>
> I am assuming that this is for Android and that, at the moment, Android is
> only interested in system-level pressure. I am wondering if there is
> any plan for Android to have cgroup hierarchies with explicit limits
> in the future?
Correct, and yes, we would like to use memcgs to limit memory in the
future; however, we do not plan on using per-cgroup psi for now.
> If yes, then I think we should follow up (this patch is fine
> independently) with making this feature more general by explicitly
> enabling psi for each cgroup level similar to how we enable
> controllers through cgroup.subtree_control.
> Something like:
> $ echo "+psi" > cgroup.subtree_control
> This would definitely be helpful for server use cases where jobs create
> sub-containers and might not be interested in psi for them, while the
> admin is interested in the top-level job's psi.
Haven't thought about it before but that makes sense to me.
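
For readers following along, here is a minimal userspace sketch of the two
things discussed above: checking whether "cgroup_disable=pressure" was passed
on the kernel command line, and parsing the system-wide PSI data that stays
available under /proc/pressure/. The function names pressure_disabled() and
parse_psi() are hypothetical and not part of the patch. The sketch assumes
that cgroup_disable= takes a comma-separated list and that PSI files contain
"some"/"full" lines with avg10, avg60, avg300 and total fields, as described
in the kernel's PSI documentation; the kernel's own parsing may differ in
detail.

```python
def pressure_disabled(cmdline: str) -> bool:
    """Return True if any cgroup_disable= argument lists "pressure".

    Assumption: the value is a comma-separated list of names.
    """
    for arg in cmdline.split():
        key, sep, value = arg.partition("=")
        if key == "cgroup_disable" and sep:
            if "pressure" in value.split(","):
                return True
    return False


def parse_psi(text: str) -> dict:
    """Parse PSI file contents into {"some": {...}, "full": {...}}.

    Assumption: each line looks like
    "some avg10=0.00 avg60=0.00 avg300=0.00 total=0".
    """
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        kind, pairs = fields[0], fields[1:]
        values = {}
        for pair in pairs:
            name, _, raw = pair.partition("=")
            # total is a cumulative stall time; the averages are percentages
            values[name] = int(raw) if name == "total" else float(raw)
        result[kind] = values
    return result
```

On a Linux system with PSI enabled, the first function could be fed the
contents of /proc/cmdline and the second the contents of
/proc/pressure/memory (or the cpu or io file).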