Re: [PATCH 3/3] cgroup: Implement cgroup2 basic CPU usage accounting

From: Tejun Heo
Date: Tue Aug 29 2017 - 11:24:36 EST


Hello, Peter.

On Tue, Aug 29, 2017 at 04:32:52PM +0200, Peter Zijlstra wrote:
> So I mostly like. On accounting it only adds to the immediate cgroup (if
> it has a parent, aka !root).
>
> On update it does a DFS of all sub-groups and propagates the deltas up
> to the requested group.
...
> What I don't get is why you need cgroup_cpu_stat_updated(). That is, I
> see you use it to keep the DFS 'stack' up-to-date, but what I
> don't see is why you'd need that.

That is to make reading stats O(number of descendants which have been
active since last read) instead of O(number of all descendants) as
there can be a lot of not-too-active cgroups in a system. Stat
reading can be frequent, so the combination can get really bad. By
keeping the updated list separate, increasing read frequency decreases
the cost of each read.
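
To make the cost argument concrete, here is a minimal user-space sketch
of the updated-list idea. It is not the code from the patch: struct cg
and the cg_*() helpers are hypothetical names made up for illustration,
and the sketch is single-threaded, ignoring locking and any per-CPU
handling. An update links the group and its not-yet-linked ancestors
onto their parents' updated lists; a read walks only those lists.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a cgroup; not the kernel struct. */
struct cg {
	struct cg *parent;
	struct cg *updated_children;	/* children updated since last read */
	struct cg *updated_next;	/* next entry on parent's updated list */
	int on_list;			/* linked on parent's updated list? */
	unsigned long long pending;	/* local delta not yet propagated */
	unsigned long long total;	/* usage incl. descendants at last read */
};

void cg_init(struct cg *cg, struct cg *parent)
{
	cg->parent = parent;
	cg->updated_children = NULL;
	cg->updated_next = NULL;
	cg->on_list = 0;
	cg->pending = 0;
	cg->total = 0;
}

/* Account locally, then link upward until an already-linked ancestor. */
void cg_charge(struct cg *cg, unsigned long long delta)
{
	cg->pending += delta;
	for (; cg->parent && !cg->on_list; cg = cg->parent) {
		cg->updated_next = cg->parent->updated_children;
		cg->parent->updated_children = cg;
		cg->on_list = 1;
	}
}

/* Fold in deltas from updated descendants only; idle ones are never seen. */
unsigned long long cg_flush(struct cg *cg, unsigned *visited)
{
	unsigned long long delta = cg->pending;

	cg->pending = 0;
	(*visited)++;
	while (cg->updated_children) {
		struct cg *child = cg->updated_children;

		assert(child->on_list);
		cg->updated_children = child->updated_next;
		child->updated_next = NULL;
		child->on_list = 0;
		delta += cg_flush(child, visited);
	}
	cg->total += delta;
	return delta;
}

/* Read @cg's usage; the flushed delta is left pending on the parent. */
unsigned long long cg_read(struct cg *cg, unsigned *visited)
{
	unsigned long long delta = cg_flush(cg, visited);

	if (cg->parent)
		cg->parent->pending += delta;
	return cg->total;
}
```

With a root that has three children and one grandchild, charging the
grandchild and reading the root visits three groups; reading again
right away visits only the root, since the updated lists are empty.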

Also, please note that a system may end up with a lot of cgroups
without the user intending to. memcg drains removed cgroups lazily
and the number of draining cgroups can reach very high numbers if the
system isn't under memory pressure. The plan is to add basic stats
for other resources too and keeping it scalable w.r.t. idle cgroups
allows using the same mechanism for all resources.

> Have a look at walk_tg_tree_from(), I think we can do something like
> that on struct cgroup_subsys_state, it has that children list and the
> parent pointer.
>
> And yes, walk_tg_tree_from() is tricky, it always takes a fair while to
> remember how it works.

We can propagate an "updated" flag up the tree (we need to, otherwise we
can't tell which subtree to descend into) and prune the iteration on
subtrees which haven't been updated; however, this can still become
very costly depending on the topology as it can't jump over the
siblings which haven't been updated.
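
For comparison, a rough sketch of the flag-based walk, again in
user-space C with hypothetical names (struct fnode, fnode_*()) and no
locking. It only illustrates the sibling-scan cost described above and
is not walk_tg_tree_from() itself. The flag prunes clean subtrees, but
every child of a visited group still has to be inspected.

```c
#include <stddef.h>

/* Hypothetical tree node with a children list and a parent pointer. */
struct fnode {
	struct fnode *parent;
	struct fnode *first_child;
	struct fnode *next_sibling;
	int updated;			/* this node or a descendant changed */
	unsigned long long pending;	/* local delta not yet propagated */
	unsigned long long total;	/* usage incl. descendants at last read */
};

void fnode_init(struct fnode *n, struct fnode *parent)
{
	n->parent = parent;
	n->first_child = NULL;
	n->next_sibling = NULL;
	n->updated = 0;
	n->pending = 0;
	n->total = 0;
	if (parent) {
		n->next_sibling = parent->first_child;
		parent->first_child = n;
	}
}

/* Account locally and set the flag on every not-yet-flagged ancestor. */
void fnode_charge(struct fnode *n, unsigned long long delta)
{
	n->pending += delta;
	for (; n && !n->updated; n = n->parent)
		n->updated = 1;
}

/*
 * Descend only into flagged subtrees. @checked counts how many children
 * had their flag tested, clean siblings included.
 */
unsigned long long fnode_flush(struct fnode *n, unsigned *checked)
{
	unsigned long long delta = n->pending;
	struct fnode *c;

	n->pending = 0;
	n->updated = 0;
	for (c = n->first_child; c; c = c->next_sibling) {
		(*checked)++;
		if (c->updated)
			delta += fnode_flush(c, checked);
	}
	n->total += delta;
	return delta;
}
```

In this sketch, a group with 100 children of which one was charged
still tests all 100 flags on a read, whereas a separate updated list
would hand back the single active child directly.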

Thanks.

--
tejun