Re: [PATCH mm v5 0/9] memcg: accounting for objects allocated by mkdir, cgroup

From: Vasily Averin
Date: Thu Jun 23 2022 - 11:03:47 EST


Dear Michal,
do you still have any concerns about this patch set?

Thank you,
Vasily Averin

On 6/23/22 17:50, Vasily Averin wrote:
> In some cases, creating a cgroup allocates a noticeable amount of memory.
> This operation can be executed from inside a memory-limited container,
> but currently this memory is not accounted to memcg and can be misused.
> This allows a container to exceed its assigned memory limit and avoid
> memcg OOM. Moreover, in case of a global memory shortage on the host,
> the OOM-killer may not find the real memory eater and may start killing
> random processes on the host.
>
> This is especially important for OpenVZ and LXC used in hosting,
> where containers are used by untrusted end users.
>
> Below are the tracing results of mkdir /sys/fs/cgroup/vvs.test on a
> 4-CPU VM with Fedora and a self-compiled upstream kernel. The calculations
> are not precise: they depend on kernel config options, the number of CPUs,
> and the enabled controllers, and ignore possible page allocations, etc.
> However, this is enough to clarify the general situation.
> All allocations are split into:
> - common part, always called for each cgroup type
> - per-cgroup allocations
>
> In each group we consider 2 corner cases:
> - usual allocations, important for 1-2 CPU nodes/VMs
> - percpu allocations, important for 'big iron'
>
> common part:          ~11Kb  +  318 bytes percpu
> memcg:                ~17Kb  + 4692 bytes percpu
> cpu:                  ~2.5Kb + 1036 bytes percpu
> cpuset:               ~3Kb   +   12 bytes percpu
> blkcg:                ~3Kb   +   12 bytes percpu
> pid:                  ~1.5Kb +   12 bytes percpu
> perf:                 ~320b  +   60 bytes percpu
> -------------------------------------------------
> total:                ~38Kb  + 6142 bytes percpu
> currently accounted:           4668 bytes percpu
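
To put the totals above into perspective, here is a rough userspace sketch
of how the per-mkdir footprint scales with the CPU count. It only uses the
"total" and "currently accounted" rows of the table. The macro and function
names are made up for illustration, "~38Kb" is assumed to mean 38 * 1024
bytes, and the percpu part is assumed to scale linearly with the number of
CPUs the kernel allocates percpu areas for. Treat the results as estimates,
just like the table itself.

```c
#include <stdio.h>

/* Figures copied from the "total" and "currently accounted" rows above. */
#define TOTAL_BASE_BYTES        (38UL * 1024) /* assumes "~38Kb" == 38 KiB */
#define TOTAL_PERCPU_BYTES      6142UL
#define ACCOUNTED_PERCPU_BYTES  4668UL

/* Approximate memory allocated by one cgroup mkdir (hypothetical helper). */
static unsigned long cgroup_total_bytes(unsigned long ncpus)
{
	return TOTAL_BASE_BYTES + TOTAL_PERCPU_BYTES * ncpus;
}

/* Part of the above that is not charged to memcg before this series. */
static unsigned long cgroup_unaccounted_bytes(unsigned long ncpus)
{
	return cgroup_total_bytes(ncpus) - ACCOUNTED_PERCPU_BYTES * ncpus;
}

/* Print one line of the estimate for a given CPU count. */
static void print_estimate(unsigned long ncpus)
{
	printf("%lu cpus: total ~%lu bytes, unaccounted ~%lu bytes\n",
	       ncpus, cgroup_total_bytes(ncpus),
	       cgroup_unaccounted_bytes(ncpus));
}
```

Calling print_estimate(4) and print_estimate(128) from a small main()
compares the traced 4-CPU VM with a larger machine. Under these assumptions
the unaccounted part is dominated by the usual allocations on small VMs and
by the percpu ones on big machines.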
>
> - It is important to account the usual allocations made in the
> common part, because almost all cgroup-specific allocations
> are small. The one exception is the memory cgroup, which allocates a few
> huge objects that should be accounted.
> - Percpu allocations made in the common part and in the memcg and cpu
> cgroups should be accounted; the rest are small and can be ignored.
> - KERNFS objects are allocated both in the common part and in most
> cgroups.
>
> Details can be found here:
> https://lore.kernel.org/all/d28233ee-bccb-7bc3-c2ec-461fd7f95e6a@xxxxxxxxxx/
>
> I checked the other cgroup types and found that they can all be ignored.
> Additionally, I found an allocation of struct rt_rq made in the cpu cgroup
> if CONFIG_RT_GROUP_SCHED is enabled; it allocates a huge (~1700 bytes)
> percpu structure and should be accounted too.
>
> v5:
> 1) re-based to linux-mm (mm-everything-2022-06-22-20-36)
>
> v4:
> 1) re-based to linux-next (next-20220610)
> now psi_group is not a part of struct cgroup and is allocated on demand
> 2) added the approval received from Muchun Song
> 3) improved cover letter description as requested by akpm@
>
> v3:
> 1) re-based to current upstream (v5.18-11267-gb00ed48bb0a7)
> 2) fixed a few typos
> 3) added the approvals received
>
> v2:
> 1) re-split to simplify possible bisect, re-ordered
> 2) added accounting for percpu psi_group_cpu and cgroup_rstat_cpu,
> allocated in common part
> 3) added accounting for percpu allocation of struct rt_rq
> (relevant if CONFIG_RT_GROUP_SCHED is enabled)
> 4) improved patch descriptions
>
> Vasily Averin (9):
>   memcg: enable accounting for struct cgroup
>   memcg: enable accounting for kernfs nodes
>   memcg: enable accounting for kernfs iattrs
>   memcg: enable accounting for struct simple_xattr
>   memcg: enable accounting for percpu allocation of struct psi_group_cpu
>   memcg: enable accounting for percpu allocation of struct
>     cgroup_rstat_cpu
>   memcg: enable accounting for large allocations in mem_cgroup_css_alloc
>   memcg: enable accounting for allocations in alloc_fair_sched_group
>   memcg: enable accounting for perpu allocation of struct rt_rq
>
>  fs/kernfs/mount.c      | 6 ++++--
>  fs/xattr.c             | 2 +-
>  kernel/cgroup/cgroup.c | 2 +-
>  kernel/cgroup/rstat.c  | 3 ++-
>  kernel/sched/fair.c    | 4 ++--
>  kernel/sched/psi.c     | 2 +-
>  kernel/sched/rt.c      | 2 +-
>  mm/memcontrol.c        | 4 ++--
>  8 files changed, 14 insertions(+), 11 deletions(-)
>