Re: [PATCH v2 02/23] bpf: initial support for attaching struct ops to cgroups
From: Song Liu
Date: Thu Oct 30 2025 - 13:56:58 EST
On Thu, Oct 30, 2025 at 9:14 AM Tejun Heo <tj@xxxxxxxxxx> wrote:
>
> Hello,
>
> On Wed, Oct 29, 2025 at 09:32:44PM -0700, Song Liu wrote:
> > If the use case is to attach a single struct_ops to a single cgroup, the author
> > of that BPF program can always ignore the memcg parameter and use
> > global variables, etc. We waste a register in the BPF ISA to hold the pointer
> > to memcg, but the JIT may recover that in native instructions.
> >
> > OTOH, starting without a memcg parameter, it will be impossible to allow
> > attaching the same struct_ops to different cgroups. I still think it is a valid
> > use case for the sysadmin to load a set of OOM handlers for users in the
> > containers to choose from.
>
> I find something like that being implemented through struct_ops attaching
> rather unlikely. Wouldn't it look more like the following?
>
> - Attach a handler at the parent level which implements different policies.
>
> - Child cgroups pick the desired policy using e.g. cgroup xattrs and when
> OOM event happens, the OOM handler attached at the parent implements the
> requested policy.

OK, using xattrs is another way to achieve this.

> - If further customization is desired and supported, it's implemented
> through child loading its own OOM handler which operates under the
> parent's OOM handler.
>
> > Also, a per cgroup oom handler may need to access the memcg information
> > anyway. Without a dedicated memcg argument, the user needs to fetch it
> > somewhere else.
>
> An OOM handler attached to a cgroup doesn't just need to handle OOM events
> in the cgroup itself. It's responsible for the whole sub-hierarchy. ie. It
> will need accessors to reach all those memcgs anyway.
>
> Another thing to consider is that the memcg for a given cgroup can change by
> the controller being enabled and disabled. There isn't the one permanent
> memcg that a given cgroup is associated with.

In the current version, bpf_oom_ops is attached to the memcg. As long as
we pass a pointer to the memcg to every struct_ops function, those functions
can be implemented in a stateless way. I think keeping the option of a
stateless implementation will help us in the long term.
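
To illustrate what I mean by stateless, here is a rough userspace model,
not actual BPF or kernel code. Every name in it (memcg_model,
oom_ops_model, attachment_model, fire_oom) is made up for this sketch and
is not the layout of bpf_oom_ops in this series. It only shows that once
the memcg comes in as an argument, one ops instance can serve several
cgroups without any global state:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mem_cgroup; not the kernel definition. */
struct memcg_model {
	const char *name;
	unsigned long usage;
	unsigned long max;
};

/* Hypothetical ops table; the real bpf_oom_ops may look different. */
struct oom_ops_model {
	/* Returns 1 if the handler would act on this memcg, 0 otherwise. */
	int (*handle_out_of_memory)(struct memcg_model *memcg);
};

/* Stateless handler: everything it needs arrives through the argument. */
static int stateless_handle_oom(struct memcg_model *memcg)
{
	return memcg != NULL && memcg->usage > memcg->max;
}

/* A single ops instance, shared by every attachment below. */
static const struct oom_ops_model shared_ops = {
	.handle_out_of_memory = stateless_handle_oom,
};

/* Hypothetical record of "these ops are attached to this memcg". */
struct attachment_model {
	struct memcg_model *memcg;
	const struct oom_ops_model *ops;
};

/* The attach point supplies the memcg, so the handler keeps no state. */
static int fire_oom(const struct attachment_model *att)
{
	return att->ops->handle_out_of_memory(att->memcg);
}

/* Same ops, two cgroups, two different outcomes. */
static void oom_model_selftest(void)
{
	struct memcg_model a = { "a", 120, 100 };
	struct memcg_model b = { "b", 50, 100 };
	struct attachment_model att_a = { &a, &shared_ops };
	struct attachment_model att_b = { &b, &shared_ops };

	assert(fire_oom(&att_a) == 1);	/* over its limit */
	assert(fire_oom(&att_b) == 0);	/* under its limit */
}
```

Without the memcg argument, the handler would have to find its target
through a global or a lookup, which ties one loaded instance to one cgroup.
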
Thanks,
Song