Re: [PATCH bpf-next v3 07/17] mm: introduce BPF OOM struct ops
From: Matt Bobrowski
Date: Sun Feb 01 2026 - 23:06:50 EST

On Tue, Jan 27, 2026 at 09:12:56PM +0000, Roman Gushchin wrote:
> Michal Hocko <mhocko@xxxxxxxx> writes:
>
> > On Mon 26-01-26 18:44:10, Roman Gushchin wrote:
> >> Introduce a bpf struct ops for implementing custom OOM handling
> >> policies.
> >>
> >> It's possible to load one bpf_oom_ops for the system and one
> >> bpf_oom_ops for every memory cgroup. In case of a memcg OOM, the
> >> cgroup tree is traversed from the OOM'ing memcg up to the root and
> >> corresponding BPF OOM handlers are executed until some memory is
> >> freed. If no memory is freed, the kernel OOM killer is invoked.
> >>
> >> The struct ops provides the bpf_handle_out_of_memory() callback,
> >> which is expected to return 1 if it was able to free some memory and 0
> >> otherwise. If 1 is returned, the kernel also checks the bpf_memory_freed
> >> field of the oom_control structure, which is expected to be set by
> >> kfuncs suitable for releasing memory (which will be introduced later
> >> in the patch series). If both are set, OOM is considered handled,
> >> otherwise the next OOM handler in the chain is executed: e.g. BPF OOM
> >> attached to the parent cgroup or the kernel OOM killer.
> >
> > I still find this dual reporting a bit confusing. I can see your
> > intention in having pre-defined "releasers" of the memory to trust BPF
> > handlers more but they do have access to oc->bpf_memory_freed so they
> > can manipulate it. Therefore an additional level of protection is rather
> > weak.
>
> No, they can't. They only have read-only access.
>
> > It is also not really clear to me how this works while there is an OOM
> > victim on the way out (i.e. the tsk_is_oom_victim() -> abort case). This
> > will result in no killing and therefore no bpf_memory_freed, right? The
> > handler itself should consider its work done. How exactly is this handled?
>
> It's a good question, I see your point...
> Basically we want to give a handler an option to exit with "I promise,
> some memory will be freed soon" without doing anything destructive.
> But keeping it safe at the same time.
>
> I don't have a perfect answer off the top of my head, maybe some sort of a
> rate-limiter/counter might work? E.g. a handler can promise this N times
> before the kernel kicks in? Any ideas?
>
> > Also is there any way to handle the oom by increasing the memcg limit?
> > I do not see a callback for that.
>
> There is no kfunc yet, but it's a good idea (which we accidentally
> discussed a few days ago). I'll implement it.

Yes, please, this is something that I had mentioned to you the other
day too. With this kind of BPF kfunc, we'll basically be able to
handle memcg-scoped OOM events inline without necessarily being forced
to kill off anything.
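
To make the flow above a bit more concrete, here is a rough userspace
sketch in C of how I picture the handler chain behaving once a
limit-raising helper exists. It is only a model under stated
assumptions, not kernel code: every *_sim name is hypothetical, the
helper that raises the limit stands in for a kfunc that does not exist
yet, and I am assuming that such a helper would also mark the OOM as
relieved in the same way the memory-releasing kfuncs set
bpf_memory_freed. The model also does not enforce the read-only access
that handlers have to that field; it only relies on convention.

```c
#include <stdbool.h>
#include <stddef.h>

struct memcg_sim;

/* Hypothetical stand-in for the OOM context passed to a handler. */
struct oom_control_sim {
	struct memcg_sim *memcg;	/* the memcg that hit its limit */
	bool bpf_memory_freed;		/* written only by trusted helpers */
};

typedef int (*oom_handler_sim_t)(struct oom_control_sim *oc);

/* Hypothetical stand-in for a memory cgroup with an optional handler. */
struct memcg_sim {
	const char *name;
	struct memcg_sim *parent;
	long usage;
	long limit;
	oom_handler_sim_t handle_oom;	/* NULL if nothing is attached */
};

/*
 * Assumed helper, modelled on the kfunc idea from this thread: raise the
 * limit of the OOM'ing memcg and record that pressure was relieved.
 */
static void sim_raise_memcg_limit(struct oom_control_sim *oc, long extra)
{
	oc->memcg->limit += extra;
	oc->bpf_memory_freed = true;
}

/* Handler that resolves the OOM inline by raising the limit. */
static int sim_raise_limit_handler(struct oom_control_sim *oc)
{
	sim_raise_memcg_limit(oc, 50);
	return 1;
}

/* Handler that claims success without calling any trusted helper. */
static int sim_untrusted_handler(struct oom_control_sim *oc)
{
	(void)oc;
	return 1;
}

/*
 * Walk from the OOM'ing memcg up to the root. The OOM counts as handled
 * only if a handler returned 1 AND a trusted helper set the flag.
 * Returns the memcg whose handler resolved the OOM, or NULL when the
 * kernel OOM killer would have to take over.
 */
static struct memcg_sim *sim_handle_memcg_oom(struct memcg_sim *memcg)
{
	struct oom_control_sim oc = { .memcg = memcg, .bpf_memory_freed = false };
	struct memcg_sim *m;

	for (m = memcg; m; m = m->parent) {
		if (!m->handle_oom)
			continue;
		oc.bpf_memory_freed = false;	/* each handler starts clean */
		if (m->handle_oom(&oc) == 1 && oc.bpf_memory_freed)
			return m;
	}
	return NULL;
}
```

In this model, a child memcg with sim_untrusted_handler attached and a
parent with sim_raise_limit_handler attached ends up with the parent's
handler resolving the OOM: the child's return value of 1 is ignored
because no helper set the flag, and the walk continues up the tree. If
no handler in the chain qualifies, the walker returns NULL, which
corresponds to falling back to the kernel OOM killer.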