Re: [PATCH bpf-next v3 07/17] mm: introduce BPF OOM struct ops
From: Martin KaFai Lau
Date: Mon Feb 02 2026 - 15:28:58 EST
On 1/30/26 3:29 PM, Roman Gushchin wrote:
Martin KaFai Lau <martin.lau@xxxxxxxxx> writes:
On 1/26/26 6:44 PM, Roman Gushchin wrote:
+bool bpf_handle_oom(struct oom_control *oc)
+{
+ struct bpf_struct_ops_link *st_link;
+ struct bpf_oom_ops *bpf_oom_ops;
+ struct mem_cgroup *memcg;
+ struct bpf_map *map;
+ int ret = 0;
+
+ /*
+ * System-wide OOMs are handled by the struct ops attached
+ * to the root memory cgroup
+ */
+ memcg = oc->memcg ? oc->memcg : root_mem_cgroup;
+
+ rcu_read_lock_trace();
+
+ /* Find the nearest bpf_oom_ops traversing the cgroup tree upwards */
+ for (; memcg; memcg = parent_mem_cgroup(memcg)) {
+ st_link = rcu_dereference_check(memcg->css.cgroup->bpf.bpf_oom_link,
+ rcu_read_lock_trace_held());
+ if (!st_link)
+ continue;
+
+ map = rcu_dereference_check((st_link->map),
+ rcu_read_lock_trace_held());
+ if (!map)
+ continue;
+
+ /* Call BPF OOM handler */
+ bpf_oom_ops = bpf_struct_ops_data(map);
+ ret = bpf_ops_handle_oom(bpf_oom_ops, st_link, oc);
+ if (ret && oc->bpf_memory_freed)
+ break;
+ ret = 0;
+ }
+
+ rcu_read_unlock_trace();
+
+ return ret && oc->bpf_memory_freed;
+}
+
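To make the lookup order concrete, here is a userspace model of the loop in bpf_handle_oom(). It is a sketch, not kernel code. The model starts at the OOMing memcg, or at the root for a system-wide OOM, and walks towards the root. It calls each attached handler and stops at the first one that reports freed memory. RCU, the struct_ops link and the map indirection are left out. `struct oom_model_cg`, `struct model_oc`, `model_handle_oom()` and the two example handlers are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct oom_control: only the field the walk reads. */
struct model_oc {
	bool memory_freed;	/* mirrors oc->bpf_memory_freed */
};

/* Hypothetical handler type; nonzero return means "I handled it". */
typedef int (*oom_handler_fn)(struct model_oc *oc);

/* Hypothetical memcg node: parent pointer plus an optional attached handler. */
struct oom_model_cg {
	struct oom_model_cg *parent;
	oom_handler_fn handler;		/* NULL when nothing is attached */
};

/*
 * Walk from @memcg (or @root for a system-wide OOM) towards the root and
 * call every attached handler until one returns nonzero AND reports freed
 * memory, mirroring the loop in bpf_handle_oom().
 */
bool model_handle_oom(struct oom_model_cg *memcg, struct oom_model_cg *root,
		      struct model_oc *oc)
{
	int ret = 0;

	for (memcg = memcg ? memcg : root; memcg; memcg = memcg->parent) {
		if (!memcg->handler)
			continue;
		ret = memcg->handler(oc);
		if (ret && oc->memory_freed)
			break;
		ret = 0;	/* this level did not help, keep walking up */
	}
	return ret && oc->memory_freed;
}

/* Example handler that declines: frees nothing. */
int decline_handler(struct model_oc *oc)
{
	(void)oc;
	return 0;
}

/* Example handler that pretends it reclaimed or killed something. */
int freeing_handler(struct model_oc *oc)
{
	oc->memory_freed = true;
	return 1;
}
```

With a declining handler on a leaf and a freeing handler on the root, the walk skips past the leaf and succeeds at the root. If no handler frees memory, the function returns false, and the kernel would presumably fall back to its regular OOM path.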
[ ... ]
+static int bpf_oom_ops_reg(void *kdata, struct bpf_link *link)
+{
+ struct bpf_struct_ops_link *st_link = (struct bpf_struct_ops_link *)link;
+ struct cgroup *cgrp;
+
+ /* The link is not yet fully initialized, but cgroup should be set */
+ if (!link)
+ return -EOPNOTSUPP;
+
+ cgrp = st_link->cgroup;
+ if (!cgrp)
+ return -EINVAL;
+
+ if (cmpxchg(&cgrp->bpf.bpf_oom_link, NULL, st_link))
+ return -EEXIST;
iiuc, this will allow only one oom_ops to be attached to a
cgroup. Considering oom_ops is the only user of the
cgrp->bpf.struct_ops_links (added in patch 2), the list should have
only one element for now.
Copy some context from the patch 2 commit log.
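The cmpxchg() in bpf_oom_ops_reg() is what limits a cgroup to one oom_ops. The slot is claimed only if it is still NULL; otherwise registration fails with -EEXIST. Below is the same pattern in userspace C11, as a sketch. `struct model_cgroup`, `model_oom_reg()` and `model_oom_unreg()` are hypothetical. The release path is an assumption, since the unreg side of the patch is not quoted here.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct bpf_struct_ops_link. */
struct model_link {
	int id;
};

struct model_cgroup {
	_Atomic(struct model_link *) oom_link;	/* mirrors cgrp->bpf.bpf_oom_link */
};

/* Claim the per-cgroup slot only if it is empty, like cmpxchg(&slot, NULL, link). */
int model_oom_reg(struct model_cgroup *cgrp, struct model_link *link)
{
	struct model_link *expected = NULL;

	if (!cgrp)
		return -EINVAL;
	if (!atomic_compare_exchange_strong(&cgrp->oom_link, &expected, link))
		return -EEXIST;	/* another oom_ops already owns this cgroup */
	return 0;
}

/* Assumed release path: clear the slot only if @link still owns it. */
void model_oom_unreg(struct model_cgroup *cgrp, struct model_link *link)
{
	struct model_link *expected = link;

	(void)atomic_compare_exchange_strong(&cgrp->oom_link, &expected, NULL);
}
```

A second registration on the same cgroup fails until the first link is released. There is no ordering or replacement policy: whichever attach wins the compare-and-swap owns the cgroup.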
Hi Martin!
Sorry, I'm not quite sure what you mean. Can you please
elaborate?
We decided (in conversations at LPC) that one bpf oom policy per
memcg is good for now (with the potential to extend it in the future
if use cases appear). But there seems to be a lot of interest in
attaching struct ops'es to cgroups (a couple of patchsets based on my
earlier v2 patches have already been posted), so I tried to make the
bpf link mechanics suitable for multiple use cases from the start.
Did I answer your question?
Got it. The link list is for the future struct_ops implementations to attach to a cgroup.
I should have mentioned the context. My bad.
BPF_PROG_TYPE_SOCK_OPS is currently a cgroup BPF prog. I am thinking of adding bpf_struct_ops support with hooks similar to those in BPF_PROG_TYPE_SOCK_OPS. There are some issues that need to be worked out. A major one is that the current cgroup progs have expectations about ordering and override behavior based on the BPF_F_* flags and the runtime cgroup hierarchy. I was trying to see whether there are pieces in this set that can be built upon. The linked list is a start, but it will need more work to be performant for networking use.
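Here is a rough sketch of the shared per-cgroup list idea. It assumes each attached link carries a tag saying which struct_ops kind it belongs to, so several kinds (oom today, possibly sock_ops later) can hang off one cgroup. `enum model_ops_kind`, `struct model_ops_link` and the helpers are hypothetical. They do not reflect how patch 2 actually lays out cgrp->bpf.struct_ops_links.

```c
#include <stddef.h>

/* Hypothetical identifiers for struct_ops kinds that may attach to a cgroup. */
enum model_ops_kind { MODEL_OPS_OOM, MODEL_OPS_SOCK };

/* Hypothetical list node standing in for a struct_ops link on a cgroup. */
struct model_ops_link {
	enum model_ops_kind kind;
	struct model_ops_link *next;
};

struct model_cgroup_links {
	struct model_ops_link *head;	/* loosely mirrors cgrp->bpf.struct_ops_links */
};

/* Prepend @link; per-kind rules (e.g. a single OOM link) are the caller's job. */
void model_links_add(struct model_cgroup_links *cg, struct model_ops_link *link)
{
	link->next = cg->head;
	cg->head = link;
}

/* Linear scan for the first link of @kind: O(number of attached links). */
struct model_ops_link *model_links_find(struct model_cgroup_links *cg,
					enum model_ops_kind kind)
{
	struct model_ops_link *l;

	for (l = cg->head; l; l = l->next)
		if (l->kind == kind)
			return l;
	return NULL;
}
```

Every lookup walks the list, so the cost grows with the number of links on the cgroup. A per-packet networking hook would likely want something cheaper on its fast path.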
This change doesn't answer the question of how bpf programs belonging
to these struct ops'es will be executed. That will be done individually
for every bpf struct ops which supports this.
Please note that, unlike "normal" bpf programs, struct ops'es
are not propagated to cgroup sub-trees.
There are NONE, BPF_F_ALLOW_OVERRIDE, and BPF_F_ALLOW_MULTI; which one
is closest to the bpf_handle_oom() semantics? If the ordering needs to
change (or multiple attachments need to be allowed) in the future, will
a new flag be needed, or can the existing BPF_F_xxx flags be used?
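For regular cgroup BPF programs, the flag semantics can be modeled in a few lines. The nearest ancestor that either has ALLOW_MULTI or has a program attached decides whether a descendant may attach; a NONE ancestor refuses. The effective set runs the nearest program first, then any ancestors attached with ALLOW_MULTI. This is a simplified sketch with one program per cgroup. It loosely follows hierarchy_allows_attach() and compute_effective_progs() in kernel/bpf/cgroup.c. The `MODEL_F_*` macros and `struct attach_model_cg` are local stand-ins, not the uapi definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins for the attach flags (NONE is simply 0). */
#define MODEL_F_NONE		0U
#define MODEL_F_ALLOW_OVERRIDE	(1U << 0)
#define MODEL_F_ALLOW_MULTI	(1U << 1)

/* Simplified cgroup: at most one program per cgroup, prog == 0 means none. */
struct attach_model_cg {
	struct attach_model_cg *parent;
	int prog;
	unsigned int flags;
};

/* Attach check: the nearest ancestor that is MULTI or has a program decides. */
bool model_allows_attach(const struct attach_model_cg *cg)
{
	const struct attach_model_cg *p;

	for (p = cg->parent; p; p = p->parent) {
		if (p->flags & MODEL_F_ALLOW_MULTI)
			return true;
		if (p->prog)
			return p->flags & MODEL_F_ALLOW_OVERRIDE;
	}
	return true;
}

/*
 * Effective programs for @cg in run order (descendant first): the nearest
 * cgroup with a program always contributes; ancestors above it contribute
 * only if attached with ALLOW_MULTI. Returns the number written to @out.
 */
size_t model_effective(const struct attach_model_cg *cg, int *out, size_t max)
{
	size_t cnt = 0;

	for (; cg; cg = cg->parent) {
		if (!cg->prog)
			continue;
		if (cnt > 0 && !(cg->flags & MODEL_F_ALLOW_MULTI))
			continue;
		if (cnt == max)
			break;
		out[cnt++] = cg->prog;
	}
	return cnt;
}
```

Compared with this model, bpf_oom_ops_reg() in the quoted patch does not consult ancestors before attaching. The walk in bpf_handle_oom() visits handlers nearest-first, like an ALLOW_MULTI chain, but it stops at the first handler that frees memory.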
I hope that the existing flags can be used, but I'm also not sure we
would ever need multiple oom handlers per cgroup. Do you have any
specific concerns here?
Another question that I have is the default behavior when none of the BPF_F_* flags are specified when attaching a struct_ops to a cgroup.
From uapi/bpf.h:
* NONE (default): No further BPF programs allowed in the subtree
iiuc, bpf_handle_oom() is not the same as NONE. Should each struct_ops implementation have its own default policy? For the BPF_PROG_TYPE_SOCK_OPS work, I am thinking the default policy should be BPF_F_ALLOW_MULTI, which is currently always set in cgroup_bpf_link_attach().
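Here is one way a per-struct_ops default could look, purely as a hypothetical sketch. Each struct_ops kind declares the flags it accepts and the flags to use when userspace passes none. Nothing like `struct model_ops_desc` or `model_resolve_flags()` exists in the kernel; the `MODEL_F_*` macros are local stand-ins for the BPF_F_* values.

```c
#include <stddef.h>

/* Local stand-ins for the attach flags. */
#define MODEL_F_ALLOW_OVERRIDE	(1U << 0)
#define MODEL_F_ALLOW_MULTI	(1U << 1)

/* Hypothetical per-struct_ops descriptor carrying its own default policy. */
struct model_ops_desc {
	const char *name;
	unsigned int default_flags;	/* used when userspace passes no flags */
	unsigned int allowed_flags;	/* flags this struct_ops accepts at all */
};

/* Returns the effective attach flags, or -1 if @flags is not supported. */
long model_resolve_flags(const struct model_ops_desc *desc, unsigned int flags)
{
	if (flags == 0)
		return desc->default_flags;
	if (flags & ~desc->allowed_flags)
		return -1;
	return flags;
}
```

For example, a sock_ops-like kind described as `{ "sock_ops", MODEL_F_ALLOW_MULTI, MODEL_F_ALLOW_MULTI }` would resolve an attach with no flags to ALLOW_MULTI and reject an attach with ALLOW_OVERRIDE. A different struct_ops could pick a different default without affecting it.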