Re: [PATCH 3/4] mm: introduce bpf_mthp_ops struct ops

From: Vernon Yang

Date: Wed May 06 2026 - 09:29:30 EST

Forwarding sashiko review

On Mon, May 4, 2026 at 1:41 AM <sashiko-bot@xxxxxxxxxx> wrote:
>
> Thank you for your contribution! Sashiko AI review found 6 potential issue(s) to consider:
> - [Critical] Use-After-Free vulnerability in `bpf_mthp_choose()` due to improper SRCU locking.
> - [High] NULL pointer dereference of `link` before validation in `bpf_mthp_ops_reg()` and `bpf_mthp_ops_unreg()`.
> - [High] Newly created cgroups silently bypass the hierarchical BPF mTHP policy.
> - [High] Silent registration failure leaves the cgroup hierarchy in an inconsistent state.
> - [Medium] The BPF program can only restrict allowable THP orders, not expand or select new ones.
> - [Low] Compilation failure due to dereferencing an incomplete type and a non-existent struct member.
> --
>
> commit e5c9c74b11f76b4de7ba4ef8ff94c10bae978e16
> Author: Vernon Yang <yanglincheng@xxxxxxxxxx>
>
> mm: introduce bpf_mthp_ops struct ops
>
> Introducing bpf_mthp_ops enables eBPF programs to register the
> mthp_choose callback function via cgroup-ebpf.
>
> > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > index 2949e5acff35..80ec622213df 100644
> > --- a/include/linux/huge_mm.h
> > +++ b/include/linux/huge_mm.h
> [ ... ]
> > @@ -291,6 +292,11 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
> > enum tva_type type,
> > unsigned long orders)
> > {
> > + /* The eBPF-specified orders overrides which order is selected. */
> > + orders &= bpf_mthp_choose(vma->vm_mm, orders);
>
> Does this inadvertently prevent the BPF program from enabling or selecting new
> orders? Since the return value is applied using a bitwise AND, it appears
> restricted to only disabling existing orders rather than customizing them as
> described in the commit message.

Yes, It is restricted to only disabling existing orders.

> > + if (!orders)
> > + return 0;
> > +
> > /*
> > * Optimization to check if required orders are enabled early. Only
> > * forced collapse ignores sysfs configs.
> > diff --git a/mm/bpf_huge_memory.c b/mm/bpf_huge_memory.c
> > new file mode 100644
> > index 000000000000..e34e0a35edac
> > --- /dev/null
> > +++ b/mm/bpf_huge_memory.c
> [ ... ]
> > +unsigned long bpf_mthp_choose(struct mm_struct *mm, unsigned long orders)
> > +{
> > + struct cgroup *cgrp;
> > + struct mem_cgroup *memcg;
> > + struct bpf_mthp_ops *ops;
> > + int idx;
> > +
> > + memcg = get_mem_cgroup_from_mm(mm);
> > + if (!memcg)
> > + return orders;
> > +
> > + cgrp = memcg->css.cgroup;
> > + ops = READ_ONCE(cgrp->mthp_ops);
> > + if (unlikely(ops)) {
> > + idx = srcu_read_lock(&mthp_bpf_srcu);
> > + if (ops->mthp_choose)
> > + orders = ops->mthp_choose(cgrp, orders);
> > + srcu_read_unlock(&mthp_bpf_srcu, idx);
> > + }
>
> Could reading cgrp->mthp_ops outside the SRCU critical section lead to a
> use-after-free regression?
>
> If a thread is preempted immediately after ops is read:
>
> ops = READ_ONCE(cgrp->mthp_ops);
> <preempted>
>
> Another thread could execute bpf_mthp_ops_unreg(), set the pointer to NULL,
> and complete synchronize_srcu(). The original thread would then resume,
> acquire the lock, and dereference the freed ops pointer.

(Report is identical to bot+bpf-ci) Good catch, I will fix that in the
next version. Thanks!

> > +
> > + mem_cgroup_put(memcg);
> > +
> > + return orders;
> > +}
> [ ... ]
> > +static int bpf_mthp_ops_reg(void *kdata, struct bpf_link *link)
> > +{
> > + struct bpf_struct_ops_link *st_link = (struct bpf_struct_ops_link *)link;
> > + struct bpf_mthp_ops *ops = kdata;
> > + struct cgroup *cgrp = st_link->cgroup;
>
> Can this cause a regression if link is passed as NULL? st_link is cast from
> link, and st_link->cgroup implicitly dereferences it before the validation
> check below.

(Report is identical to bot+bpf-ci) I will fix that in the next version. Thanks!

> Additionally, struct bpf_struct_ops_link is defined privately in
> kernel/bpf/bpf_struct_ops.c and does not contain a cgroup field. Will this
> cause a compilation failure due to dereferencing an incomplete type and a
> non-existent struct member?

(Report is identical to bot+bpf-ci)

This series is based on linux v7.1-rc1 (26fd6bff2c05) + "mm: BPF OOM"[3]
first four patches.

All of the above definitions are in "mm: BPF OOM" PATCH#2.

https://lore.kernel.org/linux-mm/20260127024421.494929-3-roman.gushchin@xxxxxxxxx/

> > + struct cgroup_subsys_state *pos;
> > +
> > + /* The link is not yet fully initialized, but cgroup should be set */
> > + if (!link)
> > + return -EOPNOTSUPP;
> > +
> > + cgroup_lock();
> > + css_for_each_descendant_pre(pos, &cgrp->self) {
> > + struct cgroup *child = pos->cgroup;
> > +
> > + if (READ_ONCE(child->mthp_ops)) {
> > + /* TODO
> > + * Do not destroy the cgroup hierarchy property.
> > + * If an eBPF program already exists in the sub-cgroup,
> > + * trigger an error and clear the already set
> > + * bpf_mthp_ops data.
> > + */
> > + continue;
>
> Is it intentional to return 0 at the end of this function when we skip over
> descendants that already have mthp_ops configured? This appears to leave the
> hierarchy in an inconsistent state while userspace assumes the registration
> was fully successful.

Yes, that is exactly what I will do next.

> Furthermore, is there a missing hook in the cgroup creation path? Since this
> loop only applies to existing descendants, any newly created cgroups will
> have their mthp_ops field initialized to NULL, allowing tasks moved there to
> silently bypass the BPF program attached to their ancestors.

Good catch, I will set WRITE_ONCE(child->mthp_ops, parent->mthp_ops)
during cgroup_create(). Thanks!

> > + }
> > + WRITE_ONCE(child->mthp_ops, ops);
> > + }
> > + cgroup_unlock();
> > +
> > + return 0;
> > +}
> > +
> > +static void bpf_mthp_ops_unreg(void *kdata, struct bpf_link *link)
> > +{
> > + struct bpf_struct_ops_link *st_link = (struct bpf_struct_ops_link *)link;
> > + struct bpf_mthp_ops *ops = kdata;
> > + struct cgroup *cgrp = st_link->cgroup;
>
> Does this code similarly dereference link before it can be validated?
>
> --
> Sashiko AI review · https://sashiko.dev/#/patchset/20260503165024.1526680-1-vernon2gm@xxxxxxxxx?part=3