Re: [PATCH 2/4] bpf: add bpf_cgroup_{flush_stats,stall} function
From: bot+bpf-ci
Date: Sun May 03 2026 - 13:23:54 EST
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index baa12b24bb64..d7b04f1fc9f1 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
[ ... ]
> @@ -2819,6 +2820,32 @@ __bpf_kfunc struct cgroup *bpf_cgroup_from_id(u64 cgid)
> return cgrp;
> }
>
> +/**
> + * bpf_cgroup_stall - acquire the total stall time of cgroup
> + * @cgrp: cgroup struct
> + * @states: psi states
> + *
> + * Return the total stall time.
> + */
> +__bpf_kfunc unsigned long bpf_cgroup_stall(struct cgroup *cgrp,
> + enum psi_states states)
> +{
> + struct psi_group *group = cgroup_psi(cgrp);
> +
> + return div_u64(group->total[PSI_AVGS][states], NSEC_PER_MSEC);
> +}
Can cgroup_psi() return NULL here? Looking at include/linux/psi.h, cgroup_psi()
returns cgrp->psi for non-root cgroups, which is allocated in
psi_cgroup_alloc() only when the psi_cgroups_enabled static branch is true.
When the kernel boots with cgroup_disable=pressure or psi=0, cgrp->psi remains
NULL for all non-root cgroups. The sysfs *.pressure files are gated by
cgroup_psi_enabled() so they never hit this case, but these new kfuncs are
directly callable from BPF programs without such gating.
A BPF program could obtain a valid non-root cgroup (e.g., via
bpf_cgroup_from_id) and call bpf_cgroup_stall on it. If PSI is disabled for
cgroups, this would dereference NULL at group->total[...].
Similarly for bpf_cgroup_flush_stats below, passing NULL to
psi_group_flush_stats would eventually reach __psi_group_flush_stats which
does mutex_lock(&NULL->avgs_lock).
Should these functions check for NULL after calling cgroup_psi()?
A subsystem pattern also flags this as potentially concerning: the `states`
parameter is used directly as an array index without bounds checking. The
backing array is declared as u64 total[NR_PSI_AGGREGATORS][NR_PSI_STATES - 1]
(see include/linux/psi_types.h, struct psi_group), while enum psi_states has
the range 0..NR_PSI_STATES-1 (PSI_NONIDLE is the last member).
The BPF verifier only checks that the register is SCALAR_VALUE and does not
validate enum ranges. Because the parameter is not marked __const, a BPF
program can pass any scalar, including values derived from map lookups or
arithmetic.
Passing PSI_NONIDLE (a valid enum value) reads one element past the inner
array; passing a negative value (legal because enum is signed int in C) reads
memory before the array. This would leak adjacent memory contents to the BPF
program.
Existing BPF cgroup-stat kfuncs guard against this pattern:
bpf_mem_cgroup_page_state (mm/bpf_memcontrol.c):
if (unlikely(!memcg_stat_item_valid(idx)))
return (unsigned long)-1;
bpf_mem_cgroup_memory_events (mm/bpf_memcontrol.c):
if (unlikely(event >= MEMCG_NR_MEMORY_EVENTS))
return (unsigned long)-1;
Should this function add a similar check, such as:
if ((unsigned int)states >= NR_PSI_STATES - 1)
return -1;
> +
> +/**
> + * bpf_cgroup_flush_stats - Flush cgroup's statistics
> + * @cgrp: cgroup struct
> + */
> +__bpf_kfunc void bpf_cgroup_flush_stats(struct cgroup *cgrp)
> +{
> + struct psi_group *group = cgroup_psi(cgrp);
> +
> + psi_group_flush_stats(group);
> +}
[ ... ]
---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/25285279315