Re: [PATCH] sched/cpuacct: fix use-after-free in cpuacct_account_field()
From: Peter Zijlstra
Date: Tue Apr 07 2026 - 03:59:20 EST
On Sat, Apr 04, 2026 at 10:47:42PM -0400, Rik van Riel wrote:
> cpuacct_css_free() calls free_percpu() on ca->cpustat and ca->cpuusage,
> then kfree(ca). However, a timer interrupt on another CPU can
> concurrently access this data through cpuacct_account_field(), which
> walks the cpuacct hierarchy via task_ca()/parent_ca() and performs
> __this_cpu_add(ca->cpustat->cpustat[index], val).
>
> The race window exists because put_css_set_locked() drops the CSS
> reference (css_put) before the css_set is RCU-freed (kfree_rcu). This
> means the CSS percpu_ref can reach zero and trigger the css_free chain
> while readers obtained the CSS pointer from the old css_set that is
> still visible via RCU.
>
> Although css_free_rwork_fn() already runs after one RCU grace period,
> the css_set -> CSS reference drop in put_css_set_locked() creates a
> window in which the CSS free chain races with readers still holding a
> reference to the old css_set.
To me this reads like a cgroup fail, not a cpuacct fail per se. But I'm
forever confused there. TJ?
> With KASAN enabled, free_percpu() unmaps the shadow pages, so the
> KASAN-instrumented __this_cpu_add() hits an unmapped shadow page
> (PMD=0), causing a page fault in IRQ context that cascades into an
> IRQ stack overflow.
>
> Fix this by deferring the actual freeing of percpu data and the cpuacct
> struct to an RCU callback via call_rcu(), ensuring that all concurrent
> readers in RCU read-side critical sections (including timer tick
> handlers) have completed before the memory is freed.
>
> Found by an AI-driven syzkaller run. The bug has not reproduced in the
> 14 hours since this patch was applied.
>
> Signed-off-by: Rik van Riel <riel@xxxxxxxxxxx>
> Assisted-by: Claude:claude-opus-4.6 syzkaller
> Fixes: 3eba0505d03a ("sched/cpuacct: Remove redundant RCU read lock")
> Cc: stable@xxxxxxxxxx
> ---
> kernel/sched/cpuacct.c | 12 ++++++++++--
> 1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c
> index ca9d52cb1ebb..b6e7b34de616 100644
> --- a/kernel/sched/cpuacct.c
> +++ b/kernel/sched/cpuacct.c
> @@ -28,6 +28,7 @@ struct cpuacct {
> /* cpuusage holds pointer to a u64-type object on every CPU */
> u64 __percpu *cpuusage;
> struct kernel_cpustat __percpu *cpustat;
> + struct rcu_head rcu;
> };
>
> static inline struct cpuacct *css_ca(struct cgroup_subsys_state *css)
> @@ -84,15 +85,22 @@ cpuacct_css_alloc(struct cgroup_subsys_state *parent_css)
> }
>
> /* Destroy an existing CPU accounting group */
> -static void cpuacct_css_free(struct cgroup_subsys_state *css)
> +static void cpuacct_free_rcu(struct rcu_head *rcu)
> {
> - struct cpuacct *ca = css_ca(css);
> + struct cpuacct *ca = container_of(rcu, struct cpuacct, rcu);
>
> free_percpu(ca->cpustat);
> free_percpu(ca->cpuusage);
> kfree(ca);
> }
>
> +static void cpuacct_css_free(struct cgroup_subsys_state *css)
> +{
> + struct cpuacct *ca = css_ca(css);
> +
> + call_rcu(&ca->rcu, cpuacct_free_rcu);
> +}
> +
> static u64 cpuacct_cpuusage_read(struct cpuacct *ca, int cpu,
> enum cpuacct_stat_index index)
> {
> --
> 2.52.0
>