Re: [syzbot] [kernel?] KASAN: slab-use-after-free Write in flush_tlb_func

From: Yury Norov
Date: Wed Jul 02 2025 - 13:44:47 EST


On Wed, Jul 02, 2025 at 07:12:31PM +0200, Thomas Gleixner wrote:
> On Wed, Jul 02 2025 at 19:00, Jann Horn wrote:
> > On Wed, Jul 2, 2025 at 6:53 PM Jann Horn <jannh@xxxxxxxxxx> wrote:
> >> TLB flushes via IPIs on x86 are always synchronous, right?
> >> flush_tlb_func is only referenced from native_flush_tlb_multi() in
> >> calls to on_each_cpu_mask() (with wait=true) or
> >> on_each_cpu_cond_mask() (with wait=1).
> >> So I think this is not an issue, unless you're claiming that we call
> >> native_flush_tlb_multi() with an already-freed info->mm?
> >>
> >> And I think the bisected commit really is the buggy one: It looks at
> >> "nr_cpus", which tracks *how many CPUs we have to IPI*, but assumes
> >> that "nr_cpus" tracks *how many CPUs we posted work to*. Those numbers
> >> are not the same: If we post work to a CPU that already had IPI work
> >> pending, we just add a list entry without sending another IPI.
> >
> > Or in other words: After that blamed commit, if CPU 1 posts a TLB
> > flush to CPU 3, and then CPU 2 also quickly posts a TLB flush to CPU
> > 3, then CPU 2 will erroneously not wait for the TLB flush to complete
> > before reporting flush completion, which AFAICS means we can get both
> > stale TLB entries and (less often) UAF.
>
> Right you are. Well analyzed and I missed it when taking the lot.
>
> > I think the correct version of that commit would be to revert that
> > commit and instead just move the "run_remote = true;" line down, below
> > the cond_func() check.
>
> I remove it from the relevant tip branch

Thank you guys for explaining that and sorry for the buggy patch.
I was actually under impression that run_remote duplicates nr_cpus !=0,
and even have a patch that removes run_remote.

Maybe worth to add a comment on what run_remote and nr_cpus track?

Thanks,
Yury