Re: [PATCH V3] riscv: asid: Fixup stale TLB entry cause application crash

From: Guo Ren
Date: Thu Dec 08 2022 - 22:16:30 EST


On Fri, Dec 9, 2022 at 11:13 AM Guo Ren <guoren@xxxxxxxxxx> wrote:
>
> On Fri, Dec 9, 2022 at 7:30 AM Palmer Dabbelt <palmer@xxxxxxxxxxxx> wrote:
> >
> > On Fri, 18 Nov 2022 12:57:21 PST (-0800), geomatsi@xxxxxxxxx wrote:
> > > Hi Guo Ren,
> > >
> > >
> > >> After use_asid_allocator is enabled, the userspace application will
> > >> crash by stale TLB entries. Because only using cpumask_clear_cpu without
> > >> local_flush_tlb_all couldn't guarantee CPU's TLB entries were fresh.
> > >> Then set_mm_asid would cause the user space application to get a stale
> > >> value by stale TLB entry, but set_mm_noasid is okay.
> > >
> > > ... [snip]
> > >
> > >> + /*
> > >> + * The mm_cpumask indicates which harts' TLBs contain the virtual
> > >> + * address mapping of the mm. Compared to noasid, using asid
> > >> + * can't guarantee that stale TLB entries are invalidated because
> > >> + * the asid mechanism wouldn't flush TLB for every switch_mm for
> > >> + * performance. So when using asid, keep all CPUs footmarks in
> > >> + * cpumask() until mm reset.
> > >> + */
> > >> + cpumask_set_cpu(cpu, mm_cpumask(next));
> > >> + if (static_branch_unlikely(&use_asid_allocator)) {
> > >> + set_mm_asid(next, cpu);
> > >> + } else {
> > >> + cpumask_clear_cpu(cpu, mm_cpumask(prev));
> > >> + set_mm_noasid(next);
> > >> + }
> > >> }
> > >
> > > I observe similar user-space crashes on my SMP systems with enabled ASID.
> > > My attempt to fix the issue was a bit different, see the following patch:
> > >
> > > https://lore.kernel.org/linux-riscv/20220829205219.283543-1-geomatsi@xxxxxxxxx/
> > >
> > > In brief, the idea was borrowed from flush_icache_mm handling:
> > > - keep track of CPUs not running the task
> > > - perform per-ASID TLB flush on such CPUs only if the task is switched there
> >
> > That way looks better to me: leaking hartids in the ASID allocator might
> > make the crashes go away, but it's just going to end up trending towards
> > flushing everything and that doesn't seem like the right long-term
> > solution.
> The penalty in switch_mm is too heavy!!!
> - If the system has multiple NUMA nodes, it will cause switch_mm_fast
> flush unnecessary harts.
> - If flush_range is just 1 entry, it would case flush_tlb_all_asid.
>
> switch_mm_fast:
> csr_write(CSR_SATP, virt_to_pfn(mm->pgd) |
> ((cntx & asid_mask) << SATP_ASID_SHIFT) |
> satp_mode);
>
> if (need_flush_tlb)
> local_flush_tlb_all();
> +#ifdef CONFIG_SMP
> + else {
> + cpumask_t *mask = &mm->context.tlb_stale_mask;+
> +
> + if (cpumask_test_cpu(cpu, mask)) {
> + cpumask_clear_cpu(cpu, mask);
> + local_flush_tlb_all_asid(cntx & asid_mask);
> // penalty in switch_mm fast path
> + }
> + }
> +#endif
>
> And See:
> static void __sbi_tlb_flush_range(struct mm_struct *mm, unsigned long start,
> unsigned long size, unsigned long stride)
> {
> + struct cpumask *pmask = &mm->context.tlb_stale_mask;
> struct cpumask *cmask = mm_cpumask(mm);
> unsigned int cpuid;
> bool broadcast;
> @@ -44,6 +29,15 @@ static void __sbi_tlb_flush_range(struct mm_struct
> *mm, unsigned long start,
> if (static_branch_unlikely(&use_asid_allocator)) {
> unsigned long asid = atomic_long_read(&mm->context.id);
>
> + /*
> + * TLB will be immediately flushed on harts concurrently
> + * executing this MM context. TLB flush on other harts
> + * is deferred until this MM context migrates there.
> + */
> + cpumask_setall(pmask);
> ^^^^^^^^^^^^^^^^^^^^^^^ It would flush all harts for
> all NUMA nodes!!! Most of them deferred to switch_mm_fast. The penalty
> contains unnecessary harts!
> + cpumask_clear_cpu(cpuid, pmask);
> + cpumask_andnot(pmask, pmask, cmask);
>
> Please reconsider a bit, and make a smart decision. Just penalty the
> harts who touched the mm, not all. And only flush the whole TLB when
^^^^ ^^^^ Don't
> some entries are needed.
>
> The __sbi_tlb_flush_range is the slow path; keeping the fast path
> performance is worth more than improving a slow one.
>
> >
> > So I've got that one on for-next, sorry I missed it before.
> >
> > Thanks!
> >
> > >
> > > Your patch also works fine in my tests fixing those crashes. I have a
> > > question though, regarding removed cpumask_clear_cpu. How CPUs no more
> > > running the task are removed from its mm_cpumask ? If they are not
> > > removed, then flush_tlb_mm/flush_tlb_page will broadcast unnecessary
> > > TLB flushes to those CPUs when ASID is enabled.
> > >
> > > Regards,
> > > Sergey
>
>
>
> --
> Best Regards
> Guo Ren



--
Best Regards
Guo Ren