Re: [PATCH V3] riscv: asid: Fixup stale TLB entry cause application crash

From: Guo Ren
Date: Thu Dec 08 2022 - 22:15:31 EST


On Fri, Dec 9, 2022 at 7:30 AM Palmer Dabbelt <palmer@xxxxxxxxxxxx> wrote:
>
> On Fri, 18 Nov 2022 12:57:21 PST (-0800), geomatsi@xxxxxxxxx wrote:
> > Hi Guo Ren,
> >
> >
> >> After use_asid_allocator is enabled, the userspace application will
> >> crash by stale TLB entries. Because only using cpumask_clear_cpu without
> >> local_flush_tlb_all couldn't guarantee CPU's TLB entries were fresh.
> >> Then set_mm_asid would cause the user space application to get a stale
> >> value by stale TLB entry, but set_mm_noasid is okay.
> >
> > ... [snip]
> >
> >> + /*
> >> + * The mm_cpumask indicates which harts' TLBs contain the virtual
> >> + * address mapping of the mm. Compared to noasid, using asid
> >> + * can't guarantee that stale TLB entries are invalidated because
> >> + * the asid mechanism wouldn't flush TLB for every switch_mm for
> >> + * performance. So when using asid, keep all CPUs footmarks in
> >> + * cpumask() until mm reset.
> >> + */
> >> + cpumask_set_cpu(cpu, mm_cpumask(next));
> >> + if (static_branch_unlikely(&use_asid_allocator)) {
> >> + set_mm_asid(next, cpu);
> >> + } else {
> >> + cpumask_clear_cpu(cpu, mm_cpumask(prev));
> >> + set_mm_noasid(next);
> >> + }
> >> }
> >
> > I observe similar user-space crashes on my SMP systems with enabled ASID.
> > My attempt to fix the issue was a bit different, see the following patch:
> >
> > https://lore.kernel.org/linux-riscv/20220829205219.283543-1-geomatsi@xxxxxxxxx/
> >
> > In brief, the idea was borrowed from flush_icache_mm handling:
> > - keep track of CPUs not running the task
> > - perform per-ASID TLB flush on such CPUs only if the task is switched there
>
> That way looks better to me: leaking hartids in the ASID allocator might
> make the crashes go away, but it's just going to end up trending towards
> flushing everything and that doesn't seem like the right long-term
> solution.
The penalty in switch_mm is too heavy!!!
- If the system has multiple NUMA nodes, it will cause switch_mm_fast
flush unnecessary harts.
- If flush_range is just 1 entry, it would case flush_tlb_all_asid.

switch_mm_fast:
csr_write(CSR_SATP, virt_to_pfn(mm->pgd) |
((cntx & asid_mask) << SATP_ASID_SHIFT) |
satp_mode);

if (need_flush_tlb)
local_flush_tlb_all();
+#ifdef CONFIG_SMP
+ else {
+ cpumask_t *mask = &mm->context.tlb_stale_mask;+
+
+ if (cpumask_test_cpu(cpu, mask)) {
+ cpumask_clear_cpu(cpu, mask);
+ local_flush_tlb_all_asid(cntx & asid_mask);
// penalty in switch_mm fast path
+ }
+ }
+#endif

And See:
static void __sbi_tlb_flush_range(struct mm_struct *mm, unsigned long start,
unsigned long size, unsigned long stride)
{
+ struct cpumask *pmask = &mm->context.tlb_stale_mask;
struct cpumask *cmask = mm_cpumask(mm);
unsigned int cpuid;
bool broadcast;
@@ -44,6 +29,15 @@ static void __sbi_tlb_flush_range(struct mm_struct
*mm, unsigned long start,
if (static_branch_unlikely(&use_asid_allocator)) {
unsigned long asid = atomic_long_read(&mm->context.id);

+ /*
+ * TLB will be immediately flushed on harts concurrently
+ * executing this MM context. TLB flush on other harts
+ * is deferred until this MM context migrates there.
+ */
+ cpumask_setall(pmask);
^^^^^^^^^^^^^^^^^^^^^^^ It would flush all harts for
all NUMA nodes!!! Most of them deferred to switch_mm_fast. The penalty
contains unnecessary harts!
+ cpumask_clear_cpu(cpuid, pmask);
+ cpumask_andnot(pmask, pmask, cmask);

Please reconsider a bit, and make a smart decision. Just penalty the
harts who touched the mm, not all. And only flush the whole TLB when
some entries are needed.

The __sbi_tlb_flush_range is the slow path; keeping the fast path
performance is worth more than improving a slow one.

>
> So I've got that one on for-next, sorry I missed it before.
>
> Thanks!
>
> >
> > Your patch also works fine in my tests fixing those crashes. I have a
> > question though, regarding removed cpumask_clear_cpu. How CPUs no more
> > running the task are removed from its mm_cpumask ? If they are not
> > removed, then flush_tlb_mm/flush_tlb_page will broadcast unnecessary
> > TLB flushes to those CPUs when ASID is enabled.
> >
> > Regards,
> > Sergey



--
Best Regards
Guo Ren