Re: [PATCH V3] riscv: asid: Fixup stale TLB entry cause application crash

From: Palmer Dabbelt
Date: Thu Dec 08 2022 - 18:30:19 EST


On Fri, 18 Nov 2022 12:57:21 PST (-0800), geomatsi@xxxxxxxxx wrote:
Hi Guo Ren,


After use_asid_allocator is enabled, userspace applications crash
because of stale TLB entries: calling cpumask_clear_cpu without
local_flush_tlb_all cannot guarantee that the CPU's TLB entries are
fresh. With set_mm_asid, a user space application can then read a stale
value through a stale TLB entry; set_mm_noasid is not affected.

... [snip]

+	/*
+	 * The mm_cpumask indicates which harts' TLBs contain the virtual
+	 * address mapping of the mm. Compared to noasid, using asid
+	 * can't guarantee that stale TLB entries are invalidated because
+	 * the asid mechanism wouldn't flush TLB for every switch_mm for
+	 * performance. So when using asid, keep all CPUs footmarks in
+	 * cpumask() until mm reset.
+	 */
+	cpumask_set_cpu(cpu, mm_cpumask(next));
+	if (static_branch_unlikely(&use_asid_allocator)) {
+		set_mm_asid(next, cpu);
+	} else {
+		cpumask_clear_cpu(cpu, mm_cpumask(prev));
+		set_mm_noasid(next);
+	}
 }

I observe similar user-space crashes on my SMP systems with ASID enabled.
My attempt to fix the issue was a bit different; see the following patch:

https://lore.kernel.org/linux-riscv/20220829205219.283543-1-geomatsi@xxxxxxxxx/

In brief, the idea was borrowed from flush_icache_mm handling:
- keep track of CPUs not running the task
- perform per-ASID TLB flush on such CPUs only if the task is switched there
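
The two steps above can be modelled in plain C as a rough sketch. This is
not the code from the linked patch: struct mm_model, mm_mark_others_stale()
and mm_switch_to() are hypothetical names, the per-CPU counter stands in
for a local per-ASID TLB flush, and the assumption that CPUs are marked
when the mm's mappings change on some other CPU is mine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 4

/* Hypothetical per-mm state: one bit per CPU whose TLB may be stale. */
struct mm_model {
	uint32_t tlb_stale_mask;
	unsigned int flush_count[NR_CPUS];	/* local flushes done per CPU */
};

/*
 * Assumed trigger: mappings of the mm changed while running on 'cpu'.
 * That CPU flushes locally, every other CPU is only marked as stale.
 */
static void mm_mark_others_stale(struct mm_model *mm, unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	mm->tlb_stale_mask = ((1u << NR_CPUS) - 1) & ~(1u << cpu);
}

/*
 * The task is switched to 'cpu': flush only if this CPU was marked.
 * Returns true when a (modelled) per-ASID flush was performed.
 */
static bool mm_switch_to(struct mm_model *mm, unsigned int cpu)
{
	uint32_t bit;

	assert(cpu < NR_CPUS);
	bit = 1u << cpu;
	if (!(mm->tlb_stale_mask & bit))
		return false;

	mm->tlb_stale_mask &= ~bit;
	mm->flush_count[cpu]++;	/* stands in for a local per-ASID flush */
	return true;
}
```

In this model a CPU that never runs the task again never pays for a flush,
and a CPU that does run it pays for exactly one.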

That way looks better to me. Leaking hartids in the ASID allocator might
make the crashes go away, but it will end up trending towards flushing
everything, and that doesn't seem like the right long-term solution.

So I've got that one on for-next, sorry I missed it before.

Thanks!


Your patch also works fine in my tests, fixing those crashes. I have a
question, though, regarding the removed cpumask_clear_cpu: how are CPUs
that no longer run the task removed from its mm_cpumask? If they are not
removed, then flush_tlb_mm/flush_tlb_page will broadcast unnecessary
TLB flushes to those CPUs when ASID is enabled.
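
To illustrate the concern, here is a toy model in plain C, not kernel
code. struct mm_mask_model, model_switch_mm() and
model_remote_flush_targets() are hypothetical names, and a 32-bit word
stands in for mm_cpumask(). The clear_prev flag selects whether the
previous mm's bit is cleared on a switch.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 4

/* Toy stand-in for mm_cpumask(): one bit per CPU. */
struct mm_mask_model {
	uint32_t cpumask;
};

/*
 * Model of a switch from 'prev' to 'next' on 'cpu'. With clear_prev
 * set, the CPU is dropped from prev's mask; without it, the bit stays.
 */
static void model_switch_mm(struct mm_mask_model *prev,
			    struct mm_mask_model *next,
			    unsigned int cpu, int clear_prev)
{
	assert(cpu < NR_CPUS);
	next->cpumask |= 1u << cpu;
	if (clear_prev && prev != next)
		prev->cpumask &= ~(1u << cpu);
}

/* Number of remote CPUs a flush for 'mm' issued on 'cpu' would target. */
static unsigned int model_remote_flush_targets(const struct mm_mask_model *mm,
					       unsigned int cpu)
{
	uint32_t remote = mm->cpumask & ~(1u << cpu);
	unsigned int n = 0;

	for (; remote; remote &= remote - 1)
		n++;
	return n;
}
```

If a task runs on CPU 0, is switched out, and then runs on CPU 1, the
model without clearing still lists CPU 0 as a flush target, while the
model with clearing lists none.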

Regards,
Sergey