Re: [PATCH v5 10/12] x86,tlb: do targeted broadcast flushing from tlbbatch code

From: Nadav Amit
Date: Mon Jan 20 2025 - 13:57:23 EST

> On 20 Jan 2025, at 19:56, Rik van Riel <riel@xxxxxxxxxxx> wrote:
>
> How would you keep track of CPUs where the tlbsync
> has NOT happened before arch_tlbbatch_flush()?
>
> That part seems to be missing still.

You only need to track whether there is a pending TLBSYNC on *your* CPU.
There is no need to track whether other CPUs issued a TLBSYNC for the flushes
they queued in arch_tlbbatch_add_pending(). If the process doing the
reclamation was migrated, a TLBSYNC is issued during the context switch,
before the reclaiming thread has any chance of being scheduled on another CPU.

I hope these code changes on top of yours make it clearer:

> +void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
> + struct mm_struct *mm,
> + unsigned long uaddr)
> +{
> + if (static_cpu_has(X86_FEATURE_INVLPGB) && mm_global_asid(mm)) {
> + u16 asid = mm_global_asid(mm);
> + /*
> + * Queue up an asynchronous invalidation. The corresponding
> + * TLBSYNC is done in arch_tlbbatch_flush(), and must be done
> + * on the same CPU.
> + */

#if 0 // remove
> + if (!batch->used_invlpgb) {
> + batch->used_invlpgb = true;
> + migrate_disable();
> + }
#endif

batch->used_invlpgb = true;
preempt_disable();

> + invlpgb_flush_user_nr_nosync(kern_pcid(asid), uaddr, 1, false);
> + /* Do any CPUs supporting INVLPGB need PTI? */
> + if (static_cpu_has(X86_FEATURE_PTI))
> + invlpgb_flush_user_nr_nosync(user_pcid(asid), uaddr, 1, false);

this_cpu_write(cpu_tlbstate.pending_tlbsync, true);
preempt_enable();
> +
> + /*
> + * Some CPUs might still be using a local ASID for this
> + * process, and require IPIs, while others are using the
> + * global ASID.
> + *
> + * In this corner case we need to do both the broadcast
> + * TLB invalidation, and send IPIs. The IPIs will help
> + * stragglers transition to the broadcast ASID.
> + */
> + if (READ_ONCE(mm->context.asid_transition))
> + goto also_send_ipi;
> + } else {
> +also_send_ipi:
> + inc_mm_tlb_gen(mm);
> + cpumask_or(&batch->cpumask, &batch->cpumask, mm_cpumask(mm));
> + }
> + mmu_notifier_arch_invalidate_secondary_tlbs(mm, 0, -1UL);
> +}
> +

Then, in switch_mm_irqs_off():

if (this_cpu_read(cpu_tlbstate.pending_tlbsync))
	tlbsync();

Note that when switch_mm_irqs_off() is called from context_switch(),
finish_task_switch() has not yet taken place, so the outgoing task cannot
yet be scheduled on other cores.