Re: [PATCH 3/3] mm/mmu_gather: send tlb_remove_table_smp_sync IPI only to CPUs in kernel mode

From: Valentin Schneider
Date: Wed Apr 05 2023 - 08:46:00 EST


On 05/04/23 14:05, Frederic Weisbecker wrote:
> static void smp_call_function_many_cond(const struct cpumask *mask,
> 					smp_call_func_t func, void *info,
> @@ -946,10 +948,13 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
>  #endif
>  		cfd_seq_store(pcpu->seq_queue, this_cpu, cpu, CFD_SEQ_QUEUE);
>  		if (llist_add(&csd->node.llist, &per_cpu(call_single_queue, cpu))) {
> -			__cpumask_set_cpu(cpu, cfd->cpumask_ipi);
> -			nr_cpus++;
> -			last_cpu = cpu;
> -
> +			if (!(scf_flags & SCF_NO_USER) ||
> +			    !IS_ENABLED(CONFIG_GENERIC_ENTRY) ||
> +			    ct_state_cpu(cpu) != CONTEXT_USER) {
> +				__cpumask_set_cpu(cpu, cfd->cpumask_ipi);
> +				nr_cpus++;
> +				last_cpu = cpu;
> +			}
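
IOW the IPI is only skipped when the caller opted in with SCF_NO_USER,
context tracking is reliable (CONFIG_GENERIC_ENTRY) and the target CPU is
currently in CONTEXT_USER. Written as a standalone predicate (the helper
name is made up by me, everything else comes from the hunk above):

	/* Made-up helper; restates the skip condition from the hunk above. */
	static bool scf_skip_user_cpu(unsigned int scf_flags, int cpu)
	{
		/* All three must hold for the IPI to be skipped. */
		return (scf_flags & SCF_NO_USER) &&
		       IS_ENABLED(CONFIG_GENERIC_ENTRY) &&
		       ct_state_cpu(cpu) == CONTEXT_USER;
	}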

I've been hacking on something like this (CSD deferral for NOHZ-full),
and unfortunately this still uses the per-CPU cfd_data storage, which
means any further smp_call_function() from the same source CPU to the same
destination will spin in csd_lock_wait(), waiting for the target CPU to
come out of userspace and flush its call_single_queue - and we've just
gone to extra effort *not* to disturb it, so that can take a while :(
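
For reference, the blocking bit is roughly this (simplified from
kernel/smp.c, debug variants elided):

	/* Reusing a per-CPU csd means spinning here until the *destination*
	 * CPU has dequeued the previous request and cleared CSD_FLAG_LOCK -
	 * which a NOHZ-full CPU left undisturbed in userspace won't do any
	 * time soon.
	 */
	static void csd_lock_wait(struct __call_single_data *csd)
	{
		smp_cond_load_acquire(&csd->node.u_flags, !(VAL & CSD_FLAG_LOCK));
	}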

I don't have much in a shareable state yet (though I'm supposed to talk
some more about it at OSPM in <2 weeks, so I'll have to get there), but ATM
I'm playing with:
o a bitmask (like in [1]) for coalescable stuff such as do_sync_core() for
  x86 instruction patching (rough sketch after this list);
o a CSD-like queue for callbacks that need to pass data around, using
  statically-allocated storage (so with a bound on how much it can be used) -
  the alternative being to allocate a struct at send time, since there's no
  bound on how much crap you can queue towards an undisturbed NOHZ-full CPU...
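
The bitmask idea is, very roughly (all names below are made up, none of
this is actual kernel API):

	/* Made-up sketch: one bit per coalescable callback type, set by
	 * the sender instead of an IPI when the target sits in userspace.
	 */
	enum deferred_ipi_work {
		DEFER_SYNC_CORE,	/* e.g. do_sync_core() for text patching */
		NR_DEFER_WORK,
	};

	static DEFINE_PER_CPU(unsigned long, deferred_ipi_mask);

	/* Sender side: flag the work instead of IPI'ing a userspace CPU. */
	static void defer_ipi_work(int cpu, enum deferred_ipi_work bit)
	{
		set_bit(bit, per_cpu_ptr(&deferred_ipi_mask, cpu));
	}

	/* Target side: flush pending work on kernel entry from userspace. */
	void flush_deferred_ipi_work(void)
	{
		unsigned long *mask = this_cpu_ptr(&deferred_ipi_mask);

		if (test_and_clear_bit(DEFER_SYNC_CORE, mask))
			sync_core();
	}

Repeated requests just coalesce onto the same bit, which is the whole
point - no unbounded queueing towards a CPU that isn't looking.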

[1]: https://lore.kernel.org/all/20210929152429.067060646@xxxxxxxxxxxxx/