Re: [syzbot] [kernel?] linux-next test error: WARNING in switch_mm_irqs_off

From: Ingo Molnar
Date: Mon Apr 14 2025 - 11:51:24 EST

* Ingo Molnar <mingo@xxxxxxxxxx> wrote:

>
> * Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> > > Call Trace:
> > > <TASK>
> > > unuse_temporary_mm+0x9f/0x100 arch/x86/mm/tlb.c:1038
> > > __text_poke+0x7b6/0xb40 arch/x86/kernel/alternative.c:2214
> > > text_poke arch/x86/kernel/alternative.c:2257 [inline]
> > > smp_text_poke_batch_finish+0x3e7/0x12c0 arch/x86/kernel/alternative.c:2565
> > > arch_jump_label_transform_apply+0x1c/0x30 arch/x86/kernel/jump_label.c:146
> > > static_key_disable_cpuslocked+0xd2/0x1c0 kernel/jump_label.c:240
> > > static_key_disable+0x1a/0x20 kernel/jump_label.c:248
> > > once_deferred+0x70/0xb0 lib/once.c:20
> > > process_one_work kernel/workqueue.c:3238 [inline]
> > > process_scheduled_works+0xac3/0x18e0 kernel/workqueue.c:3319
> > > worker_thread+0x870/0xd50 kernel/workqueue.c:3400
> > > kthread+0x7b7/0x940 kernel/kthread.c:464
> > > ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:153
> > > ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
> > > </TASK>
> >
> > So I can reproduce, and I think I see what happens, except I'm
> > confused as to why the recently merged patches show this.
> >
> > AFAIU what happens is that unuse_temporary_mm() clears the
> > mm_cpumask() for the current CPU, while switch_mm_irqs_off() then
> > checks that the mm_cpumask() bit is set for the current CPU.
> >
> > This behaviour hasn't really changed since 209954cbc7d0 ("x86/mm/tlb:
> > Update mm_cpumask lazily") introduced both.
> >
> > I'm not entirely sure what the best way forward is; we can simply
> > delete the warning, or make use_temporary_mm() tag the special MMs
> > somehow and exclude them from the warning.
>
> So, mm_cpumask is basically tracking on which CPUs the MM ran on, and
> this gets cleared lazily whenever there's an opportune time, but not
> during context switches (for shared cacheline performance reasons),
> right?
>
> So why do we need to clear the mm_cpumask in unuse_temporary_mm() to
> begin with:
>
> /* Clear the cpumask, to indicate no TLB flushing is needed anywhere */
> cpumask_clear_cpu(smp_processor_id(), mm_cpumask(this_cpu_read(cpu_tlbstate.loaded_mm)));
>
> What TLB flushing are we worried about here? Nothing much should
> trigger any TLB flushing for text_poke_mm AFAICS?

I.e., something like the patch below, but I might be missing something
here ...

Thanks,

Ingo

=================>
arch/x86/mm/tlb.c | 3 ---
1 file changed, 3 deletions(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 0ebbaab55b0a..d36d370042e2 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -1032,9 +1032,6 @@ void unuse_temporary_mm(struct mm_struct *prev_mm)
 {
 	lockdep_assert_preemption_disabled();
 
-	/* Clear the cpumask, to indicate no TLB flushing is needed anywhere */
-	cpumask_clear_cpu(smp_processor_id(), mm_cpumask(this_cpu_read(cpu_tlbstate.loaded_mm)));
-
 	switch_mm_irqs_off(NULL, prev_mm, current);
 
 	/*