Re: x86/pti: smp_processor_id() called while preemptible in resume-from-sleep

From: Thomas Gleixner
Date: Sat Dec 30 2017 - 13:20:11 EST


On Sat, 30 Dec 2017, Dominik Brodowski wrote:

> On Sat, Dec 30, 2017 at 04:03:07PM +0100, Thomas Gleixner wrote:
> > On Sat, 30 Dec 2017, Dominik Brodowski wrote:
> > > resume-from-sleep (mem/S3) on v4.15-rc5-149-g5aa90a845892 triggers the
> > > following bug. If I boot with "pti=off", the kernel does not show this
> > > issue, and neither did kernels before pti was merged:
> > >
> > > [ 39.951703] ACPI: Low-level resume complete
> > > [ 39.951832] ACPI: EC: EC started
> > > [ 39.951840] PM: Restoring platform NVS memory
> > > [ 39.954648] Enabling non-boot CPUs ...
> > > [ 39.954792] x86: Booting SMP configuration:
> > > [ 39.954800] smpboot: Booting Node 0 Processor 1 APIC 0x2
> > > [ 39.954834] BUG: using smp_processor_id() in preemptible [00000000] code: sh/465
> > > [ 39.954841] caller is native_cpu_up+0x2f0/0xa30
> >
> > I can't reproduce at the moment and I can't find a possible reason for this
> > by code inspection.
>
> Thanks for taking a look at it!
>
> > Can you please provide your .config file
>
> See attached.
>
> > and perhaps decode the two offending call sites with
> >
> > scripts/faddr2line vmlinux native_cpu_up+0x2f0/0xa30 native_cpu_up+0x447/0xa30
>
> native_cpu_up+0x2f0/0xa30:
> invalidate_user_asid at arch/x86/include/asm/tlbflush.h:343

Ah, that makes sense. Missed that in the maze.
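
For reference, the kind of check that produces the BUG line above can be
modeled in user space roughly as follows. This is only a sketch under
stated assumptions: every model_* name is hypothetical, the CPU number is
made up, and the real debug check in the kernel has further exemptions
(interrupts disabled is the only extra one modeled here).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for kernel state, not the kernel's definitions. */
static int  model_preempt_count;   /* 0 means the task is preemptible */
static bool model_irqs_disabled;   /* true models a local_irq_disable() section */
static int  model_current_cpu = 1; /* made-up CPU number for the model */
static int  model_bug_reports;     /* how many times the check complained */

static void model_preempt_disable(void)
{
	model_preempt_count++;
}

static void model_preempt_enable(void)
{
	/* Catch unbalanced enable calls in the model. */
	assert(model_preempt_count > 0);
	model_preempt_count--;
}

/*
 * Model of a debug variant of smp_processor_id(): the returned CPU number
 * is only stable if the caller cannot be preempted and migrated, so
 * complain when neither preemption nor interrupts are disabled.
 */
static int model_debug_smp_processor_id(const char *caller)
{
	if (model_preempt_count == 0 && !model_irqs_disabled) {
		model_bug_reports++;
		printf("BUG: using smp_processor_id() in preemptible [%08x] code\n",
		       (unsigned int)model_preempt_count);
		printf("caller is %s\n", caller);
	}
	return model_current_cpu;
}
```

Calling model_debug_smp_processor_id() with model_preempt_count at zero
reports once; wrapping the call in model_preempt_disable() and
model_preempt_enable() keeps it silent.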

What makes less sense is the tlbflush itself. I'm surely missing something
subtle, but at first glance that tlbflush looks pointless.

> (inlined by) __native_flush_tlb at arch/x86/include/asm/tlbflush.h:351
> (inlined by) smpboot_setup_warm_reset_vector at arch/x86/kernel/smpboot.c:129
> (inlined by) do_boot_cpu at arch/x86/kernel/smpboot.c:950
> (inlined by) native_cpu_up at arch/x86/kernel/smpboot.c:1070
>
> native_cpu_up+0x447/0xa30:
> kern_pcid at arch/x86/include/asm/tlbflush.h:105
> (inlined by) invalidate_user_asid at arch/x86/include/asm/tlbflush.h:342
> (inlined by) __native_flush_tlb at arch/x86/include/asm/tlbflush.h:351
> (inlined by) smpboot_restore_warm_reset_vector at arch/x86/kernel/smpboot.c:146

This one even more so, as the stale comment suggests that there was some
page table fiddling at some point in the past.

> (inlined by) do_boot_cpu at arch/x86/kernel/smpboot.c:1022
> (inlined by) native_cpu_up at arch/x86/kernel/smpboot.c:1070

Let me think about it and do some archaeological research.

Thanks,

tglx