Re: [PATCH] x86/apic: Fix circular locking dependency between console and hrtimer locks
From: Ingo Molnar
Date: Tue Apr 14 2020 - 02:25:18 EST
* Leon Romanovsky <leon@xxxxxxxxxx> wrote:
> Hi,
>
> Any feedback?
> https://lore.kernel.org/lkml/20200407170925.1775019-1-leon@xxxxxxxxxx/
The fix definitely looks legit, lockdep is right that we shouldn't take
the console_sem.lock even under trylock.
It's only a printk_once(), yet I'm wondering why in the last ~8 years
this never triggered. Nobody ever ran lockdep and debug console level
enabled on such hardware, or did something else change?
One possibility would be that apic_check_deadline_errata() marked almost
all Intel systems as broken and the TSC-deadline hardware never actually
got activated. In that case you have triggered rarely tested code and
might see other weirdnesses. Just saying. :-)
Or a bootup with "debug" specified is much more rare in production
systems, hence the 8 years old bug.
> > It is far away from my main expertise and I'm not sure that the solution
> > is correct, but it definitely fixed our regression.
> > ---
> > arch/x86/kernel/apic/apic.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
> > index d254cebdd3c3..6706b2cd9aec 100644
> > --- a/arch/x86/kernel/apic/apic.c
> > +++ b/arch/x86/kernel/apic/apic.c
> > @@ -353,7 +353,7 @@ static void __setup_APIC_LVTT(unsigned int clocks, int oneshot, int irqen)
> > */
> > asm volatile("mfence" : : : "memory");
> >
> > - printk_once(KERN_DEBUG "TSC deadline timer enabled\n");
> > + printk_deferred_once(KERN_DEBUG "TSC deadline timer enabled\n");
I think we should move this essentially initialization-time message much
earlier during bootup, when we are not holding any hrtimer locks.
One good place would be apic_check_deadline_errata(). This place:
if (boot_cpu_data.microcode >= rev)
return;
setup_clear_cpu_cap(X86_FEATURE_TSC_DEADLINE_TIMER);
pr_err(FW_BUG "TSC_DEADLINE disabled due to Errata; "
"please update microcode to version: 0x%x (or later)\n", rev);
Could be something like:
if (boot_cpu_data.microcode >= rev) {
pr_debug("x86/apic: TSC deadline timer enabled.\n");
return;
}
setup_clear_cpu_cap(X86_FEATURE_TSC_DEADLINE_TIMER);
pr_err(FW_BUG "TSC_DEADLINE disabled due to CPU errata, please update microcode to version: 0x%x (or later)\n", rev);
(Note the small fixes I did to the errata message - we should do that and
also move all user-facing messages into a single line while at it.)
Thanks,
Ingo