Re: linux 5.12 - fails to boot - soft lockup - CPU#0 stuck for 23s! - RIP smp_call_function_single

From: James Feeney
Date: Sat May 22 2021 - 19:29:03 EST


On 5/22/21 3:06 AM, Borislav Petkov wrote:
> On Fri, May 21, 2021 at 04:11:22PM -0600, James Feeney wrote:
>> Of note, *none* of these 10 boot events generated the "soft lockup"
>> Stack Trace events. Also of note, even though boot number 4 is a full
>> boot, there is a 73 second mystery delay. The delay occurs right after
>> adding swap, and just before "Bluetooth: BNEP". Boot 4 has other unique
>> characteristics, mentioned below.
>>
>> $ grep lvtth dmesglog.5.12.lvtthmr.*
>> dmesglog.5.12.lvtthmr.1:[ 1.246282] intel_init_thermal: CPU0, lvtthmr_init: 0x10200
>> dmesglog.5.12.lvtthmr.1:[ 0.933178] intel_init_thermal: CPU1, lvtthmr_init: 0x10000
>
> Aaaha, ok, your thermal interrupt is an SMI. No wonder with HP. So this
> is becoming weirder by the minute...
>
> Ok, let's look at what it does at init time. Please remove the previous
> diff, apply the below one on 5.12, make sure you have
>
> CONFIG_X86_THERMAL_VECTOR=y
>
> in the .config, boot with the aforementioned cmdline params, catch dmesg
> and send it again.
>
> Thx.
>

Hey Boris

As an aside, I ran across this, but see your name is already there:

ACPI / processor: Request native thermal interrupt handling via _OSC
https://github.com/Asus-T100/kernel/commit/0d5406b61afb06ca7c9b76c924ddc044127bb791
https://groups.google.com/g/linux.kernel/c/vln_DslTYxY

====
There are several reports of freeze on enabling HWP (Hardware PStates)
feature on Skylake-based systems by the Intel P-states driver. The root
cause is identified as the HWP interrupts causing BIOS code to freeze.
...
====

Also maybe related, from 2007 December:

AMD Thermal Interrupt Support
https://lwn.net/Articles/262469/


After swapping patches, notably, there is:

$ grep -i 'lvtt\|smi' dmesglog.5.12.apic_read.1
[ 1.245826] intel_init_thermal: CPU0, lvtthmr_init: 0x10200, read: 0x10200, misc_enable (low): 0x64952489
[ 1.249115] CPU0: Thermal monitoring handled by SMI
[ 0.933125] intel_init_thermal: CPU1, lvtthmr_init: 0x10200, read: 0x10200, misc_enable (low): 0x64952489

In the 10 boot events with the previous patch, there was always "CPU1, lvtthmr_init: 0x10000". The only functional difference is the placement of apic_read(APIC_LVTTHMR) after apic_write(APIC_LVTTHMR, lvtthmr_init).

The first boot 1 and 2 run to a normal full boot. Boot 3 ends with "soft lockup", and then "rcu_preempt detected stalls on CPUs/tasks". Boot 4 and 5 end with "soft lockup" and repeats, indefinitely.


I'm tempted to say "If it ain't broke, don't fix it", but then, where's the fun in that? Still, have you spoken with Andi Kleen? Wouldn't hurt? Perhaps there was some arcane reason to have put mcheck_intel_therm_init() with the Machine Check processes. And yet, it does not seem obvious that there is any reason for that to matter. But, what about the timing?

Out of curiosity, I added "pr_info("%s : mcheck_intel_therm_init() use to run here", __func__);" to __init mcheck_init() in arch/x86/kernel/cpu/mce/core.c, just to see *when* it *would have* run, compared to when it is running now.


Boot 1, with the pr_info, hangs at "ACPI: 6 ACPI AML tables successfully acquired and loaded". Boot 2 runs to a full boot, but with a mystery 83 second delay, again, after "adding ... swap" and before "Bluetooth...". Boot 3 is a normal full boot. Boot 4 hangs at "Trying to unpack rootfs image as initramfs...". Boot 5 is another normal full boot.

Consistently, "mcheck_intel_therm_init() use to run here" is always seen immediately following "Booting paravirtualized kernel on bare hardware", rather significantly preceding in sequence where the similar, but renamed, intel_init_thermal() is now running.

All the dmesg logs are attached, but here is a quick look:

$ grep -C 1 'mcheck\|lvtt' dmesglog.5.12.pr_info.*|sort
...
dmesglog.5.12.pr_info.1-[ 0.171697] Booting paravirtualized kernel on bare hardware
dmesglog.5.12.pr_info.1:[ 0.171699] mce: mcheck_init : mcheck_intel_therm_init() use to run here
dmesglog.5.12.pr_info.1-[ 0.171705] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 6370452778343963 ns
dmesglog.5.12.pr_info.1:[ 0.939884] intel_init_thermal: CPU1, lvtthmr_init: 0x10200, read: 0x10200, misc_enable (low): 0x64952489
dmesglog.5.12.pr_info.1-[ 1.246791] Mountpoint-cache hash table entries: 8192 (order: 4, 65536 bytes, linear)
dmesglog.5.12.pr_info.1:[ 1.253473] intel_init_thermal: CPU0, lvtthmr_init: 0x10200, read: 0x10200, misc_enable (low): 0x64952489
dmesglog.5.12.pr_info.1-[ 1.256762] CPU0: Thermal monitoring handled by SMI
dmesglog.5.12.pr_info.1-[ 1.450107] .... node #0, CPUs: #1
dmesglog.5.12.pr_info.1-[ 1.468905] smp: Brought up 1 node, 2 CPUs

dmesglog.5.12.pr_info.2-[ 0.171013] Booting paravirtualized kernel on bare hardware
dmesglog.5.12.pr_info.2:[ 0.171016] mce: mcheck_init : mcheck_intel_therm_init() use to run here
dmesglog.5.12.pr_info.2-[ 0.171022] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 6370452778343963 ns
dmesglog.5.12.pr_info.2:[ 0.939759] intel_init_thermal: CPU1, lvtthmr_init: 0x10200, read: 0x10200, misc_enable (low): 0x64952489
dmesglog.5.12.pr_info.2-[ 1.246440] Mountpoint-cache hash table entries: 8192 (order: 4, 65536 bytes, linear)
dmesglog.5.12.pr_info.2:[ 1.253121] intel_init_thermal: CPU0, lvtthmr_init: 0x10200, read: 0x10200, misc_enable (low): 0x64952489
dmesglog.5.12.pr_info.2-[ 1.256410] CPU0: Thermal monitoring handled by SMI
dmesglog.5.12.pr_info.2-[ 1.449755] .... node #0, CPUs: #1
dmesglog.5.12.pr_info.2-[ 1.468545] smp: Brought up 1 node, 2 CPUs

dmesglog.5.12.pr_info.3-[ 0.171884] Booting paravirtualized kernel on bare hardware
dmesglog.5.12.pr_info.3:[ 0.171887] mce: mcheck_init : mcheck_intel_therm_init() use to run here
dmesglog.5.12.pr_info.3-[ 0.171893] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 6370452778343963 ns
dmesglog.5.12.pr_info.3:[ 0.939468] intel_init_thermal: CPU1, lvtthmr_init: 0x10200, read: 0x10200, misc_enable (low): 0x64952489
dmesglog.5.12.pr_info.3-[ 1.247112] Mountpoint-cache hash table entries: 8192 (order: 4, 65536 bytes, linear)
dmesglog.5.12.pr_info.3:[ 1.254192] intel_init_thermal: CPU0, lvtthmr_init: 0x10200, read: 0x10200, misc_enable (low): 0x64952489
dmesglog.5.12.pr_info.3-[ 1.257086] CPU0: Thermal monitoring handled by SMI
dmesglog.5.12.pr_info.3-[ 1.450428] .... node #0, CPUs: #1
dmesglog.5.12.pr_info.3-[ 1.469234] smp: Brought up 1 node, 2 CPUs

dmesglog.5.12.pr_info.4-[ 0.171707] Booting paravirtualized kernel on bare hardware
dmesglog.5.12.pr_info.4:[ 0.171709] mce: mcheck_init : mcheck_intel_therm_init() use to run here
dmesglog.5.12.pr_info.4-[ 0.171715] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 6370452778343963 ns
dmesglog.5.12.pr_info.4:[ 0.939815] intel_init_thermal: CPU1, lvtthmr_init: 0x10200, read: 0x10200, misc_enable (low): 0x64952489
dmesglog.5.12.pr_info.4-[ 1.246509] Mountpoint-cache hash table entries: 8192 (order: 4, 65536 bytes, linear)
dmesglog.5.12.pr_info.4:[ 1.253191] intel_init_thermal: CPU0, lvtthmr_init: 0x10200, read: 0x10200, misc_enable (low): 0x64952489
dmesglog.5.12.pr_info.4-[ 1.256480] CPU0: Thermal monitoring handled by SMI
dmesglog.5.12.pr_info.4-[ 1.449825] .... node #0, CPUs: #1
dmesglog.5.12.pr_info.4-[ 1.468633] smp: Brought up 1 node, 2 CPUs

dmesglog.5.12.pr_info.5-[ 0.171201] Booting paravirtualized kernel on bare hardware
dmesglog.5.12.pr_info.5:[ 0.171203] mce: mcheck_init : mcheck_intel_therm_init() use to run here
dmesglog.5.12.pr_info.5-[ 0.171209] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 6370452778343963 ns
dmesglog.5.12.pr_info.5:[ 0.939935] intel_init_thermal: CPU1, lvtthmr_init: 0x10200, read: 0x10200, misc_enable (low): 0x64952489
dmesglog.5.12.pr_info.5-[ 1.247408] Mountpoint-cache hash table entries: 8192 (order: 4, 65536 bytes, linear)
dmesglog.5.12.pr_info.5:[ 1.254090] intel_init_thermal: CPU0, lvtthmr_init: 0x10200, read: 0x10200, misc_enable (low): 0x64952489
dmesglog.5.12.pr_info.5-[ 1.257378] CPU0: Thermal monitoring handled by SMI
dmesglog.5.12.pr_info.5-[ 1.450723] .... node #0, CPUs: #1
dmesglog.5.12.pr_info.5-[ 1.469533] smp: Brought up 1 node, 2 CPUs

James

> ---
> diff --git a/drivers/thermal/intel/therm_throt.c b/drivers/thermal/intel/therm_throt.c
> index f8e882592ba5..853d4d4b4270 100644
> --- a/drivers/thermal/intel/therm_throt.c
> +++ b/drivers/thermal/intel/therm_throt.c
> @@ -625,7 +625,7 @@ void intel_init_thermal(struct cpuinfo_x86 *c)
> {
> unsigned int cpu = smp_processor_id();
> int tm2 = 0;
> - u32 l, h;
> + u32 l, h, tmp = -1;
>
> if (!intel_thermal_supported(c))
> return;
> @@ -652,13 +652,17 @@ void intel_init_thermal(struct cpuinfo_x86 *c)
> * BIOS has programmed on AP based on BSP's info we saved since BIOS
> * is always setting the same value for all threads/cores.
> */
> - if ((h & APIC_DM_FIXED_MASK) != APIC_DM_FIXED)
> + if ((h & APIC_DM_FIXED_MASK) != APIC_DM_FIXED) {
> apic_write(APIC_LVTTHMR, lvtthmr_init);
> + tmp = apic_read(APIC_LVTTHMR);
> + }
>
> + pr_info("%s: CPU%d, lvtthmr_init: 0x%x, read: 0x%x, misc_enable (low): 0x%x\n",
> + __func__, cpu, lvtthmr_init, tmp, l);
>
> if ((l & MSR_IA32_MISC_ENABLE_TM1) && (h & APIC_DM_SMI)) {
> if (system_state == SYSTEM_BOOTING)
> - pr_debug("CPU%d: Thermal monitoring handled by SMI\n", cpu);
> + pr_info("CPU%d: Thermal monitoring handled by SMI\n", cpu);
> return;
> }
>
>

Attachment: dmesglog.5.12.apic_read.1
Description: Unix manual page

Attachment: dmesglog.5.12.apic_read.2
Description: Unix manual page

Attachment: dmesglog.5.12.apic_read.3
Description: Unix manual page

Attachment: dmesglog.5.12.apic_read.4
Description: Unix manual page

Attachment: dmesglog.5.12.apic_read.5
Description: Unix manual page

Attachment: dmesglog.5.12.pr_info.1
Description: Unix manual page

Attachment: dmesglog.5.12.pr_info.2
Description: Unix manual page

Attachment: dmesglog.5.12.pr_info.3
Description: Unix manual page

Attachment: dmesglog.5.12.pr_info.4
Description: Unix manual page

Attachment: dmesglog.5.12.pr_info.5
Description: Unix manual page