Re: [PATCH] lockdep: add lockdep_cleanup_dead_cpu()

From: Thomas Gleixner
Date: Sat Oct 28 2023 - 10:13:29 EST


On Sat, Oct 28 2023 at 12:14, David Woodhouse wrote:

> From: David Woodhouse <dwmw@xxxxxxxxxxxx>
>
> Add a function to check that an offline CPU left the tracing infrastructure
> in a sane state. The acpi_idle_play_dead() function was recently observed
> calling safe_halt() instead of raw_safe_halt(), which had the side-effect
> of setting the hardirqs_enabled flag for the offline CPU. On x86 this
> triggered lockdep warnings when the CPU came back online, but too early
> for the exception to be handled correctly, leading to a triple-fault.
>
> Add lockdep_cleanup_dead_cpu() to check for this kind of failure mode,
> print the events leading up to it, and correct it so that the CPU can
> come online again correctly.
>
> [ 61.556652] smpboot: CPU 1 is now offline
> [ 61.556769] CPU 1 left hardirqs enabled!
> [ 61.556915] irq event stamp: 128149
> [ 61.556965] hardirqs last enabled at (128149): [<ffffffff81720a36>] acpi_idle_play_dead+0x46/0x70
> [ 61.557055] hardirqs last disabled at (128148): [<ffffffff81124d50>] do_idle+0x90/0xe0
> [ 61.557117] softirqs last enabled at (128078): [<ffffffff81cec74c>] __do_softirq+0x31c/0x423
> [ 61.557199] softirqs last disabled at (128065): [<ffffffff810baae1>] __irq_exit_rcu+0x91/0x100
>
> Signed-off-by: David Woodhouse <dwmw@xxxxxxxxxxxx>

Reviewed-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
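
For anyone reading along in the archive, the check-and-correct step that
the changelog describes can be modelled in userspace roughly as below.
This is only a sketch under stated assumptions, not the code from the
patch: model_hardirqs_enabled[], MODEL_NR_CPUS and
model_cleanup_dead_cpu() are hypothetical names standing in for the
per-CPU hardirqs_enabled state and for lockdep_cleanup_dead_cpu(), and
the irq event stamp dump is reduced to a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MODEL_NR_CPUS 4	/* hypothetical CPU count for this model */

/* Stand-in for the per-CPU hardirqs_enabled flag that lockdep tracks. */
static bool model_hardirqs_enabled[MODEL_NR_CPUS];

/*
 * Check the flag a dead CPU left behind. If it is still set, warn and
 * clear it so that the CPU starts from a clean state when it is brought
 * online again. Returns true if a stale flag was found and cleared.
 */
static bool model_cleanup_dead_cpu(unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);

	if (!model_hardirqs_enabled[cpu])
		return false;

	printf("CPU %u left hardirqs enabled!\n", cpu);
	/* The patch described above also prints the irq events here. */
	model_hardirqs_enabled[cpu] = false;
	return true;
}
```

In this model, setting model_hardirqs_enabled[1] and then calling
model_cleanup_dead_cpu(1) prints the warning once and clears the flag; a
second call finds nothing to fix and stays silent.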