Re: [RFC] panic: reduce CPU consumption when finished handling panic

From: Carlos Bilbao
Date: Wed Mar 26 2025 - 11:15:13 EST


Hello,

On 3/21/25 14:03, Thomas Gleixner wrote:
> On Mon, Mar 17 2025 at 17:01, Carlos Bilbao wrote:
>> After the kernel has finished handling a panic, it enters a busy-wait loop.
>> But, this unnecessarily consumes CPU power and electricity. Plus, in VMs,
>> this negatively impacts the throughput of other VM guests running on the
>> same hypervisor.
>>
>> I propose introducing a function cpu_halt_end_panic() to halt the CPU
>> during this state while still allowing interrupts to be processed. See my
>> commit below.
> That's not the way how change logs are written. You explain the problem
> and then briefly how it is addressed.
>
> No proposal, no 'see below'. Such wording does not make any sense in a
> git commit. See Documentation/process/
>
>> @@ -276,6 +276,21 @@ static void panic_other_cpus_shutdown(bool crash_kexec)
>>          crash_smp_send_stop();
> Your patch is malformed due to white space damage:
>
> patch: **** malformed patch at line 31:          crash_smp_send_stop();
>
>> +static void cpu_halt_end_panic(void)
>> +{
>> +#ifdef CONFIG_X86
>> +    native_safe_halt();
>> +#elif defined(CONFIG_ARM)
>> +    cpu_do_idle();
>> +#else
>> +    /*
>> +     * Default to a simple busy-wait if no architecture-specific halt is
>> +     * defined above
>> +     */
>> +    mdelay(PANIC_TIMER_STEP);
>> +#endif
> Architecture specific #ifdefs in core code are not the right way to
> go. Split this into three patches:
>
> 1) Add a weak fallback function
>
> void __weak cpu_halt_after_panic(void)
> {
> mdelay(PANIC_TIMER_STEP);
> }
>
> 2) Add non weak implementation in arch/x86
>
> native_safe_halt() is wrong vs. paravirtualization.
>
> safe_halt() is what you want.
>
> 3) Add non weak implementation for arch/arm
>
> Not my playground, so no comment


Sounds good, sending patch set now. Thanks for your time!


>
> Thanks,
>
> tglx
>
>


Thanks,

Carlos