Re: [PATCH] x86/nmi: Use trylock in __register_nmi_handler() when in_nmi()
From: Thomas Gleixner
Date: Fri Nov 29 2024 - 11:57:55 EST
On Thu, Nov 28 2024 at 20:55, Waiman Long wrote:
> On 11/28/24 8:06 PM, Waiman Long wrote:
>>
>> On 11/28/24 4:28 AM, Peter Zijlstra wrote:
>>> On Wed, Nov 27, 2024 at 06:34:55PM -0500, Waiman Long wrote:
>>>> The __register_nmi_handler() function can be called in NMI context from
>>>> nmi_shootdown_cpus() leading to a lockdep splat like the following.
>>> This seems fundamentally insane. Why are we okay with this?
>>
>> According to the function comment of nmi_shootdown_cpus(),
>>
>> * nmi_shootdown_cpus() can only be invoked once. After the first
>> * invocation all other CPUs are stuck in crash_nmi_callback() and
>> * cannot respond to a second NMI.
>>
>> That is why it has to insert the crash_nmi_callback() call with
>> register_nmi_handler() here in the NMI context. Changing this will
>> require a fundamental redesign of the way this shutdown process needs
>> to be handled and I am not knowledgeable enough to do that. I will
>> certainly appreciate ideas for handling it in a more graceful way.
>
> One idea that I currently have is to add an emergency callback pointer to
> the nmi_desc structure which, if set, has priority over the handlers in
> the linked list and will be called first. In this way,
> nmi_shootdown_cpus() can set the pointer to point to
> crash_nmi_callback() without the need to take a lock and insert another
> handler at the front of the list. Please let me know if this idea is
> acceptable or not.
That's way more sane.
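
For illustration only, the emergency-pointer idea could be sketched roughly
as below. This is a userspace model, not kernel code: the field name
emerg_handler, the helper set_emergency_nmi_handler() and the two stub
handlers are assumptions made up for the sketch, and the real
crash_nmi_callback() does not return, whereas the stub here does so the
dispatch order can be observed.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace model of the idea; not the kernel's definitions. */
typedef int (*nmi_handler_t)(unsigned int type, void *regs);

struct nmiaction {
	struct nmiaction *next;
	nmi_handler_t handler;
};

struct nmi_desc {
	struct nmiaction *head;			/* list normally changed under a lock */
	_Atomic(nmi_handler_t) emerg_handler;	/* assumed new field, set locklessly */
};

/* Publish the emergency handler with a single atomic store, no lock taken. */
static void set_emergency_nmi_handler(struct nmi_desc *desc, nmi_handler_t h)
{
	atomic_store_explicit(&desc->emerg_handler, h, memory_order_release);
}

/* Dispatch: the emergency handler, if set, runs first and bypasses the list. */
static int nmi_handle(struct nmi_desc *desc, unsigned int type, void *regs)
{
	nmi_handler_t eh = atomic_load_explicit(&desc->emerg_handler,
						memory_order_acquire);
	int handled = 0;

	if (eh)
		return eh(type, regs);

	for (struct nmiaction *a = desc->head; a; a = a->next)
		handled += a->handler(type, regs);

	return handled;
}

/* Stub handlers used only to observe which path was taken. */
static int normal_calls;

static int normal_handler(unsigned int type, void *regs)
{
	(void)type; (void)regs;
	normal_calls++;
	return 1;
}

static int crash_callback_stub(unsigned int type, void *regs)
{
	(void)type; (void)regs;
	return 100;	/* stand-in for crash_nmi_callback(), which never returns */
}
```

The point of the sketch is that the shootdown path only has to do one
pointer store, so nothing in NMI context needs to take the list lock or
modify the handler list.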