Re: [PATCH v17 02/16] preempt: Track NMI nesting to separate per-CPU counter

From: Joel Fernandes

Date: Thu Feb 05 2026 - 20:25:03 EST

On 2/5/2026 8:14 PM, Boqun Feng wrote:
> On Thu, Feb 05, 2026 at 07:50:03PM -0500, Joel Fernandes wrote:
>>
>>
>> On 2/5/2026 5:17 PM, Joel Fernandes wrote:
>>>
>>>
>>> On 2/5/2026 4:40 PM, Boqun Feng wrote:
>>>> On Wed, Feb 04, 2026 at 12:12:34PM +0100, Peter Zijlstra wrote:
>>>>> On Tue, Feb 03, 2026 at 01:15:21PM +0100, Peter Zijlstra wrote:
>>>>>> But I'm really somewhat sad that 64bit can't do better than this.
>>>>>
>>>>> Here, the below builds and boots (albeit with warnings because printf
>>>>> format crap sucks).
>>>>>
>>>>
>>>> Thanks! I will drop patch #1 and #2 and use this one (with a commit log
>>>> and some more tests), given it's based on the work of Joel, Lyude and
>>>> me, would the following tags make sense to all of you?
>>>>> Co-developed-by: Joel Fernandes <joelagnelf@xxxxxxxxxx>
>>>
>>> I don't know, I am not a big fan of the alternative patch because it adds a
>>> per-cpu counter anyway if !CONFIG_PREEMPT_LONG [1]. And it is also a much bigger
>>> patch than the one I wrote. Purely from an objective perspective, I would still
>>> want to keep my original patch because it is simple. What is really the
>>> objection to it?
>>>
>
> PREEMPT_LONG is an architecture-specific way to improve the performance
> IMO. Just to be clear, do you object to it at all, or do you object to
> combining it with your original patch? If it's the latter, I could make
> another patch as a follow-up to enable PREEMPT_LONG.

When I looked at the alternative patch, I thought it was overcomplicated and
that the extra complexity should be justified. Otherwise, I don't object to it.
I think it comes down to preference. I would prefer a simple fix over an
overcomplicated fix for a hypothetical issue (unless we have data showing an
issue). If it were only a few lines of change, that'd be a different story.

>
>>> [1]
>>> +#ifndef CONFIG_PREEMPT_LONG
>>> +/*
>>> + * Any 32bit architecture that still cares about performance should
>>> + * probably ensure this is near preempt_count.
>>> + */
>>> +DEFINE_PER_CPU(unsigned int, nmi_nesting);
>>> +#endif
>>>
>> If the objection to my patch is modifying a per-cpu counter, isn't NMI a slow
>> path? If we agree, then keeping things simple is better IMO unless we have data
>
> I guess Peter was trying to say it's not a slow path if you consider
> perf event interrupts on x86? [1]

How are we handling this performance issue on 32-bit x86 with perf, then? Or
are we saying we don't care about performance on 32-bit?
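
To make the trade-off concrete, here is a rough user-space model of the two
approaches as I understand them. It is a sketch, not kernel code: only the
name nmi_nesting comes from the hunk quoted above, and every model_* /
MODEL_* name and the bit positions are made up for illustration. The first
variant keeps one "in NMI" bit in a 32-bit count and tracks depth in a
separate counter; the second assumes a 64-bit count with the depth packed in
the upper bits, so enter/exit touch a single variable.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: one bit in the 32-bit count means "in NMI". */
#define MODEL_NMI_BIT   (1u << 20)

/* Plain globals stand in for one CPU's per-CPU variables. */
uint32_t model_preempt_count;
unsigned int nmi_nesting;	/* name taken from the quoted hunk */

int model_in_nmi(void)
{
	return (model_preempt_count & MODEL_NMI_BIT) != 0;
}

/* Separate-counter variant: two variables are touched per NMI. */
void model_nmi_enter(void)
{
	if (nmi_nesting++ == 0)
		model_preempt_count |= MODEL_NMI_BIT;
}

void model_nmi_exit(void)
{
	assert(nmi_nesting > 0);
	if (--nmi_nesting == 0)
		model_preempt_count &= ~MODEL_NMI_BIT;
}

/* Wide-count variant: depth lives in the upper 32 bits (assumed layout). */
#define MODEL_NMI_SHIFT 32
#define MODEL_NMI_ONE   ((uint64_t)1 << MODEL_NMI_SHIFT)

uint64_t model_preempt_count_long;

int model_in_nmi_long(void)
{
	return (model_preempt_count_long >> MODEL_NMI_SHIFT) != 0;
}

void model_nmi_enter_long(void)
{
	model_preempt_count_long += MODEL_NMI_ONE;
}

void model_nmi_exit_long(void)
{
	assert(model_in_nmi_long());
	model_preempt_count_long -= MODEL_NMI_ONE;
}
```

Both variants answer "are we in NMI?" the same way for nested entry and exit;
the difference is only whether the NMI path touches one variable or two, which
is the cost being debated here.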

--
Joel Fernandes