Re: Question about qspinlock nesting

From: Waiman Long
Date: Fri Jan 18 2019 - 16:30:30 EST


On 01/18/2019 03:06 PM, Peter Zijlstra wrote:
> On Fri, Jan 18, 2019 at 09:50:12AM -0500, Waiman Long wrote:
>> On 01/18/2019 05:02 AM, Peter Zijlstra wrote:
>>>> e.g. We can't take an SError during the SError handler.
>>>>
>>>> But we can take this SError/NMI on another CPU while the first one is still
>>>> running the handler.
>>>>
>>>> These multiple NMI-like notifications mean having multiple locks/fixmap-slots,
>>>> one per notification. This is where the qspinlock node limit comes in, as we
>>>> could have more than 4 contexts.
>>> Right; so Waiman was going to do a patch that reverts to test-and-set or
>>> something along those lines once we hit the queue limit, which seems
>>> like a good way out. Actually hitting that nesting level should be
>>> exceedingly rare.
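
Just to make that fallback concrete, something like the rough user-space
sketch below is what I have in mind (purely illustrative, not the actual
patch; MAX_NODES mirrors the per-CPU MCS node count, and the other names
are made up):

#include <stdatomic.h>

#define MAX_NODES	4	/* one MCS node per context: task, softirq, hardirq, NMI */

struct sketch_lock { atomic_flag locked; };

static _Thread_local int node_count;	/* MCS nodes in use on this "CPU" */

static void sketch_lock_slowpath(struct sketch_lock *lock)
{
	int idx = node_count++;

	if (idx >= MAX_NODES) {
		/*
		 * Node pool exhausted (deeply nested NMI-like contexts):
		 * give up queueing fairness and spin test-and-set until
		 * the lock is free.
		 */
		while (atomic_flag_test_and_set_explicit(&lock->locked,
							 memory_order_acquire))
			;	/* cpu_relax() would go here */
		node_count--;
		return;
	}

	/* ... normal MCS queueing using node idx goes here ... */

	node_count--;
}

The unfair spin is acceptable because, as you say, actually hitting that
nesting depth should be exceedingly rare.
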
>> Yes, I am working on a patch to support arbitrary levels of nesting. It
>> is easy for PV qspinlock as lock stealing is supported.
>>
>> For native qspinlock, we cannot do lock stealing without incurring a
>> certain amount of overhead in the regular slowpath code. It was up to
>> 10% in my own testing. So I am exploring an alternative that can do the
>> job without incurring any noticeable performance degradation in the
>> slowpath. I ran into a race condition whose source I am still trying to
>> track down. Hopefully, I will have something to post next week.
> Where does the overhead come from? Surely that's not just checking that
> bound?

It is not about checking the bound; it is about how to acquire the lock
without using an MCS node. The overhead comes from using an atomic
instruction to acquire the lock, instead of a non-atomic one, in order to
allow lock stealing.
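
To make that concrete, here is a rough user-space comparison of the two
handoff styles (a sketch with illustrative names, not the actual slowpath
code). Without stealing, the MCS queue head is the only possible owner
once it sees the lock released, so it can claim the lock with a plain
store. Once stealing is allowed, that store can race with a stealer and
has to become an atomic compare-and-swap:

#include <stdatomic.h>

#define UNLOCKED	0
#define LOCKED		1

/*
 * No lock stealing: only the MCS queue head can reach this point, so a
 * plain store is sufficient (ordering is handled elsewhere in the
 * slowpath).
 */
static void sketch_set_locked(atomic_int *lock)
{
	atomic_store_explicit(lock, LOCKED, memory_order_relaxed);
}

/*
 * Lock stealing allowed: a stealer may have grabbed the lock already,
 * so the queue head has to retry with an atomic compare-and-swap.
 */
static void sketch_set_locked_stealing(atomic_int *lock)
{
	int old = UNLOCKED;

	while (!atomic_compare_exchange_weak_explicit(lock, &old, LOCKED,
						      memory_order_acquire,
						      memory_order_relaxed))
		old = UNLOCKED;	/* lock was stolen; spin until it is released */
}

The non-atomic handoff in the slowpath is basically a WRITE_ONCE() of the
locked byte; turning that into a cmpxchg on every contended handoff is what
showed up as the overhead (up to 10%) in my testing.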

Cheers,
Longman