Re: [MOCKUP] x86/mm: Lightweight lazy mm refcounting

From: Andy Lutomirski
Date: Thu Dec 03 2020 - 21:18:44 EST



> On Dec 3, 2020, at 2:13 PM, Nicholas Piggin <npiggin@xxxxxxxxx> wrote:
>
> Excerpts from Peter Zijlstra's message of December 3, 2020 6:44 pm:
>>> On Wed, Dec 02, 2020 at 09:25:51PM -0800, Andy Lutomirski wrote:
>>>
>>> power: same as ARM, except that the loop may be rather larger since
>>> the systems are bigger. But I imagine it's still faster than Nick's
>>> approach -- a cmpxchg to a remote cacheline should still be faster than
>>> an IPI shootdown.
>>
>> While a single atomic might be cheaper than an IPI, the comparison
>> doesn't work out nicely. You do the xchg() on every unlazy, while the
>> IPI would be once per process exit.
>>
>> So over the life of the process, it might do very many unlazies, adding
>> up to a total cost far in excess of what the single IPI would've been.
>
> Yeah this is the concern, I looked at things that add cost to the
> idle switch code and it gets hard to justify the scalability improvement
> when you slow these fundamental things down even a bit.

v2 fixes this and is generally much nicer. I’ll send it out in a couple hours.

>
> I still think working on the assumption that IPIs = scary expensive
> might not be correct. An IPI itself is, but you only issue them when
> you've left a lazy mm on another CPU which just isn't that often.
>
> Thanks,
> Nick