Re: [RFC 20/20] mm/rmap: avoid potential races

From: Huang, Ying
Date: Mon Aug 23 2021 - 20:36:26 EST


Nadav Amit <namit@xxxxxxxxxx> writes:

>> On Aug 23, 2021, at 1:05 AM, Huang, Ying <ying.huang@xxxxxxxxx> wrote:
>>
>> Hi, Nadav,
>>
>> Nadav Amit <nadav.amit@xxxxxxxxx> writes:
>>
>>> From: Nadav Amit <namit@xxxxxxxxxx>
>>>
>>> flush_tlb_batched_pending() appears to have a theoretical race:
>>> tlb_flush_batched is being cleared after the TLB flush, and if in
>>> between another core calls set_tlb_ubc_flush_pending() and sets the
>>> pending TLB flush indication, this indication might be lost. Holding the
>>> page-table lock when SPLIT_LOCK is set cannot eliminate this race.
>>
>> Recently, when I read the corresponding code, I found the exact same race
>> too. Do you still think the race is possible, at least in theory? If
>> so, why hasn't your fix been merged?
>
> I think the race is possible. It didn’t get merged, IIRC, due to some
> addressable criticism and lack of enthusiasm from other people, and
> my laziness/busy-ness.

Got it! Thanks for the information!
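
For reference, here is a minimal userspace sketch of the interleaving as
I understand it. It is only a model under stated assumptions, not kernel
code: struct mm_model, the stale_entries counter and the model_*()
helpers are hypothetical stand-ins for mm->tlb_flush_batched,
set_tlb_ubc_flush_pending() and the two halves of
flush_tlb_batched_pending(). The two CPUs are replayed sequentially in
one thread so that the bad ordering is deterministic.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the relevant mm_struct state. */
struct mm_model {
	bool tlb_flush_batched;	/* "a batched flush is pending" indication */
	int stale_entries;	/* PTEs unmapped but not yet flushed */
};

/* Models set_tlb_ubc_flush_pending(): an unmap defers its TLB flush. */
static void model_set_pending(struct mm_model *mm)
{
	mm->stale_entries++;
	mm->tlb_flush_batched = true;
}

/* First half of flush_tlb_batched_pending(): test the flag, then flush. */
static bool model_flush_step(struct mm_model *mm)
{
	if (!mm->tlb_flush_batched)
		return false;
	mm->stale_entries = 0;	/* stands in for flush_tlb_mm() */
	return true;
}

/* Second half: the flag is cleared only after the flush has finished. */
static void model_clear_step(struct mm_model *mm)
{
	mm->tlb_flush_batched = false;
}

/*
 * Replays the problematic ordering and returns the number of stale
 * entries that are no longer covered by a pending indication.
 */
static int replay_race(void)
{
	struct mm_model mm = { false, 0 };
	bool flushed;

	model_set_pending(&mm);			/* CPU0: unmaps page A */
	flushed = model_flush_step(&mm);	/* CPU1: sees flag, flushes */
	assert(flushed);
	model_set_pending(&mm);			/* CPU0: unmaps page B */
	model_clear_step(&mm);			/* CPU1: clears, B is lost */

	return mm.tlb_flush_batched ? 0 : mm.stale_entries;
}
```

In this model replay_race() reports one stale entry with the flag
clear, so a later model_flush_step() would skip the flush for page B.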

>>> The current batched TLB invalidation scheme therefore does not seem
>>> viable or easily repairable.
>>
>> I have an idea for fixing this without too much code. If necessary, I
>> will send it out.
>
> Arguably, it would be preferable to have a small back-portable fix for
> this issue specifically. Just try to ensure that you do not introduce
> performance overheads. Any solution should be clear about its impact
> on additional TLB flushes in the worst-case scenario and the number
> of additional atomic operations that would be required.

Sure. Will do that.

Best Regards,
Huang, Ying