Re: [PATCH v3 03/14] mm: use pmd lock instead of racy checks in zap_pmd_range()

From: Aneesh Kumar K.V
Date: Tue Feb 07 2017 - 08:56:48 EST


"Kirill A. Shutemov" <kirill@xxxxxxxxxxxxx> writes:

> On Sun, Feb 05, 2017 at 11:12:41AM -0500, Zi Yan wrote:
>> From: Zi Yan <ziy@xxxxxxxxxx>
>>
>> Originally, zap_pmd_range() checks the pmd value without taking the pmd lock.
>> This can result in a pmd_protnone entry not being freed.
>>
>> Changing a pmd entry to a pmd_protnone entry takes two steps: first,
>> the pmd entry is cleared to a pmd_none entry; then the pmd_none entry
>> is changed into a pmd_protnone entry. The racy check, even with a
>> barrier, might see only the pmd_none entry in zap_pmd_range(), so the
>> mapping is neither split nor zapped.
>
> That's definitely a good catch.
>
> But I don't agree with the solution. Taking the pmd lock on each
> zap_pmd_range() is a significant hit to the scalability of the code path.
> Yes, the split ptl lock helps, but it would be nice to avoid the lock in
> the first place.
>
> Can we fix change_huge_pmd() instead? Is there a reason why we cannot
> set up the pmd_protnone() entry atomically?
>
> Mel? Rik?
>

I am also trying to fix up the use of set_pte_at() on ptes that are
valid/present (that is, the autonuma ptes). I guess what we are missing is a
variant of the pte update routines that can atomically update a pte without
clearing it and that also doesn't do a TLB flush?
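
To make the difference concrete, here is a rough userspace sketch in
C11, not kernel code. Everything named toy_* is hypothetical, and the
bit layout is made up for illustration. It models an entry as one atomic
word and compares a clear-then-set update with a single compare-and-swap
update. A callback stands in for a concurrent lockless reader.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout, for illustration only; not the kernel's. */
#define TOY_PRESENT  ((uint64_t)1 << 0)
#define TOY_PROTNONE ((uint64_t)1 << 1)

typedef _Atomic uint64_t toy_pmd_t;

/* Stand-in for a concurrent reader that peeks at the entry. */
typedef void (*toy_observer_fn)(toy_pmd_t *pmd, void *arg);

/* Lockless check: an all-zero entry is treated as "nothing mapped". */
static bool toy_pmd_none(toy_pmd_t *pmd)
{
    return atomic_load(pmd) == 0;
}

/* Observer that records whether the entry looked empty. */
static void toy_record_none(toy_pmd_t *pmd, void *arg)
{
    *(bool *)arg = toy_pmd_none(pmd);
}

/*
 * Two-step update: clear the entry, then install the protnone value.
 * The observer runs in the window between the two stores.
 */
static void toy_protnone_two_step(toy_pmd_t *pmd,
                                  toy_observer_fn observe, void *arg)
{
    uint64_t old = atomic_exchange(pmd, 0);     /* step 1: entry is "none" */

    if (observe)
        observe(pmd, arg);                      /* reader sees the gap */

    atomic_store(pmd, (old & ~TOY_PRESENT) | TOY_PROTNONE);  /* step 2 */
}

/*
 * Single-step update: compare-and-swap from the old value straight to
 * the protnone value, so the entry never holds 0. There is no window
 * between steps here, so the observer runs after the update.
 */
static void toy_protnone_atomic(toy_pmd_t *pmd,
                                toy_observer_fn observe, void *arg)
{
    uint64_t old = atomic_load(pmd);
    uint64_t next;

    do {
        next = (old & ~TOY_PRESENT) | TOY_PROTNONE;
    } while (!atomic_compare_exchange_weak(pmd, &old, next));

    if (observe)
        observe(pmd, arg);
}
```

With toy_record_none() as the observer, the two-step variant reports an
empty entry in the middle of the update, while the compare-and-swap
variant never exposes one. Whether a real pte/pmd update routine can be
built this way without a TLB flush is exactly the open question above.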

-aneesh