Re: [RFC PATCH 03/14] mm/mprotect: allow exclusive anon pages to be writable

From: Nadav Amit
Date: Wed Jul 20 2022 - 13:25:56 EST


On Jul 20, 2022, at 8:19 AM, David Hildenbrand <david@xxxxxxxxxx> wrote:

> On 18.07.22 14:02, Nadav Amit wrote:
>> From: Nadav Amit <namit@xxxxxxxxxx>
>>
>> Anonymous pages might have the dirty bit clear, but this should not
>> prevent mprotect from making them writable if they are exclusive.
>> Therefore, skip the test whether the page is dirty in this case.
>>
>> Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx>
>> Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
>> Cc: Andy Lutomirski <luto@xxxxxxxxxx>
>> Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
>> Cc: David Hildenbrand <david@xxxxxxxxxx>
>> Cc: Peter Xu <peterx@xxxxxxxxxx>
>> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
>> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>> Cc: Will Deacon <will@xxxxxxxxxx>
>> Cc: Yu Zhao <yuzhao@xxxxxxxxxx>
>> Cc: Nick Piggin <npiggin@xxxxxxxxx>
>> Signed-off-by: Nadav Amit <namit@xxxxxxxxxx>
>> ---
>> mm/mprotect.c | 5 +++--
>> 1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/mprotect.c b/mm/mprotect.c
>> index 34c2dfb68c42..da5b9bf8204f 100644
>> --- a/mm/mprotect.c
>> +++ b/mm/mprotect.c
>> @@ -45,7 +45,7 @@ static inline bool can_change_pte_writable(struct vm_area_struct *vma,
>>
>> VM_BUG_ON(!(vma->vm_flags & VM_WRITE) || pte_write(pte));
>>
>> - if (pte_protnone(pte) || !pte_dirty(pte))
>> + if (pte_protnone(pte))
>> return false;
>>
>> /* Do we need write faults for softdirty tracking? */
>> @@ -66,7 +66,8 @@ static inline bool can_change_pte_writable(struct vm_area_struct *vma,
>> page = vm_normal_page(vma, addr, pte);
>> if (!page || !PageAnon(page) || !PageAnonExclusive(page))
>> return false;
>> - }
>> + } else if (!pte_dirty(pte))
>> + return false;
>>
>> return true;
>> }
>
> When I wrote that code, I was wondering how often that would actually
> happen in practice -- and if we care about optimizing that. Do you have
> a gut feeling in which scenarios this would happen and if we care?
>
> If the page is in the swapcache and was swapped out, you'd be requiring
> a writeback even though nobody modified the page and possibly isn't
> going to do so in the near future.

So here is my due diligence: I did not really encounter a scenario in which
it showed up. When I looked at your code, I assumed this was an oversight
and not a thoughtful decision. For me the issue is more of the discrepancy
between how a certain page is handled before and after it was pages out.

The way that I see it, there is a tradeoff in the way dirty-bit should
be handled:
(1) Writable-clean PTEs introduce some non-negligible overhead.
(2) Marking a PTE dirty speculatively would require a write back.

… But this tradeoff should not affect whether a PTE is writable, i.e.,
mapping the PTE as writable-clean should not cause a writeback. In other
words, if you are concerned about unnecessary writebacks, which I think is a
fair concern, then do not set the dirty-bit. In a later patch I try to avoid
TLB flushes on clean-writable entries that are write-protected.

So I do not think that the writeback you mentioned should be a real issue.
Yet if you think that using the fact that the page is not-dirty is a good
hueristics to avoid future TLB flushes (for P->NP; as I said there is a
solution for RW->RO), or if you are concerned about the cost of
vm_normal_page(), perhaps those are valid concerned (although I do not think
so).

--

[ Regarding (1): After some discussions with Peter and reading more code, I
thought at some point that perhaps avoiding having writable-clean PTE as
much as possible makes sense [*], since setting the dirty-bit costs ~550
cycles and a page fault is not a lot more than 1000. But with all the
mitigations (and after adding IBRS for retbless) page-fault entry is kind of
expensive.

[*] At least on x86 ]