Re: [RFC 03/20] mm/mprotect: do not flush on permission promotion

From: Andrew Cooper
Date: Sun Jan 31 2021 - 08:47:41 EST


On 31/01/2021 02:59, Andy Lutomirski wrote:
>>>> diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
>>>> index 8c87a2e0b660..a617dc0a9b06 100644
>>>> --- a/arch/x86/include/asm/tlbflush.h
>>>> +++ b/arch/x86/include/asm/tlbflush.h
>>>> @@ -255,6 +255,50 @@ static inline void arch_tlbbatch_add_mm(struct arch_tlbflush_unmap_batch *batch,
>>>>
>>>> extern void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch);
>>>>
>>>> +static inline bool pte_may_need_flush(pte_t oldpte, pte_t newpte)
>>>> +{
>>>> +	const pteval_t ignore_mask = _PAGE_SOFTW1 | _PAGE_SOFTW2 |
>>>> +				     _PAGE_SOFTW3 | _PAGE_ACCESSED;
>>> Why is accessed ignored? Surely clearing the accessed bit needs a
>>> flush if the old PTE is present.
>> I am just following the current scheme in the kernel (x86):
>>
>> int ptep_clear_flush_young(struct vm_area_struct *vma,
>>			   unsigned long address, pte_t *ptep)
>> {
>>	/*
>>	 * On x86 CPUs, clearing the accessed bit without a TLB flush
>>	 * doesn't cause data corruption. [ It could cause incorrect
>>	 * page aging and the (mistaken) reclaim of hot pages, but the
>>	 * chance of that should be relatively low. ]
>>	 *
> If anyone ever implements the optimization of skipping the flush when
> unmapping a !accessed page, then this will cause data corruption.
>
> If nothing else, this deserves a nice comment in the new code.
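
To make the hazard concrete first, here is a sketch of the hypothetical
unmap-side shortcut Andy describes.  zap_pte_needs_flush() is a made-up
name and nothing like it exists in the kernel today; pte_flags() and
_PAGE_ACCESSED are the existing x86 definitions:

	/*
	 * Hypothetical shortcut: trust a clear accessed bit to mean
	 * "no TLB entry can exist, so skip the flush on unmap".
	 *
	 * That assumption stops being true the moment mprotect() may
	 * clear the accessed bit without flushing: a stale, possibly
	 * writable translation can survive the unmap, and a later
	 * write through it lands in a freed/reused page.
	 */
	static inline bool zap_pte_needs_flush(pte_t pte)
	{
		return pte_flags(pte) & _PAGE_ACCESSED;
	}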

Architecturally, access bits get set as part of a pagewalk, and before the
TLB entry is made.  Once an access bit has been set, you know for
certain that a mapping for the address is, or was, present in a TLB
somewhere.
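
Given that, the helper could ignore the accessed bit only in the safe
direction.  Sketch only: the mask-comparison shape is from the patch,
the rest is illustrative, and pte_flags()/pte_present() are the
existing x86 accessors:

	static inline bool pte_may_need_flush(pte_t oldpte, pte_t newpte)
	{
		pteval_t ignore_mask = _PAGE_SOFTW1 | _PAGE_SOFTW2 |
				       _PAGE_SOFTW3;

		/* Only a walk of a present PTE can be cached in a TLB. */
		if (!pte_present(oldpte))
			return false;

		/*
		 * Setting the accessed bit (0 -> 1) never needs a
		 * flush, so ignore it in that direction only.
		 * Clearing a set bit falls through to the comparison
		 * below and forces a flush.
		 */
		if (!(pte_flags(oldpte) & _PAGE_ACCESSED))
			ignore_mask |= _PAGE_ACCESSED;

		return (pte_flags(oldpte) & ~ignore_mask) !=
		       (pte_flags(newpte) & ~ignore_mask);
	}

This keeps the cheap "skip the flush on pure software-bit changes"
behaviour while never losing a flush for an A-bit clear.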

I can't help but feel that this "optimisation" is shooting itself in the
foot.  It does cause incorrect page aging, and TLBs are getting bigger,
so the chance of mistakenly reclaiming hot pages is getting worse and worse.

~Andrew