Re: [PATCH v3 3/3] mm/memory.c: Add memory read privilege before filling PTE entry
From: maobibo
Date: Mon May 18 2020 - 23:23:38 EST
On 05/19/2020 04:57 AM, Andrew Morton wrote:
> On Mon, 18 May 2020 13:08:49 +0800 Bibo Mao <maobibo@xxxxxxxxxxx> wrote:
>
>> On mips platform, hw PTE entry valid bit is set in pte_mkyoung
>> function, it is used to set physical page with readable privilege.
>
> pte_mkyoung() seems to be a strange place to set the pte's valid bit.
> Why is it done there? Can it be done within mips's mk_pte()?
On MIPS system hardware cannot set PAGE_ACCESS bit when accessing the page,
software sets PAGE_ACCESS software bit and PAGE_VALID hw bit together during page
fault stage.
If mk_pte is called in page fault flow, it is ok to set both bits. If it is not
called in page fault, PAGE_ACCESS is set however there is no actual memory accessing.
>
>> Here add pte_mkyoung function to make page readable on MIPS platform
>> during page fault handling. This patch improves page fault latency
>> about 10% on my MIPS machine with lmbench lat_pagefault case.
>>
>> ...
>>
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -2704,6 +2704,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
>> }
>> flush_cache_page(vma, vmf->address, pte_pfn(vmf->orig_pte));
>> entry = mk_pte(new_page, vma->vm_page_prot);
>> + entry = pte_mkyoung(entry);
>> entry = maybe_mkwrite(pte_mkdirty(entry), vma);
>
> What is the effect on non-mips machines? If it's only "additional
> overhead" then it would be better to add a new pte_mkvalid() (or
> whatever) and arrange for it to be a no-op on all but mips?
how about adding pte_sw_mkyoung alike function which is a noop on all but mips?
this function is used to set PAGE_ACCESS bit and PAGE_VALID bit on mips platform.
regards
bibo,mao
>
>> /*
>> * Clear the pte entry and flush it first, before updating the
>> @@ -3378,6 +3379,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
>> __SetPageUptodate(page);
>>
>> entry = mk_pte(page, vma->vm_page_prot);
>> + entry = pte_mkyoung(entry);
>> if (vma->vm_flags & VM_WRITE)
>> entry = pte_mkwrite(pte_mkdirty(entry));
>>
>> @@ -3660,6 +3662,7 @@ vm_fault_t alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg,
>>
>> flush_icache_page(vma, page);
>> entry = mk_pte(page, vma->vm_page_prot);
>> + entry = pte_mkyoung(entry);
>> if (write)
>> entry = maybe_mkwrite(pte_mkdirty(entry), vma);
>> /* copy-on-write page */
>> diff --git a/mm/mprotect.c b/mm/mprotect.c
>> index 494192ca..673f1cd 100644
>> --- a/mm/mprotect.c
>> +++ b/mm/mprotect.c
>> @@ -131,6 +131,8 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
>> ptent = pte_clear_uffd_wp(ptent);
>> }
>>
>> + if (vma->vm_flags & VM_READ)
>> + ptent = pte_mkyoung(ptent);
>> /* Avoid taking write faults for known dirty pages */
>> if (dirty_accountable && pte_dirty(ptent) &&
>> (pte_soft_dirty(ptent) ||