Re: [PATCH] arm64: mm: enable per pmd page table lock

From: Anshuman Khandual
Date: Mon Feb 18 2019 - 23:10:05 EST




On 02/19/2019 01:19 AM, Yu Zhao wrote:
> On Mon, Feb 18, 2019 at 03:12:23PM +0000, Will Deacon wrote:
>> [+Mark]
>>
>> On Thu, Feb 14, 2019 at 02:16:42PM -0700, Yu Zhao wrote:
>>> Switch from per mm_struct to per pmd page table lock by enabling
>>> ARCH_ENABLE_SPLIT_PMD_PTLOCK. This provides better granularity for
>>> large system.
>>>
>>> I'm not sure if there is contention on mm->page_table_lock. Given
>>> the option comes at no cost (apart from initializing more spin
>>> locks), why not enable it now.
>>>
>>> Signed-off-by: Yu Zhao <yuzhao@xxxxxxxxxx>
>>> ---
>>> arch/arm64/Kconfig | 3 +++
>>> arch/arm64/include/asm/pgalloc.h | 12 +++++++++++-
>>> arch/arm64/include/asm/tlb.h | 5 ++++-
>>> 3 files changed, 18 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>>> index a4168d366127..104325a1ffc3 100644
>>> --- a/arch/arm64/Kconfig
>>> +++ b/arch/arm64/Kconfig
>>> @@ -872,6 +872,9 @@ config ARCH_WANT_HUGE_PMD_SHARE
>>> config ARCH_HAS_CACHE_LINE_SIZE
>>> def_bool y
>>>
>>> +config ARCH_ENABLE_SPLIT_PMD_PTLOCK
>>> + def_bool y
>>> +
>>> config SECCOMP
>>> bool "Enable seccomp to safely compute untrusted bytecode"
>>> ---help---
>>> diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
>>> index 52fa47c73bf0..dabba4b2c61f 100644
>>> --- a/arch/arm64/include/asm/pgalloc.h
>>> +++ b/arch/arm64/include/asm/pgalloc.h
>>> @@ -33,12 +33,22 @@
>>>
>>> static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
>>> {
>>> - return (pmd_t *)__get_free_page(PGALLOC_GFP);
>>> + struct page *page;
>>> +
>>> + page = alloc_page(PGALLOC_GFP);
>>> + if (!page)
>>> + return NULL;
>>> + if (!pgtable_pmd_page_ctor(page)) {
>>> + __free_page(page);
>>> + return NULL;
>>> + }
>>> + return page_address(page);
>>
>> I'm a bit worried as to how this interacts with the page-table code in
>> arch/arm64/mm/mmu.c when pgd_pgtable_alloc is used as the allocator. It
>> looks like that currently always calls pgtable_page_ctor(), regardless of
>> level. Do we now need a separate allocator function for the PMD level?
>
> Thanks for reminding me, I never noticed this. The short answer is
> no.
>
> I guess pgtable_page_ctor() is used on all pud/pmd/pte entries
> there because it's also compatible with pud, and pmd too without
> this patch. So your concern is valid. Thanks again.

pgtable_page_ctor() acts on a page used as a page table at any level. It
sets the appropriate page type (page flag PG_table) and increments the
zone stat for NR_PAGETABLE. pgtable_page_dtor() does the exact inverse.

These two complementary operations are required for page table pages at
every level, for their proper initialization, identification in buddy and
zone statistics. Hence they need to be called for page table pages at all
levels.

pgtable_pmd_page_ctor()/pgtable_pmd_page_dtor(), on the other hand, just
init/free the page table lock on the page for !THP cases and additionally
init page->pmd_huge_pte (deposited page table page) for THP cases.
Some archs seem to be calling pgtable_pmd_page_ctor() in place of
pgtable_page_ctor(). I wonder whether that approach skips the page flag
and accounting requirements.
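
To make the difference concrete, here is a rough userspace model, not
kernel code. The model_* names, struct model_page and the nr_pagetable
counter are made up for illustration and only mimic the behaviour
described above.

```c
/* Userspace model only: names mirror the kernel helpers, nothing here is kernel code. */
#include <assert.h>
#include <stdbool.h>

struct model_page {
	bool pg_table;   /* stands in for the PG_table page type */
	bool ptl_ready;  /* stands in for the split page table lock */
};

static long nr_pagetable; /* stands in for the NR_PAGETABLE zone stat */

/* Model of pgtable_page_ctor(): lock init, page type, accounting. */
static bool model_pgtable_page_ctor(struct model_page *page)
{
	page->ptl_ready = true;
	page->pg_table = true;
	nr_pagetable++;
	return true;
}

/* Model of pgtable_page_dtor(): the exact inverse of the ctor. */
static void model_pgtable_page_dtor(struct model_page *page)
{
	assert(page->pg_table); /* only valid on a constructed page */
	page->ptl_ready = false;
	page->pg_table = false;
	nr_pagetable--;
}

/* Model of pgtable_pmd_page_ctor(): lock only (THP deposit pointer omitted). */
static bool model_pgtable_pmd_page_ctor(struct model_page *page)
{
	page->ptl_ready = true;
	return true;
}
```

In this model a page that only went through model_pgtable_pmd_page_ctor()
ends up with a usable lock but without the page type set and without
being counted, which is the gap I am wondering about.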

>
> Why my answer is no? Because I don't think the ctor matters for
> pgd_pgtable_alloc(). The ctor is only required for userspace page
> tables, and that's why we don't have it in pte_alloc_one_kernel().

At present on arm64 certain kernel page table page allocations call
pgtable_page_ctor() and some don't. The series which I had posted
makes sure that all kernel and user page table page allocations go through
pgtable_page_ctor()/dtor(). These constructs are required for kernel
page table pages as well, for accurate init and accounting, not just for
user space. The series only skips the vmemmap struct page mapping, as
covering it would require changing the generic sparse vmemmap
allocation/free functions, which I believe should also be changed going
forward.

> AFAICT, none of the pgds (efi_mm.pgd, tramp_pg_dir and init_mm.pgd)
> pre-populated by pgd_pgtable_alloc() is. (I doubt we pre-populate
> userspace page tables in any other arch).
>
> So to avoid future confusion, we might just remove the ctor from
> pgd_pgtable_alloc().

No. Instead we should just make sure that those pages go through the
dtor() destructor path when getting freed, and the clean up series
does that.