RE: [PATCH v4] Add /proc/PID/smaps support for DAX

From: Du, Fan
Date: Thu Oct 26 2017 - 22:47:50 EST




>-----Original Message-----
>From: Hansen, Dave
>Sent: Thursday, October 26, 2017 10:03 PM
>To: Du, Fan <fan.du@xxxxxxxxx>; akpm@xxxxxxxxxxxxxxxxxxxx; hch@xxxxxx;
>Williams, Dan J <dan.j.williams@xxxxxxxxx>; mhocko@xxxxxxxxxx
>Cc: linux-kernel@xxxxxxxxxxxxxxx
>Subject: Re: [PATCH v4] Add /proc/PID/smaps support for DAX
>
>I'm honestly not understanding what problem this solves. Could you,
>perhaps, do a before and after of smaps with and without this patch?

The motivation here is described in the commit message.
------------------------------------------------------------------------------------------
Memory behind device DAX is not attached into normal memory
management system, when user mmap /dev/dax, smaps part is
currently missing, so no idea for user to check how much
device DAX memory are actually used in practice.
------------------------------------------------------------------------------------------


>> +/* page structure behind DAX mappings is NOT compound page
>> + * when it's a huge page mappings, so introduce new API to
>> + * account for both PMD and PUD mapping.
>> + */
>
>Why do they need to be compound? Why don't we just make them
>compound
>instead of adding all this code which is *just* for DAX?
>
>> +static void smaps_account_dax_huge(struct mem_size_stats *mss,
>> + struct page *page, unsigned long size, bool young, bool dirty)
>> +{
>> + int mapcount = page_mapcount(page);
>> +
>> + if (PageAnon(page)) {
>> + mss->anonymous += size;
>> + if (!PageSwapBacked(page) && !dirty && !PageDirty(page))
>> + mss->lazyfree += size;
>> + }
>
>How can you have DAX anonymous huge pages?
>
>> + mss->resident += size;
>> + /* Accumulate the size in pages that have been accessed. */
>> + if (young || page_is_young(page) || PageReferenced(page))
>> + mss->referenced += size;
>
>Isn't this just a copy'n'paste of smaps_account() code?
>
>> + /*
>> + * page_count(page) == 1 guarantees the page is mapped exactly once.
>> + * If any subpage of the compound page mapped with PTE it would
>elevate
>> + * page_count().
>> + */
>> + if (page_count(page) == 1) {
>> + if (dirty || PageDirty(page))
>> + mss->private_dirty += size;
>> + else
>> + mss->private_clean += size;
>> + mss->pss += (u64)size << PSS_SHIFT;
>> + return;
>> + }
>
>PSS makes *zero* sense for DAX. The "memory" is used whether the
>mapping exists or not.
>
>Also, the idea of "private" doesn't really make sense here.
>
>> + if (mapcount >= 2) {
>> + if (dirty || PageDirty(page))
>> + mss->shared_dirty += size;
>> + else
>> + mss->shared_clean += size;
>> + mss->pss += (size << PSS_SHIFT) / mapcount;
>> + } else {
>> + if (dirty || PageDirty(page))
>> + mss->private_dirty += size;
>> + else
>> + mss->private_clean += size;
>> + mss->pss += size << PSS_SHIFT;
>> + }
>> +}
>> +
>> #ifdef CONFIG_SHMEM
>> static int smaps_pte_hole(unsigned long addr, unsigned long end,
>> struct mm_walk *walk)
>> @@ -528,7 +577,16 @@ static void smaps_pte_entry(pte_t *pte, unsigned
>long addr,
>> struct page *page = NULL;
>>
>> if (pte_present(*pte)) {
>> - page = vm_normal_page(vma, addr, *pte);
>> + if (!vma_is_dax(vma))
>> + page = vm_normal_page(vma, addr, *pte);
>> + else if (pte_devmap(*pte)) {
>> + struct dev_pagemap *pgmap;
>> +
>> + pgmap = get_dev_pagemap(pte_pfn(*pte), NULL);
>> + if (!pgmap)
>> + return;
>> + page = pte_page(*pte);
>> + }
>> } else if (is_swap_pte(*pte)) {
>> swp_entry_t swpent = pte_to_swp_entry(*pte);
>>
>> @@ -579,7 +637,19 @@ static void smaps_pmd_entry(pmd_t *pmd,
>unsigned long addr,
>> struct page *page;
>>
>> /* FOLL_DUMP will return -EFAULT on huge zero page */
>> - page = follow_trans_huge_pmd(vma, addr, pmd, FOLL_DUMP);
>> + if (!vma_is_dax(vma))
>> + page = follow_trans_huge_pmd(vma, addr, pmd, FOLL_DUMP);
>> + else if (pmd_devmap(*pmd)) {
>> + struct dev_pagemap *pgmap;
>> +
>> + pgmap = get_dev_pagemap(pmd_pfn(*pmd), NULL);
>> + if (!pgmap)
>> + return;
>> + page = pmd_page(*pmd);
>> + smaps_account_dax_huge(mss, page, PMD_SIZE, pmd_young(*pmd),
>> + pmd_dirty(*pmd));
>> + return;
>> + }
>> if (IS_ERR_OR_NULL(page))
>> return;
>> if (PageAnon(page))
>>
>
>There's a fair amount of copying and pasting going on here. There is,
>again, a bunch of specialized DAX code. Isn't there a way to do this
>more generically?