RE: [PATCHv3 1/2] proc: mm: export PTE sizes directly in smaps
From: Du, Fan
Date: Wed Oct 25 2017 - 21:41:58 EST
>-----Original Message-----
>From: Michal Hocko [mailto:mhocko@xxxxxxxxxx]
>Sent: Wednesday, October 25, 2017 5:29 PM
>To: Du, Fan <fan.du@xxxxxxxxx>
>Cc: akpm@xxxxxxxxxxxxxxxxxxxx; hch@xxxxxx; Williams, Dan J
><dan.j.williams@xxxxxxxxx>; Hansen, Dave <dave.hansen@xxxxxxxxx>;
>linux-kernel@xxxxxxxxxxxxxxx; linux-api@xxxxxxxxxxxxxxx
>Subject: Re: [PATCHv3 1/2] proc: mm: export PTE sizes directly in smaps
>
>On Wed 25-10-17 08:27:34, Fan Du wrote:
>> From: Dave Hansen <dave.hansen@xxxxxxxxx>
>>
>> /proc/$pid/smaps has a number of fields that are intended to imply the
>> kinds of PTEs used to map memory. "AnonHugePages" obviously tells you
>> how many PMDs are being used. "MMUPageSize" along with the
>"Hugetlb"
>> fields tells you how many PTEs you have for a huge page.
>>
>> The current mechanisms work fine when we have one or two page sizes.
>> But, they start to get a bit muddled when we mix page sizes inside
>> one VMA. For instance, the DAX folks were proposing adding a set of
>> fields like:
>>
>> DevicePages:
>> DeviceHugePages:
>> DeviceGiganticPages:
>> DeviceGinormousPages:
>>
>> to unmuddle things when page sizes get mixed. That's fine, but
>> it does require userspace to know the mapping from our various
>> arbitrary names to hardware page sizes on each architecture and
>> kernel configuration. That seems rather suboptimal.
>>
>> What folks really want is to know how much memory is mapped with
>> each page size. How about we just do *that* instead?
>>
>> Patch attached. Seems harmless enough. Seems to compile on a
>> bunch of random architectures. Makes smaps look like this:
>>
>> Private_Hugetlb:       0 kB
>> Swap:                  0 kB
>> SwapPss:               0 kB
>> KernelPageSize:        4 kB
>> MMUPageSize:           4 kB
>> Locked:                0 kB
>> Ptes@4kB:             32 kB
>> Ptes@2MB:           2048 kB
>
>Yes, I agree that the current situation is quite messy. But I am
>wondering who is going to use this new information and what for?
It comes from a customer who is using Device DAX and is looking for
statistics on how much persistent memory mapping has been created or is
being used by an application.
The current vm_normal_page() implementation doesn't pick up pages with a
DEVMAP pfn. The second patch fixes this and exports DAX mappings into the
counters introduced in the first patch.
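To illustrate the idea (a minimal sketch only, not the actual patch;
smaps_devmap_page is an invented name, the pte helpers are the existing
kernel ones):

static struct page *smaps_devmap_page(pte_t pte)
{
	if (!pte_present(pte) || !pte_devmap(pte))
		return NULL;
	/* DEVMAP pfns are backed by ZONE_DEVICE struct pages, so the
	 * page can still be resolved and accounted even though
	 * vm_normal_page() skips it. */
	return pfn_to_page(pte_pfn(pte));
}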
IMO, users care more about how much persistent memory they have used. How
about a small tweak to smaps_account() to report the total mapping size
in RSS/PSS, which users are usually more familiar with?
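Something along these lines, hypothetically (smaps_account_devmap is an
invented helper; mem_size_stats and PSS_SHIFT already exist in
fs/proc/task_mmu.c):

static void smaps_account_devmap(struct mem_size_stats *mss,
				 unsigned long size)
{
	/* fold device pages into the counters users already know */
	mss->resident += size;
	/* a device-DAX mapping is not a CoW-shared page, so its PSS
	 * contribution equals its RSS contribution */
	mss->pss += (u64)size << PSS_SHIFT;
}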
>> The format I used here should be unlikely to break smaps parsers
>> unless they're looking for "kB" and now match the 'Ptes@4kB' instead
>> of the one at the end of the line.
>>
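FWIW, a parser that anchors on the field name rather than on the first
"kB" in the line shouldn't notice. A hypothetical userspace sketch:

#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/proc/self/smaps", "r");
	char line[256], name[32];
	unsigned long kb;

	if (!f)
		return 1;
	while (fgets(line, sizeof(line), f)) {
		/* match on the "Ptes@..." key, so the "kB" inside
		 * "Ptes@4kB" can't be confused with the unit suffix */
		if (sscanf(line, "Ptes@%31[^:]: %lu kB", name, &kb) == 2)
			printf("Ptes@%s -> %lu kB\n", name, kb);
	}
	fclose(f);
	return 0;
}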
>> Note: hugetlbfs PTEs are unusual. We can have more than one "pte_t"
>> for each hugetlbfs "page". arm64 has this configuration, and probably
>> others. The code should now handle cases where an hstate's size is not
>> equal to one of the page table entry sizes. For instance, it assumes that
>> hstates between PMD_SIZE and PUD_SIZE are made up of multiple PMDs
>> and prints them as such.
>>
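The decomposition rule described above could look roughly like this (a
sketch under those assumptions; hugetlb_entry_size is an invented name):

/* Pick which page-table entry size an hstate is built from: sizes
 * between PMD_SIZE and PUD_SIZE are counted as multiple PMDs. */
static unsigned long hugetlb_entry_size(unsigned long hpage_size)
{
	if (hpage_size >= PUD_SIZE)
		return PUD_SIZE;
	if (hpage_size >= PMD_SIZE)
		return PMD_SIZE;
	return PAGE_SIZE;
}
/* e.g. a 32MB contiguous-PMD hstate on arm64 would show up as
 * 32MB / 2MB = 16 entries under "Ptes@2MB" */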
>> I've tested this on x86 with normal 4k ptes, anonymous huge pages,
>> 1G hugetlbfs and 2M hugetlbfs pages.
>>
>> 1. I'd like to thank Dan Williams for showing me a mirror as I
>> complained about the bozo that introduced 'AnonHugePages'.
>
>Does the new code add any measurable overhead? I assume it shouldn't
>from a quick look at the code. Anyway, this is useful information
>because there are people who really want it as cheap as possible.
>
>> [Fan]
>> Rebase the original patch from Dave Hansen by fixing a couple of compile
>> issues.
>>
>> Signed-off-by: Fan Du <fan.du@xxxxxxxxx>
>> Signed-off-by: Dave Hansen <dave.hansen@xxxxxxxxx>
>
>nit, the s-o-b ordering should be reversed. The original author should be
>first.
Oh, got it! It seems there are community courtesies I'm not fully aware
of. Apologies to Dave.
>--
>Michal Hocko
>SUSE Labs