RE: [PATCH 2/2] Add /proc/PID/{smaps, numa_maps} support for DAX

From: Du, Fan
Date: Fri Oct 27 2017 - 00:03:35 EST

>-----Original Message-----
>From: linux-kernel-owner@xxxxxxxxxxxxxxx
>[mailto:linux-kernel-owner@xxxxxxxxxxxxxxx] On Behalf Of Dave Hansen
>Sent: Thursday, October 26, 2017 10:51 PM
>To: Michal Hocko <mhocko@xxxxxxxxxx>
>Cc: Du, Fan <fan.du@xxxxxxxxx>; akpm@xxxxxxxxxxxxxxxxxxxx; hch@xxxxxx;
>Williams, Dan J <dan.j.williams@xxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx;
>linux-api@xxxxxxxxxxxxxxx
>Subject: Re: [PATCH 2/2] Add /proc/PID/{smaps, numa_maps} support for DAX
>
>On 10/26/2017 07:31 AM, Michal Hocko wrote:
>> On Thu 26-10-17 07:24:14, Dave Hansen wrote:
>>> Actually, I don't remember whether it was tooling or just confused
>>> humans. I *think* Dan was trying to write test cases for huge page DAX
>>> support and couldn't figure out whether or not it was using large pages.
>>
>> That sounds like a very weak justification to adding new stuff to smaps
>> to be honest.
>
>Yep, agreed. It can't go in _just_ for DAX, and Fan and the other DAX
>folks need to elaborate on their needs here.

If a user creates a device DAX node, /dev/dax, with some capacity like 512G, mmaps it,
and then only touches part of it, say 128G, then to the best of my knowledge there are
no statistics reported anywhere on how much of the memory behind the DAX device is
actually used.

This is the problem our customer is facing right now.
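As a concrete illustration of the scenario, here is a minimal userspace sketch (the
/dev/dax0.0 path, the sizes and the 2M touch step are placeholders, and error handling
is trimmed): while it runs, the /dev/dax VMA shows up in /proc/PID/smaps, but nothing
there tells you that only 128G of the 512G mapping has actually been populated.

	#include <fcntl.h>
	#include <stddef.h>
	#include <sys/mman.h>
	#include <unistd.h>

	int main(void)
	{
		/* Placeholder sizes: a 512G device DAX mapping of which only 128G is touched. */
		size_t map_len   = 512UL << 30;
		size_t touch_len = 128UL << 30;

		int fd = open("/dev/dax0.0", O_RDWR);	/* hypothetical device DAX node */
		if (fd < 0)
			return 1;

		char *p = mmap(NULL, map_len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
		if (p == MAP_FAILED)
			return 1;

		/* Touch one byte per 2M so only part of the device ends up with
		 * populated page table entries. */
		for (size_t off = 0; off < touch_len; off += 2UL << 20)
			p[off] = 1;

		/* Keep the mapping alive so /proc/<pid>/smaps can be inspected. */
		pause();
		return 0;
	}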

I agree that smaps should not treat DAX as a special case. Whether or not we update
smaps for DAX, cat /proc/PID/smaps already reports an entry for the /dev/dax mapping.
The question is whether we should account DAX usage in RSS, or introduce new fields
like Pte@{4K,2M} for a different purpose.

The v4 version tried to merge device DAX usage into RSS:
https://lkml.org/lkml/2017/10/26/24
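To make the two options concrete, a hypothetical smaps excerpt for such a /dev/dax VMA
could look like the following; the addresses, numbers and the Pte@{4K,2M} field names
are placeholders for illustration only, not output of any posted patch:

	7f0000000000-7f8000000000 rw-s 00000000 00:06 12345      /dev/dax0.0
	Size:           536870912 kB
	Rss:            134217728 kB   <- option 1: fold touched DAX memory into Rss
	Pte@4K:                 0 kB   <- option 2: separate fields showing which
	Pte@2M:         134217728 kB      page size actually maps the device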

>Do you have any better ideas? If we did this, we could probably make an
>argument that the AnonHuge fields could go away some day. They haven't
>always been there.
>
>The only other alternative I can think of are truly DAX-specific
>interfaces, which also seem like a really bad idea.
>
>> Not only that. There have been reports that reading smaps is too
>> expensive. Curiously enough the overhead doesn't come up from
>> the data collection but rather copying to the userspace. So we should be
>> careful to not print data that is not of general use.
>
>Yikes! I just did a quick:
>
> while true; do cat /proc/*/smaps | wc ; done
>
>and the copying out to userspace is ~1/15th the overhead of
>smaps_account(). Something sounds screwy if you're seeing the overhead
>at copying to userspace.
>
>What else can we do than continue to bloat smaps? Could we do a file
>per VMA?
>
> /proc/$pid/smap/0x123000-0x456000