RE: [PATCH v4] Add /proc/PID/smaps support for DAX

From: Du, Fan
Date: Fri Oct 27 2017 - 05:03:48 EST




>-----Original Message-----
>From: Michal Hocko [mailto:mhocko@xxxxxxxxxx]
>Sent: Friday, October 27, 2017 4:43 PM
>To: Du, Fan <fan.du@xxxxxxxxx>
>Cc: Hansen, Dave <dave.hansen@xxxxxxxxx>; akpm@xxxxxxxxxxxxxxxxxxxx;
>hch@xxxxxx; Williams, Dan J <dan.j.williams@xxxxxxxxx>;
>linux-kernel@xxxxxxxxxxxxxxx
>Subject: Re: [PATCH v4] Add /proc/PID/smaps support for DAX
>
>On Fri 27-10-17 08:24:07, Du, Fan wrote:
>>
>>
>> >-----Original Message-----
>> >From: Michal Hocko [mailto:mhocko@xxxxxxxxxx]
>> >Sent: Friday, October 27, 2017 4:08 PM
>> >To: Du, Fan <fan.du@xxxxxxxxx>
>> >Cc: Hansen, Dave <dave.hansen@xxxxxxxxx>; akpm@xxxxxxxxxxxxxxxxxxxx;
>> >hch@xxxxxx; Williams, Dan J <dan.j.williams@xxxxxxxxx>;
>> >linux-kernel@xxxxxxxxxxxxxxx
>> >Subject: Re: [PATCH v4] Add /proc/PID/smaps support for DAX
>> >
>> >On Fri 27-10-17 02:47:43, Du, Fan wrote:
>> >>
>> >>
>> >> >-----Original Message-----
>> >> >From: Hansen, Dave
>> >> >Sent: Thursday, October 26, 2017 10:03 PM
>> >> >To: Du, Fan <fan.du@xxxxxxxxx>; akpm@xxxxxxxxxxxxxxxxxxxx; hch@xxxxxx;
>> >> >Williams, Dan J <dan.j.williams@xxxxxxxxx>; mhocko@xxxxxxxxxx
>> >> >Cc: linux-kernel@xxxxxxxxxxxxxxx
>> >> >Subject: Re: [PATCH v4] Add /proc/PID/smaps support for DAX
>> >> >
>> >> >I'm honestly not understanding what problem this solves. Could you,
>> >> >perhaps, do a before and after of smaps with and without this patch?
>> >>
>> >> The motivation here is described in the commit message.
>> >> ------------------------------------------------------------------------------------------
>> >> Memory behind device DAX is not attached to the normal memory
>> >> management system. When a user mmaps /dev/dax, the smaps part is
>> >> currently missing, so the user has no way to check how much
>> >> device DAX memory is actually used in practice.
>> >
>> >This might be motivation but you are still not explaining _why_ that is
>> >a problem. _Who_ is going to use that information and for _what_
>> >purpose. This is really essential!
>>
>> If a user creates a device DAX, how does one know how much memory is being used?
>> By "used" I mean page table mappings created; the fact is there is no way to find out here.
>>
>> Why does this matter? The answer is the same as: I bought a 512G SSD disk, why
>> do I want to check disk usage with `df`?! If the application uses only 128G, it makes
>> no sense at all for me to buy more redundant persistent memory if the cloud provider
>> has other options to choose from.
>
>I am not deeply familiar with DAX but I would expect that most users
>will use a FS on top of it where we have standard tools. If the use is
>direct then I can see how this makes things more complicated but smaps is
>not the right answer IMHO. Why? Well, just consider that you map the
>same portions of the device multiple times for whatever reason and you
>are screwed because you have no means to distinguish those. So you will
>get bogus numbers. Or am I misunderstanding something?

Persistent memory has two user interfaces.
One is filesystem DAX, which sits on a pmem block device and is mounted with the dax
option to bypass the page cache; there `df` meets my customer's needs.
The other is device DAX, which the user can only mmap into its address space.
There is no filesystem concept, which is why we need something equivalent to `df`.

Shared mappings increase the page mapcount; that is what the PSS field is for.
It reports a proportional share of the total size. For example, with a 2MB library used
by two processes, RSS reports 2MB plus each process's own memory size,
whereas PSS reports the proportional share: 2MB/2 (two processes
share the library, so the page mapcount is 2) plus the process's own memory size.
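
To make the arithmetic above concrete, here is a minimal sketch of the RSS versus PSS
calculation. It works at region granularity for readability, whereas the kernel
accounts per page; the function name and the 512 kB of private memory are assumptions
made up for illustration only.

```python
def rss_and_pss_kb(private_kb, shared_regions):
    """Return (rss, pss) in kB for one process.

    private_kb     -- memory mapped only by this process (illustrative value)
    shared_regions -- list of (size_kb, mapcount) pairs; mapcount is the number
                      of processes mapping that region
    This is a region-level simplification, not the kernel's per-page walk.
    """
    # RSS charges every resident shared region in full to each process.
    rss = private_kb + sum(size for size, _ in shared_regions)
    # PSS divides each shared region by the number of mappers.
    pss = private_kb + sum(size / mapcount for size, mapcount in shared_regions)
    return rss, pss


# A 2MB (2048 kB) library shared by two processes, plus 512 kB of private memory.
rss, pss = rss_and_pss_kb(private_kb=512, shared_regions=[(2048, 2)])
print(rss, pss)
```

Summing PSS over both processes gives the library's 2MB exactly once, which is why
PSS answers the "same pages mapped more than once" concern for shared pages.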

Btw, smaps is the best place for this because it is compatible with the existing online
monitoring tools that customers use nowadays.

>If you need a DAX device statistics then make them device specific. This
>is the only reliable way to get valid data.

That was my first thought as well: create another sys node for this.
But without this patch the smaps fields show ZERO readings, as follows:
22 lines of meaningless metrics w.r.t. device DAX.

7f6c00000000-7f6d80000000 rw-s 00000000 00:06 19810 /dev/dax12.0
Size: 6291456 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Rss: 0 kB
Pss: 0 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 0 kB
Referenced: 0 kB
Anonymous: 0 kB
LazyFree: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
Locked: 0 kB
ProtectionKey: 0
VmFlags: rd wr sh mr mw me ms mm hg

Last but not least, this fix does not change the smaps structure; it only shows
useful data for /dev/dax in Rss/Pss, and it does not affect statistics for anything
other than device DAX at all.
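
As a rough illustration of how an existing monitoring tool could pick these numbers up,
here is a minimal sketch that sums Size/Rss/Pss for device DAX mappings from smaps text.
The function name, the regular expressions and the "/dev/dax" prefix filter are my own
assumptions for illustration; the sample is an abbreviated copy of the dump above.

```python
import re

# Mapping header line: "start-end perms offset dev inode path"
HEADER = re.compile(r"^[0-9a-f]+-[0-9a-f]+\s+\S+\s+[0-9a-f]+\s+\S+\s+\d+\s*(.*)$")
# Field line such as "Rss:   0 kB"
FIELD = re.compile(r"^(\w+):\s+(\d+) kB$")
WANTED = ("Size", "Rss", "Pss")


def dax_usage_kb(smaps_text, prefix="/dev/dax"):
    """Sum Size/Rss/Pss (kB) per device DAX path found in smaps text."""
    totals = {}
    current = None  # path of the DAX mapping being parsed, or None
    for line in smaps_text.splitlines():
        header = HEADER.match(line)
        if header:
            path = header.group(1).strip()
            current = path if path.startswith(prefix) else None
            if current is not None:
                totals.setdefault(current, dict.fromkeys(WANTED, 0))
            continue
        field = FIELD.match(line)
        if current is not None and field and field.group(1) in WANTED:
            totals[current][field.group(1)] += int(field.group(2))
    return totals


SAMPLE = """\
7f6c00000000-7f6d80000000 rw-s 00000000 00:06 19810 /dev/dax12.0
Size: 6291456 kB
Rss: 0 kB
Pss: 0 kB
VmFlags: rd wr sh mr mw me ms mm hg
"""

print(dax_usage_kb(SAMPLE))
```

On a live system the text would come from reading /proc/PID/smaps for the process of
interest. With the readings shown above, Rss and Pss sum to zero, which is exactly the
gap this patch is meant to close.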

I have used all the English words I know to try to convince you, so be it.

>--
>Michal Hocko
>SUSE Labs