Re: [PATCH 2/2] Add /proc/PID/{smaps, numa_maps} support for DAX
From: Michal Hocko
Date: Thu Oct 26 2017 - 11:07:42 EST
On Thu 26-10-17 07:51:20, Dave Hansen wrote:
> On 10/26/2017 07:31 AM, Michal Hocko wrote:
> > On Thu 26-10-17 07:24:14, Dave Hansen wrote:
> >> Actually, I don't remember whether it was tooling or just confused
> >> humans. I *think* Dan was trying to write test cases for huge page DAX
> >> support and couldn't figure out whether or not it was using large pages.
> >
> > That sounds like a very weak justification for adding new stuff to smaps
> > to be honest.
>
> Yep, agreed. It can't go in _just_ for DAX, and Fan and the other DAX
> folks need to elaborate on their needs here.
>
> Do you have any better ideas? If we did this, we could probably make an
> argument that the AnonHuge fields could go away some day. They haven't
> always been there.

Yeah, if there is an effective way to see large mappings within the
range then AnonHuge should probably go away. Whether that is possible
without causing userspace regressions is another matter...

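To make the dependency on the AnonHuge fields concrete, here is a rough sketch of what a userspace consumer might look like today. It is only an illustration: the function name huge_vmas is made up, and it assumes the usual smaps layout of an "start-end perms ..." header line followed by "Key: N kB" lines. Tools shaped like this are what would regress if the field went away.

```python
import re

# Header line of a VMA block, e.g. "00400000-0048a000 r-xp ...".
HEADER = re.compile(r"^([0-9a-f]+)-([0-9a-f]+)\s")
# Numeric field line, e.g. "AnonHugePages:      2048 kB".
FIELD = re.compile(r"^(\w+):\s+(\d+) kB$")


def huge_vmas(smaps_text):
    """Map (start, end) -> AnonHugePages in kB for VMAs where it is non-zero.

    Hypothetical helper; it only understands the two line shapes above
    and silently skips everything else (VmFlags and so on).
    """
    result = {}
    current = None
    for line in smaps_text.splitlines():
        header = HEADER.match(line)
        if header:
            current = (int(header.group(1), 16), int(header.group(2), 16))
            continue
        field = FIELD.match(line.rstrip())
        if field and current is not None and field.group(1) == "AnonHugePages":
            kb = int(field.group(2))
            if kb:
                result[current] = kb
    return result
```

On a Linux box it could be fed the contents of /proc/self/smaps, read with a plain open() and read().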
> The only other alternatives I can think of are truly DAX-specific
> interfaces, which also seem like a really bad idea.

Well, if this is mostly for debugging purposes then why not?

> > Not only that. There have been reports that reading smaps is too
> > expensive. Curiously enough the overhead doesn't come from
> > the data collection but rather from copying to userspace. So we should
> > be careful not to print data that is not of general use.
>
> Yikes! I just did a quick:
>
> while true; do cat /proc/*/smaps | wc ; done
>
> and the copying out to userspace is ~1/15th the overhead of
> smaps_account(). Something sounds screwy if you're seeing the overhead
> at copying to userspace.

Yeah. I looked into this some time ago and managed to cut some of
it by using dedicated seq_* functions rather than seq_printf, which seems
to be quite expensive. But I haven't spent too much time on it.

> What else can we do than continue to bloat smaps? Could we do a file
> per VMA?
>
> /proc/$pid/smap/0x123000-0x456000

Wouldn't that be too volatile? I mean things can change considerably
between readdir and reading the file... But I confess I haven't thought
that through.
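
To illustrate the concern, here is a minimal sketch that compares two snapshots of maps-style text and reports which per-VMA names from the first snapshot no longer exist in the second. The names follow the 0xSTART-0xEND form from the example above. The per-VMA directory itself is hypothetical, and so are the helpers vma_names and stale_entries.

```python
import re

# Leading "start-end" address range of a maps/smaps header line.
MAPS_LINE = re.compile(r"^([0-9a-f]+)-([0-9a-f]+)\s")


def vma_names(maps_text):
    """Return the set of '0xSTART-0xEND' names, one per VMA line."""
    names = set()
    for line in maps_text.splitlines():
        match = MAPS_LINE.match(line)
        if match:
            names.add(f"0x{match.group(1)}-0x{match.group(2)}")
    return names


def stale_entries(listed, current):
    """Names seen at listing time that match no VMA any more."""
    return listed - current
```

Any VMA that is split, merged, grown or unmapped between the two snapshots ends up in the stale set, which is the case a reader of such a directory would have to handle.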
--
Michal Hocko
SUSE Labs