Re: [PATCH v2 1/2] smaps: fill missing fields for vma(VM_HUGETLB)

From: Naoya Horiguchi
Date: Tue Aug 11 2015 - 19:39:38 EST


On Mon, Aug 10, 2015 at 05:37:54PM -0700, David Rientjes wrote:
> On Fri, 7 Aug 2015, Naoya Horiguchi wrote:
>
> > Currently smaps reports many zero fields for vma(VM_HUGETLB), which is
> > inconvenient when we want to know per-task or per-vma base hugetlb usage.
> > This patch enables these fields by introducing smaps_hugetlb_range().
> >
> > before patch:
> >
> > Size: 20480 kB
> > Rss: 0 kB
> > Pss: 0 kB
> > Shared_Clean: 0 kB
> > Shared_Dirty: 0 kB
> > Private_Clean: 0 kB
> > Private_Dirty: 0 kB
> > Referenced: 0 kB
> > Anonymous: 0 kB
> > AnonHugePages: 0 kB
> > Swap: 0 kB
> > KernelPageSize: 2048 kB
> > MMUPageSize: 2048 kB
> > Locked: 0 kB
> > VmFlags: rd wr mr mw me de ht
> >
> > after patch:
> >
> > Size: 20480 kB
> > Rss: 18432 kB
> > Pss: 18432 kB
> > Shared_Clean: 0 kB
> > Shared_Dirty: 0 kB
> > Private_Clean: 0 kB
> > Private_Dirty: 18432 kB
> > Referenced: 18432 kB
> > Anonymous: 18432 kB
> > AnonHugePages: 0 kB
> > Swap: 0 kB
> > KernelPageSize: 2048 kB
> > MMUPageSize: 2048 kB
> > Locked: 0 kB
> > VmFlags: rd wr mr mw me de ht
> >
>
> I think this will lead to breakage, unfortunately, specifically for users
> who are concerned with resource management.
>
> An example: we use memcg hierarchies to charge memory for individual jobs,
> specific users, and system overhead. Memcg is a cgroup, so this is done
> for an aggregate of processes, and we often have to monitor their memory
> usage. Each process isn't assigned to its own memcg, and I don't believe
> common users of memcg assign individual processes to their own memcgs.
>
> When a memcg is out of memory, we need to track the memory usage of
> processes attached to its memcg hierarchy to determine what is unexpected,
> either as a result of a new rollout or because of a memory leak. To do
> that, we use the rss exported by smaps that is now changed with this
> patch. By using smaps rather than /proc/pid/status, we can report where
> memory usage is unexpected.
>
> This would cause our process that manages all memcgs on our systems to
> break. Perhaps I haven't been as convincing in my previous messages of
> this, but it's quite an obvious userspace regression.

OK, this version assumes that userspace distinguishes vma(VM_HUGETLB) by
the "VmFlags" field, which is unrealistic. So I'll keep all existing fields
untouched and report hugetlb usage in a separate, new field instead.
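
To illustrate what that assumption would demand of a monitoring tool, here
is a minimal Python sketch (not part of the patch; the function name
rss_kb_by_kind is hypothetical). It assumes the smaps layout shown above,
where "VmFlags:" is the last line of each vma entry and the "ht" flag marks
a hugetlb vma, and it keeps hugetlb Rss separate from everything else:

```python
import re

# A vma header line starts with an address range, e.g.
# "00400000-0040b000 r-xp 00000000 fd:01 123 /bin/cat".
_HEADER = re.compile(r"^[0-9a-f]+-[0-9a-f]+\s")


def rss_kb_by_kind(smaps_text):
    """Return (normal_kb, hugetlb_kb) summed from the Rss: fields.

    Assumption: VmFlags: closes each vma entry, so the Rss value seen
    since the last header belongs to the vma being classified.
    """
    totals = {"normal": 0, "hugetlb": 0}
    rss = 0
    for line in smaps_text.splitlines():
        if _HEADER.match(line):
            rss = 0  # new vma entry, forget the previous value
        elif line.startswith("Rss:"):
            rss = int(line.split()[1])  # value is reported in kB
        elif line.startswith("VmFlags:"):
            # "ht" in the flag list marks a hugetlb vma
            kind = "hugetlb" if "ht" in line.split()[1:] else "normal"
            totals[kind] += rss
            rss = 0
    return totals["normal"], totals["hugetlb"]
```

A tool that simply sums every "Rss:" line has no such filter, which is why
changing the meaning of the existing field would break it.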

> This memory was not included in rss originally because memory in the
> hugetlb persistent pool is always resident. Unmapping the memory does not
> free memory. For this reason, hugetlb memory has always been treated as
> its own type of memory.

Right, so it might be better not to use the word "RSS" for hugetlb;
something like "HugetlbPages:" seems better to me.

Thanks,
Naoya Horiguchi

> It would have been arguable back when hugetlbfs was introduced whether it
> should be included. I'm afraid the ship has sailed on that since a decade
> has passed and it would cause userspace to break if existing metrics are
> used that already have clearly defined semantics.