[PATCH] smaps: fill missing fields for vma(VM_HUGETLB)

From: Naoya Horiguchi
Date: Tue Aug 04 2015 - 01:18:37 EST


On Tue, Aug 04, 2015 at 02:55:30AM +0000, Naoya Horiguchi wrote:
> On Wed, Jul 29, 2015 at 04:20:59PM -0700, Mike Kravetz wrote:
> > On 07/29/2015 12:08 PM, David Rientjes wrote:
> > >On Tue, 28 Jul 2015, Jörn Engel wrote:
> > >
> > >>Well, we definitely need something. Having a 100GB process show 3GB of
> > >>rss is not very useful. How would we notice a memory leak if it only
> > >>affects hugepages, for example?
> > >>
> > >
> > >Since the hugetlb pool is a global resource, it would also be helpful to
> > >determine if a process is mapping more than expected. You can't do that
> > >just by adding a huge rss metric, however: if you have 2MB and 1GB
> > >hugepages configured you wouldn't know if a process was mapping 512 2MB
> > >hugepages or 1 1GB hugepage.
> > >
> > >That's the purpose of hugetlb_cgroup, after all, and it supports usage
> > >counters for all hstates. The test could be converted to use that to
> > >measure usage if configured in the kernel.
> > >
> > >Beyond that, I'm not sure how a per-hstate rss metric would be exported to
> > >userspace in a clean way and other ways of obtaining the same data are
> > >possible with hugetlb_cgroup. I'm not sure how successful you'd be in
> > >arguing that we need separate rss counters for it.
> >
> > If I want to track hugetlb usage on a per-task basis, do I then need to
> > create one cgroup per task?
> >
> > For example, suppose I have many tasks using hugetlb and the global pool
> > is getting low on free pages. It might be useful to know which tasks are
> > using hugetlb pages, and how many they are using.
> >
> > I don't actually have this need (I think), but it appears to be what
> > Jörn is asking for.
>
> One possible way to get a per-task hugetlb metric is to walk the page
> table via /proc/pid/pagemap and count page flags for each mapped page
> (we can easily do this with tools/vm/page-types.c, e.g. "page-types -p <PID>
> -b huge"). This is obviously slower than keeping the counter as in-kernel
> data and exporting it, but it might be useful in some situations.
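
For illustration, the pagemap approach quoted above boils down to roughly
the following. This is only a minimal sketch, not what page-types.c actually
does: the helper names (pagemap_pfn, kpf_is_huge, count_huge) are made up
for this example, and it assumes the caller has already read the 8-byte
records from /proc/<pid>/pagemap and /proc/kpageflags into arrays.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit layout of one 64-bit /proc/<pid>/pagemap entry. */
#define PM_PRESENT	(1ULL << 63)		/* page is present in RAM */
#define PM_PFN_MASK	((1ULL << 55) - 1)	/* bits 0-54: PFN if present */

/* Bit number in a /proc/kpageflags entry that marks a hugetlb page. */
#define KPF_HUGE	17

/* Hypothetical helper: PFN of a present page, or 0 if not present. */
static uint64_t pagemap_pfn(uint64_t entry)
{
	return (entry & PM_PRESENT) ? (entry & PM_PFN_MASK) : 0;
}

/* Hypothetical helper: does this kpageflags word have KPF_HUGE set? */
static int kpf_is_huge(uint64_t flags)
{
	return (int)((flags >> KPF_HUGE) & 1);
}

/*
 * Count the entries that are both present and flagged as huge.
 * entries[i] is the pagemap record of one virtual page and flags[i] is
 * assumed to be the kpageflags record of the PFN found in entries[i].
 */
static size_t count_huge(const uint64_t *entries, const uint64_t *flags,
			 size_t n)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		if ((entries[i] & PM_PRESENT) && kpf_is_huge(flags[i]))
			count++;
	return count;
}
```

To fill the arrays, one would pread() 8 bytes from /proc/<pid>/pagemap at
offset (vaddr / base_page_size) * 8, and then 8 bytes from /proc/kpageflags
at offset pfn * 8. Each record covers one base page, so the count is in base
pages, not in huge pages. Reading /proc/kpageflags typically needs root.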

BTW, currently smaps doesn't report any meaningful info for vma(VM_HUGETLB).
I wrote the following patch, which hopefully is helpful for your purpose.

Thanks,
Naoya Horiguchi

---
From: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx>
Subject: [PATCH] smaps: fill missing fields for vma(VM_HUGETLB)

Currently smaps reports many zero fields for vma(VM_HUGETLB), which is
inconvenient when we want to know per-task or per-vma hugetlb usage.
This patch fills in these fields by introducing smaps_hugetlb_range().

before patch:

Size: 20480 kB
Rss: 0 kB
Pss: 0 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 0 kB
Referenced: 0 kB
Anonymous: 0 kB
AnonHugePages: 0 kB
Swap: 0 kB
KernelPageSize: 2048 kB
MMUPageSize: 2048 kB
Locked: 0 kB
VmFlags: rd wr mr mw me de ht

after patch:

Size: 20480 kB
Rss: 18432 kB
Pss: 18432 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 18432 kB
Referenced: 18432 kB
Anonymous: 18432 kB
AnonHugePages: 0 kB
Swap: 0 kB
KernelPageSize: 2048 kB
MMUPageSize: 2048 kB
Locked: 0 kB
VmFlags: rd wr mr mw me de ht

Signed-off-by: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx>
---
fs/proc/task_mmu.c | 27 +++++++++++++++++++++++++++
1 file changed, 27 insertions(+)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index ca1e091881d4..c7218603306d 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -610,12 +610,39 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
 	seq_putc(m, '\n');
 }
 
+#ifdef CONFIG_HUGETLB_PAGE
+static int smaps_hugetlb_range(pte_t *pte, unsigned long hmask,
+				 unsigned long addr, unsigned long end,
+				 struct mm_walk *walk)
+{
+	struct mem_size_stats *mss = walk->private;
+	struct vm_area_struct *vma = walk->vma;
+	struct page *page = NULL;
+
+	if (pte_present(*pte)) {
+		page = vm_normal_page(vma, addr, *pte);
+	} else if (is_swap_pte(*pte)) {
+		swp_entry_t swpent = pte_to_swp_entry(*pte);
+
+		if (is_migration_entry(swpent))
+			page = migration_entry_to_page(swpent);
+	}
+	if (page)
+		smaps_account(mss, page, huge_page_size(hstate_vma(vma)),
+			      pte_young(*pte), pte_dirty(*pte));
+	return 0;
+}
+#endif /* HUGETLB_PAGE */
+
 static int show_smap(struct seq_file *m, void *v, int is_pid)
 {
 	struct vm_area_struct *vma = v;
 	struct mem_size_stats mss;
 	struct mm_walk smaps_walk = {
 		.pmd_entry = smaps_pte_range,
+#ifdef CONFIG_HUGETLB_PAGE
+		.hugetlb_entry = smaps_hugetlb_range,
+#endif
 		.mm = vma->vm_mm,
 		.private = &mss,
 	};
--
2.4.3
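
With Rss filled in for vma(VM_HUGETLB), per-task hugetlb usage could be
collected from userspace by summing Rss over the VMAs that carry the "ht"
flag. The following is only a minimal sketch of such a reader, not part of
the patch: struct smaps_hugetlb, has_flag() and smaps_sum_hugetlb() are
hypothetical names, and the code assumes that VmFlags is the last line of
each VMA record, as in the output shown in the patch description.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical result type for this sketch. */
struct smaps_hugetlb {
	unsigned long rss_kb;	/* summed Rss of all hugetlb VMAs, in kB */
	unsigned long pages;	/* summed Rss / KernelPageSize */
};

/* Return 1 if the whitespace-separated flag list contains `want`. */
static int has_flag(const char *flags, const char *want)
{
	size_t len = strlen(want);
	const char *p = flags;

	while (*p) {
		const char *start;

		while (*p == ' ' || *p == '\t' || *p == '\n')
			p++;
		start = p;
		while (*p && *p != ' ' && *p != '\t' && *p != '\n')
			p++;
		if (len && (size_t)(p - start) == len &&
		    strncmp(start, want, len) == 0)
			return 1;
	}
	return 0;
}

/*
 * Read smaps-formatted text from `in`. Rss and KernelPageSize are
 * remembered per VMA and accounted when the VmFlags line shows "ht".
 */
static int smaps_sum_hugetlb(FILE *in, struct smaps_hugetlb *out)
{
	char line[512];
	unsigned long rss = 0, ksize = 0, val;

	out->rss_kb = 0;
	out->pages = 0;
	while (fgets(line, sizeof(line), in)) {
		if (sscanf(line, "Rss: %lu kB", &val) == 1) {
			rss = val;
		} else if (sscanf(line, "KernelPageSize: %lu kB", &val) == 1) {
			ksize = val;
		} else if (strncmp(line, "VmFlags:", 8) == 0) {
			if (ksize && has_flag(line + 8, "ht")) {
				out->rss_kb += rss;
				out->pages += rss / ksize;
			}
			rss = ksize = 0;	/* start of the next VMA record */
		}
	}
	return ferror(in) ? -1 : 0;
}
```

Fed the "after patch" record above (Rss 18432 kB, KernelPageSize 2048 kB),
this reports 18432 kB and 9 huge pages. Because the page count is derived
per VMA from KernelPageSize, 2MB and 1GB mappings do not get mixed up in the
division, although the summed `pages` field still adds pages of different
sizes together if a task maps both.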