Re: [RFC PATCH] Add /proc/<pid>/numa_vamaps for numa node information
From: Dave Hansen
Date: Wed May 02 2018 - 18:29:09 EST
On 05/02/2018 02:33 PM, Andrew Morton wrote:
> On Tue, 1 May 2018 22:58:06 -0700 Prakash Sangappa <prakash.sangappa@xxxxxxxxxx> wrote:
>> For analysis purpose it is useful to have numa node information
>> corresponding mapped address ranges of the process. Currently
>> /proc/<pid>/numa_maps provides list of numa nodes from where pages are
>> allocated per VMA of the process. This is not useful if an user needs to
>> determine which numa node the mapped pages are allocated from for a
>> particular address range. It would have helped if the numa node information
>> presented in /proc/<pid>/numa_maps was broken down by VA ranges showing the
>> exact numa node from where the pages have been allocated.
I'm finding myself a little lost in figuring out what this does. Today,
numa_maps might tell us that a 3-page VMA has 1 page from Node 0 and 2 pages
from Node 1. We group *entirely* by VMA:
1000-4000 N0=1 N1=2
We don't want that. We want to tell exactly where each node's memory is,
even when pages from different nodes are in the same VMA, like this:
1000-2000 N1=1
2000-3000 N0=1
3000-4000 N1=1
So that no line of output ever has more than one node's memory. It
*appears* in this new file as if each contiguous range of memory from a
given node has its own VMA. Right?
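To check my understanding, here is a rough sketch in Python of the grouping I think is being described. It is not taken from the patch: the function names, the 0x1000 page size, and the per-page node list are all assumptions made up for illustration, chosen to match the example addresses above.

```python
PAGE_SIZE = 0x1000  # assumed page size, picked to match the example addresses


def split_by_node(start, page_nodes, page_size=PAGE_SIZE):
    """Coalesce consecutive pages that sit on the same node.

    `page_nodes` is a hypothetical list holding the node id of each page in
    one VMA, in address order. Returns (start, end, node, pages) tuples.
    """
    ranges = []
    for i, node in enumerate(page_nodes):
        addr = start + i * page_size
        if ranges and ranges[-1][2] == node:
            # Same node as the previous page: extend the current range.
            s, _, n, count = ranges[-1]
            ranges[-1] = (s, addr + page_size, n, count + 1)
        else:
            # Node changed (or first page): start a new range.
            ranges.append((addr, addr + page_size, node, 1))
    return ranges


def format_ranges(ranges):
    """Render ranges in the same style as the example lines above."""
    return [f"{s:x}-{e:x} N{n}={c}" for s, e, n, c in ranges]
```

Feeding it the 3-page VMA from above, `format_ranges(split_by_node(0x1000, [1, 0, 1]))`, gives the three one-node lines. If the first two pages were both on Node 0 instead, they would collapse into a single `1000-3000 N0=2` line.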
This sounds interesting, but I can't recall ever wanting this
information myself. I'd love to hear more.
Is this for debugging? Are apps actually going to *parse* this file?
How hard did you try to share code with numa_maps? Are you sure we
can't just replace numa_maps? VMAs are a kernel-internal thing and we
never promised to represent them 1:1 in our ABI.
Are we going to continue creating new files in /proc every time a tiny
new niche pops up? :)