Re: [RFC PATCH] Add /proc/<pid>/numa_vamaps for numa node information

From: prakash.sangappa
Date: Thu May 03 2018 - 18:26:06 EST




On 05/03/2018 01:46 AM, Anshuman Khandual wrote:
On 05/03/2018 03:58 AM, Dave Hansen wrote:
On 05/02/2018 02:33 PM, Andrew Morton wrote:
On Tue, 1 May 2018 22:58:06 -0700 Prakash Sangappa <prakash.sangappa@xxxxxxxxxx> wrote:
For analysis purposes it is useful to have numa node information
corresponding to the mapped address ranges of the process. Currently
/proc/<pid>/numa_maps provides a list of numa nodes from where pages are
allocated per VMA of the process. This is not useful if a user needs to
determine which numa node the mapped pages are allocated from for a
particular address range. It would help if the numa node information
presented in /proc/<pid>/numa_maps was broken down by VA ranges showing the
exact numa node from where the pages have been allocated.
I'm finding myself a little lost in figuring out what this does. Today,
numa_maps might tell us that a 3-page VMA has 1 page from Node 0 and 2 pages
from Node 1. We group *entirely* by VMA:

1000-4000 N0=1 N1=2

We don't want that. We want to tell exactly where each node's memory is,
even if it is in the same VMA, like this:

1000-2000 N1=1
2000-3000 N0=1
3000-4000 N1=1
I am wondering how many lines of output we might have on a big memory
system for a large VMA (consuming, let's say, 80% of system RAM) with an
interleave policy. Isn't that a problem?

If each consecutive page comes from a different node then yes, in
the extreme case this file will have a lot of lines. All the lines
are generated at the time the file is read. The amount of data read will be
limited to the size of the user buffer used in the read.
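
To illustrate how the line count depends on page placement, here is a
minimal sketch in Python, not the patch's kernel code. It merges consecutive
pages that sit on the same node into one range and prints them in the
"start-end N<node>=<pages>" form used in the example quoted above. The
function names and the per-page node list are hypothetical inputs made up
for illustration.

```python
def coalesce_by_node(start, page_size, nodes):
    """Merge consecutive pages on the same node into ranges.

    `nodes` holds one node id per page, in address order (hypothetical input).
    Returns a list of (range_start, range_end, node, page_count) tuples.
    """
    ranges = []
    for i, node in enumerate(nodes):
        addr = start + i * page_size
        if ranges and ranges[-1][2] == node:
            # Same node as the previous page: extend the current range.
            s, _, n, count = ranges[-1]
            ranges[-1] = (s, addr + page_size, n, count + 1)
        else:
            # Node changed (or first page): start a new range.
            ranges.append((addr, addr + page_size, node, 1))
    return ranges


def format_ranges(ranges):
    """Render ranges as 'start-end N<node>=<pages>' lines (illustrative format)."""
    return [f"{s:x}-{e:x} N{n}={c}" for s, e, n, c in ranges]


if __name__ == "__main__":
    # The 3-page VMA from the example above: pages on nodes 1, 0, 1.
    for line in format_ranges(coalesce_by_node(0x1000, 0x1000, [1, 0, 1])):
        print(line)
    # Strict interleave across two nodes: every page starts a new range.
    interleaved = [i % 2 for i in range(8)]
    print(len(coalesce_by_node(0, 4096, interleaved)))  # one line per page
```

With strict interleaving no two neighbouring pages share a node, so the
number of lines equals the number of pages. When all pages come from one
node the whole VMA collapses to a single line.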

/proc/<pid>/pagemap has a similar issue. There is one 64-bit value
for each user page.
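
For comparison, a small Python sketch of the pagemap arithmetic. It assumes
a 4096-byte page size by default and the documented pagemap layout of one
8-byte entry per virtual page, with bit 63 meaning "page present", bit 62
meaning "swapped", and bits 0-54 holding the PFN when the page is present.
The helper names are hypothetical and nothing here is taken from the patch.

```python
import struct

PAGEMAP_ENTRY_SIZE = 8  # one 64-bit value per virtual page


def pagemap_offset(vaddr, page_size=4096):
    """Byte offset in /proc/<pid>/pagemap of the entry covering vaddr."""
    return (vaddr // page_size) * PAGEMAP_ENTRY_SIZE


def pagemap_bytes_for_range(start, end, page_size=4096):
    """Bytes of pagemap data needed to cover the range [start, end)."""
    pages = (end - start + page_size - 1) // page_size
    return pages * PAGEMAP_ENTRY_SIZE


def decode_entry(raw):
    """Decode one 8-byte pagemap entry read in native byte order."""
    (value,) = struct.unpack("=Q", raw)
    return {
        "present": bool(value >> 63 & 1),
        "swapped": bool(value >> 62 & 1),
        # Only meaningful when the page is present; the kernel may report
        # zero here to unprivileged readers.
        "pfn": value & ((1 << 55) - 1),
    }


if __name__ == "__main__":
    # The 3-page VMA 0x1000-0x4000 needs three entries.
    print(pagemap_bytes_for_range(0x1000, 0x4000))  # 24
    # A 1 GiB mapping with 4 KiB pages needs 2 MiB of pagemap data.
    print(pagemap_bytes_for_range(0, 1 << 30))  # 2097152
```

The size of the pagemap data grows linearly with the mapping regardless of
where pages are placed, whereas a per-range file only reaches one line per
page in the fully interleaved case.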