Re: [PATCH V2 0/6] VA to numa node information

From: Dave Hansen
Date: Thu Sep 13 2018 - 20:25:35 EST


On 09/13/2018 05:10 PM, Andrew Morton wrote:
>> Also, VMAs having THP pages can have a mix of 4k pages and hugepages.
>> The page walks would be efficient in scanning and determining if it is
>> a THP huge page and step over it. Whereas using the API, the application
>> would not know what page size mapping is used for a given VA and so would
>> have to again scan the VMA in units of 4k page size.
>>
>> If this sounds reasonable, I can add it to the commit / patch description.

As we are judging whether this is a "good" interface, can you tell us a
bit about its scalability? For instance, let's say someone has a 1TB
VMA that's populated with interleaved 4k pages. How much data comes
out? How long does it take to parse? Will we effectively deadlock the
system if someone accidentally cat's the wrong /proc file?

/proc seems like a really simple way to implement this, but it seems a
*really* odd choice for something that needs to collect a large amount
of data. The lseek() stuff is a nice addition, but I wonder if it's
unwieldy to use in practice. For instance, if you want to read data for
the VMA at 0x1000000 you lseek(fd, 0x1000000, SEEK_SET), right? You read
~20 bytes of data and then the fd is at 0x1000020. But, you're getting
data out at the next read() for (at least) the next page, which is also
available at 0x1001000. Seems funky. Do other /proc files behave this way?