Re: [PATCH 0/7] ACPI HMAT memory sysfs representation
From: Anshuman Khandual
Date: Fri Nov 23 2018 - 01:43:06 EST
On 11/22/2018 11:31 PM, Dave Hansen wrote:
> On 11/22/18 3:52 AM, Anshuman Khandual wrote:
>>>
>>> It sounds like the subset that's being exposed is insufficient for you.
>>> We did that because we think doing anything but a subset in sysfs will
>>> just blow up sysfs: MAX_NUMNODES is as high as 1024, so if we have 4
>>> attributes, that's at _least_ 1024*1024*4 files if we expose *all*
>>> combinations.
>> Each permutation need not be a separate file inside all possible NODE X
>> (/sys/devices/system/node/nodeX) directories. It can be a top level file
>> enumerating various attribute values for a given (X, Y) node pair based
>> on an offset something like /proc/pid/pagemap.
>
> My assumption has been that this kind of thing is too fancy for sysfs:
Applications need to know the matrix of multi-attribute properties as
seen from the various memory accessors/initiators to be able to bind to
the desired CPUs and memory. That gives applications a true view of a
heterogeneous system. I understand your concern about sysfs (which can
probably be worked around with multiple global files if size is a
problem), but an insufficient interface is definitely problematic in the
longer term. This is going to be an ABI which is locked in for good.
Hence, even though it might appear to be over-engineering at the moment,
IMHO it is the right thing to do.
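
To illustrate why the full matrix matters for binding decisions, here is
a minimal userspace sketch. It assumes a flat, row-major matrix holding
one attribute per (initiator, target) node pair and that a lower value is
better (as it would be for latency). The function name, the layout, and
that ordering are all hypothetical, not an existing kernel interface.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout: attr[initiator * nr_nodes + target] holds one
 * attribute value (for example access latency) as seen by 'initiator'
 * when accessing memory on 'target'. Lower is assumed to be better.
 *
 * Returns the target node with the smallest value in the initiator's row.
 * The caller must ensure nr_nodes > 0 and initiator < nr_nodes.
 */
static size_t best_target(const uint64_t *attr, size_t nr_nodes,
			  size_t initiator)
{
	const uint64_t *row = attr + initiator * nr_nodes;
	size_t best = 0;

	for (size_t t = 1; t < nr_nodes; t++) {
		if (row[t] < row[best])
			best = t;
	}
	return best;
}
```

The point of the sketch is that the answer depends on the whole row for
a given initiator, so exposing only a subset of pairs can hide the target
an application would actually want.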
>
> Documentation/filesystems/sysfs.txt:
>> Attributes should be ASCII text files, preferably with only one value
>> per file. It is noted that it may not be efficient to contain only one
>> value per file, so it is socially acceptable to express an array of
>> values of the same type.
>>
>> Mixing types, expressing multiple lines of data, and doing fancy
>> formatting of data is heavily frowned upon. Doing these things may get
>> you publicly humiliated and your code rewritten without notice.
>
> /proc/pid/pagemap is binary, not one-value-per-file and relatively
> complicated to parse.
I agree, but it does provide user space with really valuable information
about the faulted pages in its VA space. Was there a better way of
getting it? Maybe, but at this point in time it is essential.
>
> Do you really think following something like pagemap is the right model
> for sysfs?
>
> BTW, I'm not saying we don't need *some* interface like you propose. We
> almost certainly will at some point. I just don't think it will be in
> sysfs.
I am not saying doing this in sysfs is very elegant. I would rather have
a syscall read back a (MAX_NODES * MAX_NODES * u64) attribute matrix from
the kernel. A subset of that information could probably appear in sysfs to
speed up queries for various optimizations, as Keith mentioned before. But
we will first have to evaluate and agree on what constitutes a
comprehensive set of multi-attribute properties. Are we willing to go in
the direction of adding a new system call, with a subset of it appearing
in sysfs? My primary concern is not how the attribute information appears
in sysfs but its lack of completeness.
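
For a sense of scale, here is a rough sketch of the size and indexing of
such a matrix, assuming one u64 per (initiator, target) pair stored in
row-major order. The helper names and the layout are assumptions for
illustration only; no such syscall or file exists today.

```c
#include <stdint.h>

/*
 * Total size in bytes of a square matrix with one u64 entry per
 * (initiator, target) node pair.
 */
static uint64_t matrix_bytes(uint64_t nr_nodes)
{
	return nr_nodes * nr_nodes * sizeof(uint64_t);
}

/*
 * Byte offset of the entry for (initiator, target) in a row-major
 * matrix, similar in spirit to how /proc/pid/pagemap is indexed by a
 * fixed-size record per page. Returns -1 if either node is out of range.
 */
static int64_t pair_offset(uint32_t initiator, uint32_t target,
			   uint32_t nr_nodes)
{
	if (initiator >= nr_nodes || target >= nr_nodes)
		return -1;
	return ((int64_t)initiator * nr_nodes + target) *
	       (int64_t)sizeof(uint64_t);
}
```

With 1024 nodes this comes to 1024 * 1024 * 8 bytes = 8 MiB for one
attribute, read through a single interface, compared with the
1024 * 1024 * 4 = 4194304 sysfs files estimated earlier in the thread for
four attributes.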