Re: [PATCH v16 0/9] Add support for Sub-NUMA cluster (SNC) systems
From: Maciej Wieczor-Retman
Date: Wed Mar 20 2024 - 12:27:52 EST
On 2024-03-20 at 08:50:51 -0700, Reinette Chatre wrote:
>Hi Maciej,
>
>On 3/20/2024 8:21 AM, Maciej Wieczor-Retman wrote:
>> Hi Reinette,
>>
>> On 2024-03-19 at 10:51:14 -0700, Reinette Chatre wrote:
>>> What remains is the user interface that continues to gather opinions [3]. These new
>>> discussions were prompted by user space needing a way to determine if resctrl supports
>>> SNC. This started by using the "size" file but thinking about it more user space could
>>> also look at whether the number of L3 control domains are different from the number
>>> of L3 monitoring domains? I am adding Maciej for his opinion (please also include him
>>> in future versions of this series).
>>
>> By this do you mean comparing the contents of main "schemata" file with the
>> number of mon_L3_* files?
>
>(assuming you mean mon_L3_* directories)
>
>Yes, the "schemata" file can be used. There is also the "bit_usage" file in
>the info directory that indicates how many control domains there are.

I just ran a test on my Ice Lake server with SNC enabled and you're right:
this can indicate kernel support for SNC. It also seems more reliable for
determining the ratio of nodes per socket (since the ratio of CPUs per
cache/node can be wrong when some CPUs are offline).

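For illustration, the comparison could be sketched roughly as below. This is
only a minimal sketch under assumptions, not what the series or the selftests
implement: it assumes resctrl is mounted at /sys/fs/resctrl, that the
"schemata" file has a plain "L3:" line (so CDP is not enabled), and that each
L3 monitoring domain shows up as a mon_L3_* directory under mon_data. The
helper names are hypothetical.

```python
import os
import re


def count_l3_control_domains(schemata_text):
    """Count the "id=mask" entries on the L3 line of schemata contents."""
    for line in schemata_text.splitlines():
        # Lines may be indented, e.g. "    L3:0=fff;1=fff".
        name, sep, domains = line.strip().partition(":")
        if sep and name == "L3":
            return len([d for d in domains.split(";") if d.strip()])
    return 0


def count_l3_monitor_domains(entry_names):
    """Count directory names of the form mon_L3_<digits>."""
    return sum(1 for n in entry_names if re.fullmatch(r"mon_L3_\d+", n))


def nodes_per_l3(ctrl_domains, mon_domains):
    """Return monitoring domains per control domain, or None if unclear."""
    if ctrl_domains > 0 and mon_domains % ctrl_domains == 0:
        return mon_domains // ctrl_domains
    return None


def probe(resctrl_root="/sys/fs/resctrl"):
    """Read a mounted resctrl tree (assumed path) and return the ratio."""
    with open(os.path.join(resctrl_root, "schemata")) as f:
        ctrl = count_l3_control_domains(f.read())
    mon = count_l3_monitor_domains(
        os.listdir(os.path.join(resctrl_root, "mon_data")))
    return nodes_per_l3(ctrl, mon)
```

A result greater than 1 from probe() would suggest that SNC is enabled and
that the kernel exposes per-node monitoring domains; a result of 1 says
nothing either way about kernel support.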
>Do you think doing so also falls into the "not obvious text parsing and size
>comparison" category?

Ideally, [1] seems the most user-friendly option in my opinion. Comparing
schemata with the mon_L3_* directories feels like a good next-best solution,
though.

>>> Apart from the user space requirement to know if SNC is supported by resctrl there
>>> is also the interface with which user space obtains the monitoring data.
>>> James highlighted [1] that the interface used in this series uses existing files to
>>> represent different content, and can thus be considered as "broken". It is not obvious
>>> to me how to "fix" this. Should we continue to explore interfaces like [2] that
>>> attempts to add SNC support into resctrl or should the message continue to be
>>> that SNC "plays havoc with the RDT monitoring features" and users wanting to use
>>> SNC and RDT at the same time are expected to adapt to the peculiar interface ...
>>> or is the preference that after this series "SNC and RDT are compatible" and
>>> thus presented with an intuitive interface?
>>
>> I kind of liked this idea [1]. Hiding SNC related information behind some not
>> obvious text parsing and size comparisons might eliminate any ease of use for
>> userspace applications. But I agree with you [2] that it's hard to predict the
>> future for this interface and any potential problems with setting up this
>> file structure.
>
>Thanks to you for trying this out from user space side and highlighting the
>difficulty trying to do so.

I suppose it wasn't that difficult in execution, but it required a lot of
thinking about what is a reliable way to check for kernel support, what can
fail when checking the ratio of nodes to sockets, etc. Maybe once we have this
figured out, all it will take is good documentation and userspace applications
won't have a hard time with SNC.

>
>Reinette
>
[1] https://lore.kernel.org/all/SJ1PR11MB608309F47C00F964E16205D6FC2D2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
[2] https://lore.kernel.org/all/7f15a700-f23a-48f9-b335-13ea1735ad84@xxxxxxxxx/
--
Kind regards
Maciej Wieczór-Retman