Re: [PATCH v4 0/7] Add support for Sub-NUMA cluster (SNC) systems

From: Tony Luck
Date: Wed Jul 26 2023 - 10:12:08 EST


On Tue, Jul 25, 2023 at 08:10:52PM -0700, Drew Fustini wrote:
> I think that the resctrl interface for RISC-V CBQRI could also benefit
> from separate domain lists for control and monitoring.
>
> For example, the bandwidth controller QoS register [1] interface allows
> a device to implement both bandwidth usage monitoring and bandwidth
> allocation. The resctrl proof-of-concept [2] had to awkwardly create two
> domains for each memory controller in our example SoC, one that would
> contain the MBA resource and one that would contain the L3 resource to
> represent MBM files like local_bytes.
>
> This resulted in a very odd looking schemata that would be hard for the
> user to understand:
>
> # cat /sys/fs/resctrl/schemata
> MB:4= 80;6= 80;8= 80
> L2:0=0fff;1=0fff
> L3:2=ffff;3=0000;5=0000;7=0000
>
> Where:
>
> Domain 0 is L2 cache controller 0 capacity allocation
> Domain 1 is L2 cache controller 1 capacity allocation
> Domain 2 is L3 cache controller capacity allocation
>
> Domain 4 is Memory controller 0 bandwidth allocation
> Domain 6 is Memory controller 1 bandwidth allocation
> Domain 8 is Memory controller 2 bandwidth allocation
>
> Domain 3 is Memory controller 0 bandwidth monitoring
> Domain 5 is Memory controller 1 bandwidth monitoring
> Domain 7 is Memory controller 2 bandwidth monitoring
>
> But there is no value in having the domains created for the purposes of
> bandwidth monitoring in schemata.

There's certainly no value in exposing those domain numbers
in the schemata file. There should also be some way for users
to decode the ids. On x86 the "id" is exposed in sysfs. Though
the user does need to work to get all the details:

$ cat /sys/devices/system/cpu/cpu36/cache/index3/level
3
$ cat /sys/devices/system/cpu/cpu36/cache/index3/id
1
$ cat /sys/devices/system/cpu/cpu36/cache/index3/shared_cpu_list
36-71,108-143

This shows the L3 cache with id "1" is shared by CPUs 36-71,108-143.
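
A rough sketch of automating that decoding, in Python. The function name
read_cache_ids and its sysfs_root parameter are made up for illustration
and are not part of any existing tool; the only thing taken from above is
the level/id/shared_cpu_list file layout.

```python
import glob
import os


def read_cache_ids(sysfs_root="/sys/devices/system/cpu"):
    """Map (cache level, cache id) -> shared_cpu_list string.

    Hypothetical helper: sysfs_root is a parameter only so the function
    can be pointed at a copy of the tree.
    """
    caches = {}
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "cache", "index[0-9]*")
    for index_dir in sorted(glob.glob(pattern)):
        try:
            with open(os.path.join(index_dir, "level")) as f:
                level = int(f.read().strip())
            with open(os.path.join(index_dir, "id")) as f:
                cache_id = int(f.read().strip())
            with open(os.path.join(index_dir, "shared_cpu_list")) as f:
                cpus = f.read().strip()
        except (OSError, ValueError):
            # Skip cache indexes that lack one of the files or hold
            # something unexpected.
            continue
        # Note: L1 data and instruction caches share a level, so a fuller
        # version would also key on the "type" file.
        caches[(level, cache_id)] = cpus
    return caches


if __name__ == "__main__":
    for (level, cache_id), cpus in sorted(read_cache_ids().items()):
        print(f"L{level} id {cache_id}: CPUs {cpus}")
```

Every CPU sharing a cache reports the same id and CPU list, so later
entries simply overwrite earlier ones with identical data.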

x86 also has independent domain numbers for each resource. So the
L2 ones count 0, 1, 2, ... and so do the L3 ones (0, 1, 2, ...) and
the MBA ones (0, 1, 2, ...).

That fits well with the /sys decoding ... but maybe your approach of
not repeating domain numbers across different resources is less
confusing?
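
To make the difference between the two numbering schemes concrete, here
is a small Python sketch that parses schemata text and reports which
domain ids show up under more than one resource. Both function names are
hypothetical, and the x86-style sample in the usage part is made up for
illustration; only the CBQRI sample is copied from your mail.

```python
from collections import Counter


def parse_schemata(text):
    """Parse resctrl schemata text into {resource: {domain_id: value}}."""
    resources = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, _, domains = line.partition(":")
        entries = {}
        for item in domains.split(";"):
            dom, _, value = item.partition("=")
            entries[int(dom)] = value.strip()
        resources[name.strip()] = entries
    return resources


def shared_domain_ids(resources):
    """Return the sorted domain ids used by more than one resource."""
    counts = Counter(d for doms in resources.values() for d in doms)
    return sorted(d for d, n in counts.items() if n > 1)


if __name__ == "__main__":
    # Schemata quoted from the CBQRI proof-of-concept above.
    cbqri = "MB:4= 80;6= 80;8= 80\nL2:0=0fff;1=0fff\nL3:2=ffff;3=0000;5=0000;7=0000\n"
    # Made-up x86-style sample where each resource counts from 0.
    x86_style = "MB:0=100;1=100\nL3:0=fff;1=fff\n"
    print(shared_domain_ids(parse_schemata(cbqri)))
    print(shared_domain_ids(parse_schemata(x86_style)))
```

With unique ids the first list comes back empty; with per-resource
numbering the second one reports the ids that are reused.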

Note that in my resctrl re-write, where each resource is handled by
a separate loadable module, it may be hard for you to keep the unique
domain scheme, as resource modules are unaware of each other. Though
perhaps it's just an arch-specific hook to provide domain numbers.

> I've not yet fully understood how the new approach in this patch series
> could help the situation for CBQRI, but I thought I would mention that
> separate lists for control and monitoring might be useful.

Good. It's nice to know there's potentially another use case for
this split besides SNC.

-Tony