Re: [PATCH v2 1/2] selftests/resctrl: Adjust effective L3 cache size with SNC enabled

From: Maciej Wieczor-Retman
Date: Wed Jun 26 2024 - 03:10:27 EST


On 2024-06-25 at 09:28:55 -0700, Reinette Chatre wrote:
>Hi Maciej,
>On 6/25/24 4:04 AM, Maciej Wieczor-Retman wrote:
>> Hello,
>> sorry it took me so long to get back to this. I prepared the next version with
>> your comments applied and Tony's replies taken into account.
>Thank you very much for sticking with this.
>> I wanted to briefly discuss this before posting:
>> On 2024-05-30 at 16:07:29 -0700, Reinette Chatre wrote:
>> > On 5/15/24 4:18 AM, Maciej Wieczor-Retman wrote:
>> > > + return 1;
>> > > + }
>> > > +
>> > > + for (i = 1; i <= MAX_SNC ; i++) {
>> > > + if (i * node_cpus >= cache_cpus)
>> > > + return i;
>> > > + }
>> >
>> > This is not obvious to me. From the function comments this seems to address the
>> > scenarios when CPUs from other nodes are offline. It is not clear to me how
>> > this loop addresses this. For example, let's say there are four SNC nodes
>> > associated with a cache and only the node0 CPUs are online. The above would
>> > detect this as "1", not "4", if I read this right?
>> >
>> > I wonder if it may not be easier to just follow what the kernel does
>> > (in the new version).
>> > User space can learn the number of online and present CPUs from
>> > /sys/devices/system/cpu/online and /sys/devices/system/cpu/present
>> > respectively. A simple string compare of the contents can be used to
>> > determine if they are identical and a warning can be printed if they are not.
>> > With a warning when accurate detection cannot be done the simple
>> > check will do.
>> >
>> > Could you please add an informational message indicating how many SNC nodes
>> > were indeed detected?
>> Should the information "how many SNC nodes are detected?" get printed every time
>> (by which I mean at the end of CMT and MBM tests) or only when the error
>> "SNC enabled but kernel doesn't support it" happens? Of course in the first case
>> if there is only 1 node detected nothing would be printed to avoid noise.
>I agree that it is not needed to print something about SNC if it is disabled.
>hmmm ... so SNC impacts every test but it is only detected by default during CAT
>and CMT test, with MBA and MBM "detection" only triggered if the test fails?

Yes, snc_ways() ran before starting CAT and CMT to adjust the cache size
variable, and then again after CAT, CMT, MBM and MBA if the return value
indicated failure.

>What if the "SNC detection" is moved to be within run_single_test() but instead of
>repeating the detection from scratch every time it rather works like get_vendor()
>where the full detection is only done on first attempt? run_single_test() can detect if
>SNC is enabled and (if number of SNC nodes > 1) print an informational message
>that is inherited by all tests.
>Any test that needs to know the number of SNC nodes can continue to use the
>same function used for detection (that only does actual detection once).
>What do you think?

I think running the detection once at the start and then reusing the results is
a good idea. You're proposing adding a value (global or passed through all the
tests) that would get initialized on the first run_single_test()?
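
To make sure we mean the same thing, here is a rough sketch of the "detect
once, reuse afterwards" pattern, modelled on how get_vendor() caches its
result. The names snc_nodes_total() and detect_snc_nodes() are placeholders I
made up, the stub returns a fixed value instead of doing real detection, the
message text is invented, and printf() stands in for the selftest print helper:

```c
#include <stdio.h>

/* Counts how often the (stubbed) full detection actually runs. */
static int detect_calls;

/* Hypothetical stub standing in for the real, more expensive detection. */
static int detect_snc_nodes(void)
{
	detect_calls++;
	return 2;	/* assumed value, for illustration only */
}

/*
 * Hypothetical cached wrapper: the full detection runs on the first call
 * only, every later call returns the stored result.
 */
int snc_nodes_total(void)
{
	static int snc_mode;	/* 0 means "not detected yet" */

	if (!snc_mode) {
		snc_mode = detect_snc_nodes();
		if (snc_mode > 1)
			printf("# SNC-%d mode discovered\n", snc_mode);
	}

	return snc_mode;
}
```

run_single_test() could call this once before the test starts, and any test
that needs the node count could call the same function without triggering
another detection or another message.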

And then the SNC status (if enabled), plus a warning if the detection could be
wrong (because the online and present CPU lists differ), would be printed
before the test runs?
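
For the online/present check itself, something along these lines might be
enough. This is only a sketch of the string compare suggested above:
cpu_lists_match() and read_first_line() are hypothetical names, and the paths
are passed in as parameters purely so the helper can be tried on ordinary
files:

```c
#include <stdio.h>
#include <string.h>

/* Read the first line of @path into @buf. Returns 0 on success, -1 on error. */
static int read_first_line(const char *path, char *buf, size_t len)
{
	FILE *fp = fopen(path, "r");

	if (!fp)
		return -1;
	if (!fgets(buf, (int)len, fp)) {
		fclose(fp);
		return -1;
	}
	fclose(fp);

	/* Strip the trailing newline so only the CPU list is compared. */
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}

/*
 * Hypothetical helper: returns 1 if both files hold the same CPU list,
 * 0 if they differ and -1 if either file cannot be read.
 */
int cpu_lists_match(const char *online_path, const char *present_path)
{
	char online[4096], present[4096];

	if (read_first_line(online_path, online, sizeof(online)) ||
	    read_first_line(present_path, present, sizeof(present)))
		return -1;

	return !strcmp(online, present);
}
```

The selftest would call it with /sys/devices/system/cpu/online and
/sys/devices/system/cpu/present and print the warning whenever the result is
not 1.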

On the warning placement, I think it should no longer be printed only on
failure. I did some experiments using "chcpu" to enable/disable cores and then
ran the selftests. They had no problems succeeding even though SNC detection
reported a different mode every time (I added a printf() next to the line
where the cache size is modified to show which SNC mode is detected). While I
understand these tests shouldn't fail, since they just use a different portion
of the cache, I think the user should be informed that the test isn't really
NUMA aware if the detection was wrong:

(this was a 2-socket machine with SNC-2 and 55296K L3 cache size)

This is without any changes:
[root]# ./resctrl_tests -t CMT
# dmesg: [ 11.464842] resctrl: Sub-NUMA Cluster mode detected with 2 nodes per L3 cache
# Cache size :28311552
# Average LLC val: 12413952
# Cache span (bytes): 11796480
ok 1 CMT: test

This is with all cores on node 1 disabled:
[root]# ./resctrl_tests -t CMT
# dmesg: [ 11.464842] resctrl: Sub-NUMA Cluster mode detected with 2 nodes per L3 cache
# Cache size :56623104
# Average LLC val: 22606848
# Cache span (bytes): 23592960
ok 1 CMT: test

And this with one core on node 0 disabled:
[root]# ./resctrl_tests -t CMT
# dmesg: [ 11.464842] resctrl: Sub-NUMA Cluster mode detected with 2 nodes per L3 cache
# Cache size :18874368
# Average LLC val: 7382016
# Cache span (bytes): 7864320
ok 1 CMT: test
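
The three "Cache size" values above are the full 55296K L3 divided by 2, 1 and
3, which is what the detection loop from the quoted v2 hunk produces for those
CPU configurations. A small sketch to reproduce the numbers, with several
assumptions on my side: node_cpus is counted on node 0, cache_cpus covers all
online CPUs sharing the L3, MAX_SNC is 4, the fallback after the loop returns
1, and the function names are made up:

```c
#include <stdio.h>

#define MAX_SNC 4	/* assumed upper bound for the loop */

/* Same loop as in the quoted v2 hunk, with the CPU counts passed in. */
int snc_from_cpu_counts(int node_cpus, int cache_cpus)
{
	int i;

	for (i = 1; i <= MAX_SNC; i++) {
		if (i * node_cpus >= cache_cpus)
			return i;
	}

	return 1;	/* assumed fallback when nothing matches */
}

/* L3 size in bytes the test ends up using for a detected SNC mode. */
unsigned long effective_cache_size(unsigned long cache_kb, int snc_mode)
{
	return cache_kb * 1024 / snc_mode;
}
```

With a made-up 4 CPUs per node, snc_from_cpu_counts(4, 8) gives 2 (everything
online), snc_from_cpu_counts(4, 4) gives 1 (node 1 offline) and
snc_from_cpu_counts(3, 7) gives 3 (one CPU on node 0 offline).
effective_cache_size(55296, ...) for those modes then yields 28311552,
56623104 and 18874368 bytes, matching the logs.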

CAT also succeeds although it reports bigger or smaller cache miss rates than

SNC NODES DETECTED : 1 <-- all cpus on node 1 offline
# Percent diff=12.7
# Percent diff=10.2
# Percent diff=7.8
# Percent diff=6.7
ok 1 L3_CAT: test

# Percent diff=49.6
# Percent diff=37.8
# Percent diff=22.4
# Percent diff=16.0
ok 1 L3_CAT: test

SNC NODES DETECTED : 3 <-- one cpu on node 0 offline
# Percent diff=76.6
# Percent diff=53.3
# Percent diff=35.1
# Percent diff=28.9
ok 1 L3_CAT: test


Kind regards
Maciej Wieczór-Retman