Re: [PATCH] selftests/resctrl: Fix noncont_cat_run_test for AMD

From: Reinette Chatre
Date: Thu Jun 06 2024 - 19:59:03 EST


Hi Babu,

On 6/6/24 4:09 PM, Moger, Babu wrote:
Hi Reinette,


On 6/6/2024 3:33 PM, Reinette Chatre wrote:
Hi Babu,

On 6/5/24 2:36 PM, Babu Moger wrote:
The selftest noncont_cat_run_test fails on AMD with warnings. The reason
is that AMD supports non-contiguous CBM masks but does not report it via CPUID.

Update noncont_cat_run_test to check for the vendor when verifying CPUID.

Fixes: ae638551ab64 ("selftests/resctrl: Add non-contiguous CBMs CAT test")
Signed-off-by: Babu Moger <babu.moger@xxxxxxx>
---
This was part of the series
https://lore.kernel.org/lkml/cover.1708637563.git.babu.moger@xxxxxxx/
Sending this as a separate fix per review comments.
---
  tools/testing/selftests/resctrl/cat_test.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/resctrl/cat_test.c b/tools/testing/selftests/resctrl/cat_test.c
index d4dffc934bc3..b2988888786e 100644
--- a/tools/testing/selftests/resctrl/cat_test.c
+++ b/tools/testing/selftests/resctrl/cat_test.c
@@ -308,7 +308,7 @@ static int noncont_cat_run_test(const struct resctrl_test *test,
      else
          return -EINVAL;
-    if (sparse_masks != ((ecx >> 3) & 1)) {
+    if ((get_vendor() == ARCH_INTEL) && sparse_masks != ((ecx >> 3) & 1)) {
          ksft_print_msg("CPUID output doesn't match 'sparse_masks' file content!\n");
          return 1;
      }

Since AMD does not report this support via CPUID it does not seem
appropriate to use CPUID at all on AMD when doing the hardware check.
I think the above check makes it difficult to understand what is different
on AMD.

What if instead there is a new function, for example,
"static bool arch_supports_noncont_cat(const struct resctrl_test *test)"
that returns true if the hardware supports non-contiguous CBM?

Sure.


The vendor check can be in there to make it obvious what is going on:

     /* AMD always supports non-contiguous CBM. */
     if (get_vendor() == ARCH_AMD)
         return true;

     /* CPUID check for Intel here. */

The "sparse_masks" value reported by the kernel can then be checked
against hardware support, with an appropriate error message (no mention
of CPUID) if the two do not match.
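A minimal, self-contained sketch of the suggested helper follows. To keep it testable without real hardware, it takes the vendor as a parameter (instead of calling get_vendor()) and reads CPUID leaf 0x10 through a caller-supplied function (instead of __cpuid_count()). The enum values, the callback type, the changed signature and the fake readers are hypothetical stand-ins, not the selftest's actual definitions. The AMD check comes first, so CPUID is never consulted on AMD.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for the selftest's vendor identifiers. */
enum arch_vendor { ARCH_INTEL = 1, ARCH_AMD = 2 };

/* Hypothetical reader: returns ECX of CPUID leaf 0x10 for the given subleaf. */
typedef unsigned int (*cpuid_10_ecx_fn)(unsigned int subleaf);

/* Sketch only: the signature differs from the one proposed in the thread. */
static bool arch_supports_noncont_cat(enum arch_vendor vendor,
				      const char *resource,
				      cpuid_10_ecx_fn read_ecx)
{
	unsigned int subleaf;

	/* AMD always supports non-contiguous CBM; CPUID is not consulted. */
	if (vendor == ARCH_AMD)
		return true;

	/* Intel support for non-contiguous CBM needs to be discovered. */
	if (!strcmp(resource, "L3"))
		subleaf = 1;
	else if (!strcmp(resource, "L2"))
		subleaf = 2;
	else
		return false;

	/* ECX bit 3 reports non-contiguous CBM support. */
	return (read_ecx(subleaf) >> 3) & 1;
}

/* Fake CPUID readers for illustration only; they count how often they run. */
static int fake_cpuid_calls;

static unsigned int fake_ecx_noncont(unsigned int subleaf)
{
	(void)subleaf;
	fake_cpuid_calls++;
	return 1u << 3;
}

static unsigned int fake_ecx_contiguous_only(unsigned int subleaf)
{
	(void)subleaf;
	fake_cpuid_calls++;
	return 0;
}
```

Because the vendor check is an early return, no else branch is needed and the Intel-only discovery path reads as a separate step.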


Something like this?


diff --git a/tools/testing/selftests/resctrl/cat_test.c b/tools/testing/selftests/resctrl/cat_test.c
index d4dffc934bc3..b75d220f29f6 100644
--- a/tools/testing/selftests/resctrl/cat_test.c
+++ b/tools/testing/selftests/resctrl/cat_test.c
@@ -288,11 +288,30 @@ static int cat_run_test(const struct resctrl_test *test, const struct user_param
        return ret;
 }

+static bool arch_supports_noncont_cat(const struct resctrl_test *test)
+{
+       unsigned int eax, ebx, ecx, edx;
+
+       /* AMD always supports non-contiguous CBM. */
+       if (get_vendor() == ARCH_AMD) {
+               return true;
+       } else {

The else can be dropped since it follows a return.
The rest of the code can be prefixed with a matching
comment like:
/* Intel support for non-contiguous CBM needs to be discovered. */

(please feel free to improve)

+               if (!strcmp(test->resource, "L3"))
+                       __cpuid_count(0x10, 1, eax, ebx, ecx, edx);
+               else if (!strcmp(test->resource, "L2"))
+                       __cpuid_count(0x10, 2, eax, ebx, ecx, edx);
+               else
+                       return false;
+
+               return ((ecx >> 3) & 1);
+       }
+}
+
 static int noncont_cat_run_test(const struct resctrl_test *test,
                                const struct user_params *uparams)
 {
        unsigned long full_cache_mask, cont_mask, noncont_mask;
-       unsigned int eax, ebx, ecx, edx, sparse_masks;
+       unsigned int sparse_masks;
        int bit_center, ret;
        char schemata[64];

@@ -301,15 +320,8 @@ static int noncont_cat_run_test(const struct resctrl_test *test,
        if (ret)
                return ret;

-       if (!strcmp(test->resource, "L3"))
-               __cpuid_count(0x10, 1, eax, ebx, ecx, edx);
-       else if (!strcmp(test->resource, "L2"))
-               __cpuid_count(0x10, 2, eax, ebx, ecx, edx);
-       else
-               return -EINVAL;
-
-       if (sparse_masks != ((ecx >> 3) & 1)) {
-               ksft_print_msg("CPUID output doesn't match 'sparse_masks' file content!\n");
+       if (!(arch_supports_noncont_cat(test) && sparse_masks)) {
+               ksft_print_msg("Hardware does not support non-contiguous CBM!\n");

Please fix the test as well as the message. It is not an error if the hardware does
not support non-contiguous CBM. It is an error if the hardware and the kernel disagree
on whether non-contiguous CBM is supported.
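The condition in the proposed diff, !(arch_supports_noncont_cat(test) && sparse_masks), reports an error whenever either side is false, so a system without non-contiguous CBM, where the kernel correctly reports sparse_masks as 0, would fail the test. What the review asks for is an inequality check. A minimal sketch, assuming the hardware result and the "sparse_masks" file content are already available; check_noncont_consistency() is a hypothetical name, and fprintf() stands in for ksft_print_msg():

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Only a mismatch between hardware support and the kernel's
 * "sparse_masks" value is an error; lack of support by itself is not.
 */
static int check_noncont_consistency(bool hw_supports, unsigned int sparse_masks)
{
	if (hw_supports != (sparse_masks != 0)) {
		fprintf(stderr, "Hardware and kernel differ on non-contiguous CBM support!\n");
		return 1;
	}
	return 0;
}
```

Inside noncont_cat_run_test() itself this would presumably reduce to comparing the helper's result with sparse_masks directly, e.g. if (arch_supports_noncont_cat(test) != sparse_masks).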

Reinette