On 2024-07-03 at 13:51:03 -0700, Reinette Chatre wrote:
On 7/3/24 12:43 AM, Maciej Wieczór-Retman wrote:
On 3.07.2024 00:21, Reinette Chatre wrote:
On 7/1/24 7:18 AM, Maciej Wieczor-Retman wrote:
SNC might not be enabled at all so there would be no reason to send the user
to check their BIOS - instead they can learn they have offline CPUs and they can
work on fixing that. In my opinion it could be beneficial to have more specialized
messages in the selftests to help users diagnose problems quicker.
My goal is indeed to have specialized messages. There cannot be a specialized message
if snc_reliable == 1. In this case it needs to be generic since SNC may or may not be
enabled and it is up to the user to investigate further.
How about this in cmt_run_test() for example:
if (snc_unreliable)
ksft_print_msg("Intel CMT may be inaccurate or inefficient when Sub-NUMA Clustering is enabled and not properly detected.\n");
else if (ret && (get_vendor() == ARCH_INTEL) && !snc_kernel_support())
ksft_print_msg("Kernel doesn't support Sub-NUMA Clustering but it is enabled. Check BIOS configuration.\n");
This way there is a generic message when snc_unreliable == 1.
And as you mentioned at the end of this email, the user can be expected to
backtrack to the beginning of the test if there are any problems so they can
discover the exact source of the issue - offline cpus.
Having only this one message wihtout the "if snc unreliable" messages would
mean nothing would get printed at the end on success with unreliable SNC detection.
Having a pass/fail is what user will focus on. If the test passes then SNC detection
should not matter. The messages are just there to help user root cause where a failure
may be.
My train of thought was that if test passes with broken SNC detection it means
SNC was used inefficiently right? (either the portion of L3 used was bigger or
smaller than that allocated for one cluster)
It's not exactly a failure but I thought it deserves a warning at the very end
to alert the user.
If you don't think the warning should be printed on success I guess the
condition can be:
if (ret && snc_unreliable)
and the user can look at the start of the test if they care about more
information. And the message can lose the "inefficient" word since it would only
happen on error.