[PATCH v2] x86/resctrl: Clear the stale staged config after the configuration is completed

From: Shawn Wang
Date: Sun Oct 09 2022 - 04:37:02 EST


As a temporary storage, array staged_config in struct rdt_domain is not
cleared after it has been used. The stale value in staged_config could
cause a MSR access error.

If resctrl is mounted with CDP enabled and then remounted with CDP
disabled, the value of staged_config in domainX via mkdir changes as
follows:

CDP enabled:
mkdir <resource control group>
domainX.staged_config[CDP_NONE].have_new_ctrl = false
domainX.staged_config[CDP_NONE].new_ctrl = 0
domainX.staged_config[CDP_CODE].have_new_ctrl = true
domainX.staged_config[CDP_CODE].new_ctrl = <default mask>
domainX.staged_config[CDP_DATA].have_new_ctrl = true
domainX.staged_config[CDP_CODE].new_ctrl = <default mask>

unmount/remount resctrl (CDP disabled):
mkdir <resource control group>
domainX.staged_config[CDP_NONE].have_new_ctrl = true
domainX.staged_config[CDP_NONE].new_ctrl = <default mask>
domainX.staged_config[CDP_CODE].have_new_ctrl = true /* stale */
domainX.staged_config[CDP_CODE].new_ctrl = <default mask> /* stale */
domainX.staged_config[CDP_DATA].have_new_ctrl = true /* stale */
domainX.staged_config[CDP_CODE].new_ctrl = <default mask> /* stale */

When creating a new resource control group, hardware will be configured by
resctrl_arch_update_domains():
rdtgroup_mkdir()->rdtgroup_mkdir_ctrl_mon()->rdtgroup_init_alloc()->resctrl_arch_update_domains()

Since resctrl_arch_update_domains() iterates and updates all
resctrl_conf_type whose have_new_ctrl is true, it will continue to update
the stale CDP_CODE and CDP_DATA configurations when CDP is disabled.

Based on the above analysis, an error can be reproduced on a system with
16 usable CLOSIDs for a 15-way L3 Cache (MBA should be disabled if the
number of CLOSIDs for MB is less than 16.) :
mount -t resctrl resctrl -o cdp /sys/fs/resctrl
mkdir /sys/fs/resctrl/p{1..7}
umount /sys/fs/resctrl/
mount -t resctrl resctrl /sys/fs/resctrl
mkdir /sys/fs/resctrl/p{1..8}

dmesg will generate the following error:
[ 6180.939345] unchecked MSR access error: WRMSR to 0xca0 (tried to write
0x00000000000007ff) at rIP: 0xffffffff82249142 (cat_wrmsr+0x32/0x60)
[ 6180.951983] Call Trace:
[ 6180.954516] <IRQ>
[ 6180.956619] __flush_smp_call_function_queue+0x11d/0x170
[ 6180.962028] __sysvec_call_function+0x24/0xd0
[ 6180.966485] sysvec_call_function+0x89/0xc0
[ 6180.970760] </IRQ>
[ 6180.972947] <TASK>
[ 6180.975131] asm_sysvec_call_function+0x16/0x20
[ 6180.979757] RIP: 0010:cpuidle_enter_state+0xcd/0x400
[ 6180.984821] Code: 49 89 c5 0f 1f 44 00 00 31 ff e8 1e e5 77 ff 45 84
ff 74 12 9c 58 f6 c4 02 0f 85 13 03 00 00 31 ff e8 67 70 7d ff fb 45 85
f6 <0f> 88 75 01 00 00 49 63 c6 4c 2b 2c 24 48 8d 14 40 48 8d 14 90 49
[ 6181.003710] RSP: 0018:ffffffff83a03e48 EFLAGS: 00000202
[ 6181.009028] RAX: ffff943400800000 RBX: 0000000000000001 RCX: 000000000000001f
[ 6181.016261] RDX: 0000000000000000 RSI: ffffffff83795059 RDI: ffffffff837c101e
[ 6181.023490] RBP: ffff9434c9352000 R08: 0000059f1cb1a05e R09: 0000000000000008
[ 6181.030717] R10: 0000000000000001 R11: 0000000000005c66 R12: ffffffff83bbf3a0
[ 6181.037944] R13: 0000059f1cb1a05e R14: 0000000000000001 R15: 0000000000000000
[ 6181.045202] ? cpuidle_enter_state+0xb2/0x400
[ 6181.049678] cpuidle_enter+0x24/0x40
[ 6181.053370] do_idle+0x1dd/0x260
[ 6181.056713] cpu_startup_entry+0x14/0x20
[ 6181.060753] rest_init+0xbb/0xc0
[ 6181.064097] arch_call_rest_init+0x5/0xa
[ 6181.068137] start_kernel+0x668/0x691
[ 6181.071914] secondary_startup_64_no_verify+0xe0/0xeb
[ 6181.077086] </TASK>

As Reinette Chatre explained:
https://lore.kernel.org/all/2728c354-ac75-be4c-66ad-86ebd9c50248@xxxxxxxxx/
"
The value of interest here is the register it tries to write to ... 0xca0.
On a system with 16 CLOSIDs the range of registers available to set the
CBM would be 0xc90 to 0xc9f that corresponds to CLOSID 0 to CLOSID 15.
The error is an attempt to write to an unsupported register - there
appears to have been an attempt to configure non-existent CLOSID 16.

Above becomes an issue when the resource group being created is for a
CLOSID # that is more than half of the CLOSIDs supported. In the
reproducer the issue was encountered when creating resource group for
CLOSID 8 on a system that supports 16 CLOSIDs.

In this case get_config_index() called from resctrl_arch_update_domains()
will return 16 and 17 when processing this resource group and that
translated to an invalid register - 0xca0 in this scenario.
"

Fix this issue by clearing the staged configs when the configuration is
completed.

Fixes: 75408e43509ed ("x86/resctrl: Allow different CODE/DATA configurations to be staged")
Cc: <stable@xxxxxxxxxxxxxxx>
Signed-off-by: Shawn Wang <shawnwang@xxxxxxxxxxxxxxxxx>
Suggested-by: Xin Hao <xhao@xxxxxxxxxxxxxxxxx>
---
Changes since v1:
- Move the clearing from schemata_list_destroy() to resctrl_arch_update_domains().
- Update the commit message suggested by Reiniette Chatre.
- Add stable tag suggested by James Morse.
---
arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 1dafbdc5ac31..2c719da5544f 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -338,6 +338,8 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
msr_param.high = max(msr_param.high, idx + 1);
}
}
+ /* Clear the stale staged config */
+ memset(d->staged_config, 0, sizeof(d->staged_config));
}

if (cpumask_empty(cpu_mask))
--
2.27.0