Re: [PATCH] fs/resctrl: Fix MBM events being unconditionally enabled in mbm_event mode
From: Moger, Babu
Date: Wed Oct 15 2025 - 10:55:36 EST
Hi Reinette,
On 10/14/2025 6:09 PM, Reinette Chatre wrote:
Hi Babu,
On 10/14/25 3:45 PM, Moger, Babu wrote:
On 10/14/2025 3:57 PM, Reinette Chatre wrote:
On 10/14/25 10:43 AM, Babu Moger wrote:
Yes. I saw the issue. In my case the mount fails with a panic trace.
(Just to ensure that there is not anything else going on) Could you please confirm if the panic is from
mon_add_all_files()->mon_event_read()->mon_event_count()->__mon_event_count()->resctrl_arch_reset_rmid()
that creates the MBM event files during mount and then does the initial read of RMID to determine the
starting count?
It happens just before that (at mbm_cntr_get). We have not allocated d->cntr_cfg for the counters.
===================Panic trace =================================
[ 349.330416] BUG: kernel NULL pointer dereference, address: 0000000000000008
[ 349.338187] #PF: supervisor read access in kernel mode
[ 349.343914] #PF: error_code(0x0000) - not-present page
[ 349.349644] PGD 10419f067 P4D 0
[ 349.353241] Oops: Oops: 0000 [#1] SMP NOPTI
[ 349.357905] CPU: 45 UID: 0 PID: 3449 Comm: mount Not tainted 6.18.0-rc1+ #120 PREEMPT(voluntary)
[ 349.367803] Hardware name: AMD Corporation PURICO/PURICO, BIOS RPUT1003E 12/11/2024
[ 349.376334] RIP: 0010:mbm_cntr_get+0x56/0x90
[ 349.381096] Code: 45 8d 41 fe 83 f8 01 77 3d 8b 7b 50 85 ff 7e 36 49 8b 84 24 f0 04 00 00 45 31 c0 eb 0d 41 83 c0 01 48 83 c0 10 44 39 c7 74 1c <48> 3b 50 08 75 ed 3b 08 75 e9 48 83 c4 10 44 89 c0 5b 41 5c 41 5d
[ 349.402037] RSP: 0018:ff56bba58655f958 EFLAGS: 00010246
[ 349.407861] RAX: 0000000000000000 RBX: ffffffff9525b900 RCX: 0000000000000002
[ 349.415818] RDX: ffffffff95d526a0 RSI: ff1f5d52517c1800 RDI: 0000000000000020
[ 349.423774] RBP: ff56bba58655f980 R08: 0000000000000000 R09: 0000000000000001
[ 349.431730] R10: ff1f5d52c616a6f0 R11: fffc6a2f046c3980 R12: ff1f5d52517c1800
[ 349.439687] R13: 0000000000000001 R14: ffffffff95d526a0 R15: ffffffff9525b968
[ 349.447635] FS: 00007f17926b7800(0000) GS:ff1f5d59d45ff000(0000) knlGS:0000000000000000
[ 349.456659] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 349.463064] CR2: 0000000000000008 CR3: 0000000147afe002 CR4: 0000000000771ef0
[ 349.471022] PKRU: 55555554
[ 349.474033] Call Trace:
[ 349.476755] <TASK>
[ 349.479091] ? kernfs_add_one+0x114/0x170
[ 349.483560] rdtgroup_assign_cntr_event+0x9b/0xd0
[ 349.488795] rdtgroup_assign_cntrs+0xab/0xb0
[ 349.493553] rdt_get_tree+0x4be/0x770
[ 349.497623] vfs_get_tree+0x2e/0xf0
[ 349.501508] fc_mount+0x18/0x90
[ 349.505007] path_mount+0x360/0xc50
[ 349.508884] ? putname+0x68/0x80
[ 349.512479] __x64_sys_mount+0x124/0x150
[ 349.516848] x64_sys_call+0x2133/0x2190
[ 349.521123] do_syscall_64+0x74/0x970
==================================================================
Thank you for capturing this. This is a different trace but it confirms that it is the
same root cause. Specifically, the event is enabled after the state it depends on is (not)
allocated during domain online.
Yes. Thanks
Here is the changelog.
x86,fs/resctrl: Fix BUG with mbm_event mode when MBM events are disabled
The following BUG is encountered when mounting the resctrl filesystem
after booting a system with X86_FEATURE_ABMC support and the kernel
parameter 'rdt=!mbmtotal,!mbmlocal'.
===========================================================================
[ 349.330416] BUG: kernel NULL pointer dereference, address: 0000000000000008
[ 349.338187] #PF: supervisor read access in kernel mode
[ 349.343914] #PF: error_code(0x0000) - not-present page
[ 349.349644] PGD 10419f067 P4D 0
[ 349.353241] Oops: Oops: 0000 [#1] SMP NOPTI
[ 349.357905] CPU: 45 UID: 0 PID: 3449 Comm: mount Not tainted 6.18.0-rc1+ #120 PREEMPT(voluntary)
[ 349.367803] Hardware name: AMD Corporation
[ 349.376334] RIP: 0010:mbm_cntr_get+0x56/0x90
[ 349.381096] Code: 45 8d 41 fe 83 f8 01 77 3d 8b 7b 50 85 ff 7e 36 49 8b 84 24 f0 04 00 00 45 31 c0 eb 0d 41 83 c0 01 48 83 c0 10 44 39 c7 74 1c <48> 3b 50 08 75 ed 3b 08 75 e9 48 83 c4 10 44 89 c0 5b 41 5c 41 5d
[ 349.402037] RSP: 0018:ff56bba58655f958 EFLAGS: 00010246
[ 349.407861] RAX: 0000000000000000 RBX: ffffffff9525b900 RCX: 0000000000000002
[ 349.415818] RDX: ffffffff95d526a0 RSI: ff1f5d52517c1800 RDI: 0000000000000020
[ 349.423774] RBP: ff56bba58655f980 R08: 0000000000000000 R09: 0000000000000001
[ 349.431730] R10: ff1f5d52c616a6f0 R11: fffc6a2f046c3980 R12: ff1f5d52517c1800
[ 349.439687] R13: 0000000000000001 R14: ffffffff95d526a0 R15: ffffffff9525b968
[ 349.447635] FS: 00007f17926b7800(0000) GS:ff1f5d59d45ff000(0000) knlGS:0000000000000000
[ 349.456659] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 349.463064] CR2: 0000000000000008 CR3: 0000000147afe002 CR4: 0000000000771ef0
[ 349.471022] PKRU: 55555554
[ 349.474033] Call Trace:
[ 349.476755] <TASK>
[ 349.479091] ? kernfs_add_one+0x114/0x170
[ 349.483560] rdtgroup_assign_cntr_event+0x9b/0xd0
[ 349.488795] rdtgroup_assign_cntrs+0xab/0xb0
[ 349.493553] rdt_get_tree+0x4be/0x770
[ 349.497623] vfs_get_tree+0x2e/0xf0
[ 349.501508] fc_mount+0x18/0x90
[ 349.505007] path_mount+0x360/0xc50
[ 349.508884] ? putname+0x68/0x80
[ 349.512479] __x64_sys_mount+0x124/0x150
When mbm_event mode is enabled, it implicitly enables both MBM total and
local events. However, specifying the kernel parameter
"rdt=!mbmtotal,!mbmlocal" disables these events during resctrl
initialization. As a result, related data structures, such as
rdt_mon_domain::mbm_states, rdt_mon_domain::cntr_cfg, and
rdt_hw_mon_domain::arch_mbm_states, are not allocated. This leads to a
BUG when the user attempts to mount the resctrl filesystem, because the
mount path tries to access these unallocated structures.
Fix the issue by adding a dependency on X86_FEATURE_CQM_MBM_TOTAL and
X86_FEATURE_CQM_MBM_LOCAL for X86_FEATURE_ABMC to be enabled. This is
acceptable for now, as X86_FEATURE_ABMC currently implies support for MBM
total and local events. However, this dependency should be revisited and
removed in the future to decouple feature handling more cleanly.
Fixes: 13390861b426e ("x86,fs/resctrl: Detect Assignable Bandwidth Monitoring feature details")
Co-developed-by: Reinette Chatre <reinette.chatre@xxxxxxxxx>
Signed-off-by: Reinette Chatre <reinette.chatre@xxxxxxxxx>
Signed-off-by: Babu Moger <babu.moger@xxxxxxx>
====================================================
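To illustrate the changelog above, here is a rough userspace sketch in C. It is not the patch itself and does not use any kernel API. The feature bits, the deps[] table, and the helpers clear_features(), domain_online() and mount_would_oops() are all hypothetical names made up for this model. It only shows the idea: if ABMC depends on the MBM total and local features, disabling those also clears ABMC, so the mount path never walks an unallocated cntr_cfg.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy feature bits. Names echo the changelog; the values are made up. */
enum feature {
	FEAT_CQM_MBM_TOTAL = 1u << 0,
	FEAT_CQM_MBM_LOCAL = 1u << 1,
	FEAT_ABMC          = 1u << 2,
};

/* Hypothetical dependency table: 'feature' requires 'depends'. */
struct feature_dep {
	unsigned int feature;
	unsigned int depends;
};

static const struct feature_dep deps[] = {
	{ FEAT_ABMC, FEAT_CQM_MBM_TOTAL },
	{ FEAT_ABMC, FEAT_CQM_MBM_LOCAL },
};

/*
 * Clear the requested bits, then keep dropping any feature whose
 * dependency is no longer present until nothing changes.
 */
static unsigned int clear_features(unsigned int caps, unsigned int disable)
{
	bool changed;

	caps &= ~disable;
	do {
		changed = false;
		for (size_t i = 0; i < sizeof(deps) / sizeof(deps[0]); i++) {
			if ((caps & deps[i].feature) && !(caps & deps[i].depends)) {
				caps &= ~deps[i].feature;
				changed = true;
			}
		}
	} while (changed);

	return caps;
}

/* Simplified stand-in for the per-counter configuration. */
struct cntr_cfg {
	int evtid;
	const void *grp;
};

struct domain {
	struct cntr_cfg *cntr_cfg;	/* stays NULL when no MBM event is enabled */
	int num_cntrs;
};

/* Models domain online: counter state is allocated only if an MBM event is enabled. */
static int domain_online(struct domain *d, unsigned int caps, int num_cntrs)
{
	d->cntr_cfg = NULL;
	d->num_cntrs = 0;

	if (!(caps & (FEAT_CQM_MBM_TOTAL | FEAT_CQM_MBM_LOCAL)))
		return 0;

	d->cntr_cfg = calloc((size_t)num_cntrs, sizeof(*d->cntr_cfg));
	if (!d->cntr_cfg)
		return -1;

	d->num_cntrs = num_cntrs;
	return 0;
}

/* Models the mount path: true if counter assignment would walk a NULL cntr_cfg. */
static bool mount_would_oops(const struct domain *d, unsigned int caps)
{
	return (caps & FEAT_ABMC) && d->cntr_cfg == NULL;
}
```

In this model, masking out only the two MBM bits by hand leaves FEAT_ABMC set with cntr_cfg == NULL, which is the situation in the trace. Going through clear_features() drops FEAT_ABMC as well, so the assignment path is skipped.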
thanks
Babu