Re: [PATCH 2/2] perf/x86/amd: Don't allow pre-emption in amd_pmu_lbr_reset()

From: Mario Limonciello
Date: Tue Oct 24 2023 - 14:31:12 EST


On 10/24/2023 11:51, Ingo Molnar wrote:

* Ingo Molnar <mingo@xxxxxxxxxx> wrote:


* Mario Limonciello <mario.limonciello@xxxxxxx> wrote:

Fixes a BUG reported during suspend-to-RAM testing.

```
[ 478.274752] BUG: using smp_processor_id() in preemptible [00000000] code: rtcwake/2948
[ 478.274754] caller is amd_pmu_lbr_reset+0x19/0xc0
```

Cc: stable@xxxxxxxxxxxxxxx # 6.1+
Fixes: ca5b7c0d9621 ("perf/x86/amd/lbr: Add LbrExtV2 branch record support")
Signed-off-by: Mario Limonciello <mario.limonciello@xxxxxxx>
---
arch/x86/events/amd/lbr.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/amd/lbr.c b/arch/x86/events/amd/lbr.c
index eb31f850841a..5b98e8c7d8b7 100644
--- a/arch/x86/events/amd/lbr.c
+++ b/arch/x86/events/amd/lbr.c
@@ -321,7 +321,7 @@ int amd_pmu_lbr_hw_config(struct perf_event *event)
 void amd_pmu_lbr_reset(void)
 {
-	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	struct cpu_hw_events *cpuc = get_cpu_ptr(&cpu_hw_events);
 	int i;
 	if (!x86_pmu.lbr_nr)
@@ -335,6 +335,7 @@ void amd_pmu_lbr_reset(void)
 	cpuc->last_task_ctx = NULL;
 	cpuc->last_log_id = 0;
+	put_cpu_ptr(&cpu_hw_events);
 	wrmsrl(MSR_AMD64_LBR_SELECT, 0);
 }

Weird, amd_pmu_lbr_reset() is called from these places:

- amd_pmu_lbr_sched_task(): called at task sched-in during a
context switch, so this should already have preemption disabled.

- amd_pmu_lbr_add(): this gets indirectly called by amd_pmu::add
(amd_pmu_add_event()), called by event_sched_in(), which too should have
preemption disabled.

I clearly must have missed some additional place it gets called in.

Just for completeness, the additional place I missed is
amd_pmu_cpu_reset():

static_call(amd_pmu_branch_reset)();

... and the amd_pmu_branch_reset static call is set up with
amd_pmu_lbr_reset, which is why git grep missed it.

Anyway, amd_pmu_cpu_reset() is very much something that should run
non-preemptable to begin with, so your patch only papers over the real
problem AFAICS.

Thanks,

Ingo

In that case - should preemption be disabled for all of x86_pmu_dying_cpu() perhaps?

For good measure x86_pmu_starting_cpu() too?
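
To make the question concrete, here is a rough userspace toy model of the call chain discussed above. It is not kernel code: the names mirror the kernel's, but every definition below (preempt_count, bug_count, the smp_processor_id() check, and the two dying_cpu_*() helpers) is a simplified stand-in assumed for illustration only. It sketches why the debug check fires when the reset is reached through the static call from preemptible context, and why disabling preemption around the whole callback would silence it.

```c
/*
 * Userspace toy model, NOT kernel code. All definitions are simplified
 * stand-ins; only the names are borrowed from the kernel.
 */
#include <stdio.h>

static int preempt_count;	/* stand-in for the task's preempt counter */
static int bug_count;		/* how often the debug check would have fired */

static void preempt_disable(void) { preempt_count++; }
static void preempt_enable(void)  { preempt_count--; }

/* Models the debug check behind smp_processor_id(). */
static int smp_processor_id(void)
{
	if (preempt_count == 0) {
		bug_count++;
		fprintf(stderr,
			"BUG: using smp_processor_id() in preemptible code\n");
	}
	return 0;		/* the model has a single CPU */
}

/* Models amd_pmu_lbr_reset(): it needs the current CPU's data. */
static void amd_pmu_lbr_reset(void)
{
	(void)smp_processor_id();
}

/*
 * Models the static call: the target is bound through a pointer, so a
 * text search for "amd_pmu_lbr_reset(" does not find this call site.
 */
static void (*amd_pmu_branch_reset)(void) = amd_pmu_lbr_reset;

static void amd_pmu_cpu_reset(void)
{
	amd_pmu_branch_reset();
}

/* Hypothetical callback reached from preemptible context. */
static int dying_cpu_unprotected(void)
{
	amd_pmu_cpu_reset();
	return 0;
}

/* Hypothetical variant: the whole callback runs non-preemptible. */
static int dying_cpu_protected(void)
{
	preempt_disable();
	amd_pmu_cpu_reset();
	preempt_enable();
	return 0;
}
```

In this model, calling dying_cpu_unprotected() bumps bug_count, while dying_cpu_protected() leaves it untouched and returns with preempt_count balanced. Whether the real x86_pmu_dying_cpu() and x86_pmu_starting_cpu() should take this shape is exactly the open question.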