Re: [PATCH] [PATCH v2] perf/x86/amd: check event before enable to avoid GPF
From: Ravi Bangoria
Date: Mon Oct 07 2024 - 02:44:42 EST
On 01-Oct-24 1:22 AM, George Kennedy wrote:
> On AMD machines cpuc->events[idx] can become NULL in a subtle race
> condition with NMI->throttle->x86_pmu_stop().
>
> Check event for NULL in amd_pmu_enable_all() before enable to avoid a GPF.
> This appears to be an AMD only issue.
>
> Syzkaller reported a GPF in amd_pmu_enable_all.
>
> INFO: NMI handler (perf_event_nmi_handler) took too long to run: 13.143
> msecs
> Oops: general protection fault, probably for non-canonical address
> 0xdffffc0000000034: 0000 PREEMPT SMP KASAN NOPTI
> KASAN: null-ptr-deref in range [0x00000000000001a0-0x00000000000001a7]
> CPU: 0 UID: 0 PID: 328415 Comm: repro_36674776 Not tainted 6.12.0-rc1-syzk
> RIP: 0010:x86_pmu_enable_event (arch/x86/events/perf_event.h:1195
> arch/x86/events/core.c:1430)
> RSP: 0018:ffff888118009d60 EFLAGS: 00010012
> RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
> RDX: 0000000000000034 RSI: 0000000000000000 RDI: 00000000000001a0
> RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
> R13: ffff88811802a440 R14: ffff88811802a240 R15: ffff8881132d8601
> FS: 00007f097dfaa700(0000) GS:ffff888118000000(0000) GS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00000000200001c0 CR3: 0000000103d56000 CR4: 00000000000006f0
> Call Trace:
> <IRQ>
> amd_pmu_enable_all (arch/x86/events/amd/core.c:760 (discriminator 2))
> x86_pmu_enable (arch/x86/events/core.c:1360)
> event_sched_out (kernel/events/core.c:1191 kernel/events/core.c:1186
> kernel/events/core.c:2346)
> __perf_remove_from_context (kernel/events/core.c:2435)
> event_function (kernel/events/core.c:259)
> remote_function (kernel/events/core.c:92 (discriminator 1)
> kernel/events/core.c:72 (discriminator 1))
> __flush_smp_call_function_queue (./arch/x86/include/asm/jump_label.h:27
> ./include/linux/jump_label.h:207 ./include/trace/events/csd.h:64
> kernel/smp.c:135 kernel/smp.c:540)
> __sysvec_call_function_single (./arch/x86/include/asm/jump_label.h:27
> ./include/linux/jump_label.h:207
> ./arch/x86/include/asm/trace/irq_vectors.h:99 arch/x86/kernel/smp.c:272)
> sysvec_call_function_single (arch/x86/kernel/smp.c:266 (discriminator 47)
> arch/x86/kernel/smp.c:266 (discriminator 47))
> </IRQ>
>
> Reported-by: syzkaller <syzkaller@xxxxxxxxxxxxxxxx>
> Signed-off-by: George Kennedy <george.kennedy@xxxxxxxxxx>
Peter, Ingo,
I've been blocking this for quite some time without acting on it. Since
the patch is trivial and harmless, I'm giving an Ack here. However, the
underlying race condition is still unknown, so I will get back to it.
Acked-by: Ravi Bangoria <ravi.bangoria@xxxxxxx>
> ---
> Ravi requested patch resend with the following:
> /*
> * FIXME: cpuc->events[idx] can become NULL in a subtle race
> * condition with NMI->throttle->x86_pmu_stop().
> */
George, the comment should be inside the code.
>
> arch/x86/events/amd/core.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
> index 920e3a640cad..3a8b8db878f6 100644
> --- a/arch/x86/events/amd/core.c
> +++ b/arch/x86/events/amd/core.c
> @@ -762,7 +762,8 @@ static void amd_pmu_enable_all(int added)
> if (!test_bit(idx, cpuc->active_mask))
> continue;
>
> - amd_pmu_enable_event(cpuc->events[idx]);
> + if (cpuc->events[idx])
> + amd_pmu_enable_event(cpuc->events[idx]);
> }
> }
>
Thanks,
Ravi