Re: [PATCH v7 10/10] perf/x86/rapl: Add core energy counter support for AMD CPUs

From: Peter Jung
Date: Wed Nov 20 2024 - 09:30:37 EST


Hi Dhananjay,

On 20.11.24 14:58, Dhananjay Ugwekar wrote:
Hello Peter Jung,

Thanks for trying out the patchset,

On 11/20/2024 1:28 PM, Peter Jung wrote:
Hi all,

This patch seems to crash the kernel and results in an unbootable system.


The patch was applied on top of 6.12-rc7; I have not yet tested it on linux-next.

I was able to reproduce this issue on v6 as well; the only "good" version was v4.
This has been reproduced on several Zen 3+ machines and also on my 9950X.

Bisect log:
```
git bisect start
# status: waiting for both good and bad commits
# good: [2d5404caa8c7bb5c4e0435f94b28834ae5456623] Linux 6.12-rc7
git bisect good 2d5404caa8c7bb5c4e0435f94b28834ae5456623
# status: waiting for bad commit, 1 good commit known
# bad: [372e95a40e04ae6ebe69300b76566af6455ba84e] perf/x86/rapl: Add core energy counter support for AMD CPUs
git bisect bad 372e95a40e04ae6ebe69300b76566af6455ba84e
# good: [fd3c84b2fc8a50030e8c7d91983f50539035ec3a] perf/x86/rapl: Rename rapl_pmu variables
git bisect good fd3c84b2fc8a50030e8c7d91983f50539035ec3a
# good: [96673b2c940e71fde50a54311ecdce00ff7a8e0b] perf/x86/rapl: Modify the generic variable names to *_pkg*
git bisect good 96673b2c940e71fde50a54311ecdce00ff7a8e0b
# good: [68b214c92635f0b24a3f3074873b77f4f1a82b80] perf/x86/rapl: Move the cntr_mask to rapl_pmus struct
git bisect good 68b214c92635f0b24a3f3074873b77f4f1a82b80
# first bad commit: [372e95a40e04ae6ebe69300b76566af6455ba84e] perf/x86/rapl: Add core energy counter support for AMD CPUs
```

```
Nov 17 12:17:37 varvalian kernel: RIP: 0010:internal_create_group+0x9a/0x4e0
Nov 17 12:17:37 varvalian kernel: Code: 7b 20 00 0f 84 cb 00 00 00 48 8d 74 24 1c 48 8d 54 24 18 4c 89 ff e8 15 8a 99 00 48 83 3b 00 74 59 48 8b 43 18 48 85 c0 74 11 <48> 8b 30 48 85 f6 74 09 4c 8b 5b 08 4d 85 db 75 1a 48 8b 43 20 48
Nov 17 12:17:37 varvalian kernel: RSP: 0018:ffffaa5281fe7868 EFLAGS: 00010202
Nov 17 12:17:37 varvalian kernel: RAX: 796772656e650073 RBX: ffffffffc2a642aa RCX: f781ec27a963db00
Nov 17 12:17:37 varvalian kernel: RDX: ffffaa5281fe7880 RSI: ffffaa5281fe7884 RDI: ffff90c611dc8400
Nov 17 12:17:37 varvalian kernel: RBP: 000000000000000f R08: 0000000000000000 R09: 0000000000000001
Nov 17 12:17:37 varvalian kernel: R10: 0000000002000001 R11: ffffffff8e86ee00 R12: 0000000000000000
Nov 17 12:17:37 varvalian kernel: R13: ffff90c6038469c0 R14: ffff90c611dc8400 R15: ffff90c611dc8400
Nov 17 12:17:37 varvalian kernel: FS:  00007163efc54880(0000) GS:ffff90c8efe00000(0000) knlGS:0000000000000000
Nov 17 12:17:37 varvalian kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 17 12:17:37 varvalian kernel: CR2: 00005c1834b98298 CR3: 0000000121298000 CR4: 0000000000f50ef0
Nov 17 12:17:37 varvalian kernel: PKRU: 55555554
Nov 17 12:17:47 varvalian kernel: ------------[ cut here ]------------
```

I'll do some additional tests on the weekend based on the latest linux-next snapshot and provide some more logs.
Can you please try the diff below?

diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index e9be1f31163d..d3bb3865c1b1 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -699,6 +699,7 @@ static const struct attribute_group *rapl_attr_update[] = {
 
 static const struct attribute_group *rapl_core_attr_update[] = {
 	&rapl_events_core_group,
+	NULL,
 };
 
 static int __init init_rapl_pmu(struct rapl_pmus *rapl_pmus)

Regards,
Dhananjay



Thanks! This patch appears to fix the issue when the kernel is built with clang. Thanks for providing such a fast fix! :)

Regards,

Peter