Re: possible deadlock in __perf_event_task_sched_in

From: Marius Fleischer
Date: Mon Apr 29 2024 - 12:39:32 EST


Hi Peter,

Thanks for taking the time to explain this issue!

Wishing you a nice day!

Best,
Marius

On Wed, 24 Apr 2024 at 02:43, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Mon, Apr 22, 2024 at 11:44:27AM -0700, Marius Fleischer wrote:
> > Hi,
> >
> > We would like to report the following bug which has been found by our
> > modified version of syzkaller.
> >
> > We found this report (https://lkml.org/lkml/2021/9/12/333) that seems
> > to have a similar but different stack trace. We are unable to tell,
> > though, whether it is the same cause. We’d be grateful for your
> > advice.
>
> This is just the printk thing sucks again. Some WARN/printk got tripped
> in a non-suitable context.
>
>
> > _printk+0xba/0xed kernel/printk/printk.c:2299
> > ex_handler_msr.cold+0xb7/0x147 arch/x86/mm/extable.c:90
> > fixup_exception+0x973/0xbb0 arch/x86/mm/extable.c:187
> > __exc_general_protection arch/x86/kernel/traps.c:601 [inline]
> > exc_general_protection+0xed/0x2f0 arch/x86/kernel/traps.c:562
> > asm_exc_general_protection+0x22/0x30 arch/x86/include/asm/idtentry.h:562
> > RIP: 0010:__wrmsr arch/x86/include/asm/msr.h:103 [inline]
> > RIP: 0010:native_write_msr arch/x86/include/asm/msr.h:154 [inline]
> > RIP: 0010:wrmsrl arch/x86/include/asm/msr.h:271 [inline]
> > RIP: 0010:__x86_pmu_enable_event arch/x86/events/intel/../perf_event.h:1120 [inline]
> > RIP: 0010:intel_pmu_enable_event+0x2d9/0xff0 arch/x86/events/intel/core.c:2694
> > Code: ea 03 49 81 cc 00 00 40 00 4d 21 f4 80 3c 02 00 0f 85 5b 0c 00
> > 00 44 8b ab 70 01 00 00 4c 89 e2 44 89 e0 48 c1 ea 20 44 89 e9 <0f> 30
> > 0f 1f 44 00 00 e8 1b 32 75 00 48 83 c4 20 5b 5d 41 5c 41 5d
> > RSP: 0018:ffffc900115af348 EFLAGS: 00010002
> > RAX: 0000000000530000 RBX: ffff888019dd6a50 RCX: 0000000000000188
> > RDX: 0000000000000002 RSI: ffffffff81029464 RDI: ffff888019dd6bc0
> > RBP: 0000000000000000 R08: 0000000000000001 R09: ffff888063e22ab7
> > R10: 0000000000000000 R11: 0000000000000001 R12: 0000000200530000
> > R13: 0000000000000188 R14: ffffffffffffffff R15: ffff888019dd6bb0
> > x86_pmu_start+0x1cc/0x270 arch/x86/events/core.c:1520
> > x86_pmu_enable+0x481/0xdf0 arch/x86/events/core.c:1337
> > perf_pmu_enable kernel/events/core.c:1243 [inline]
> > perf_pmu_enable kernel/events/core.c:1239 [inline]
>
> Most likely your VM is wonky and perf tries to poke an MSR that either
> doesn't exist or isn't emulated properly, who knows.
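
For anyone reading the trace later, here is a minimal sketch of how the MSR
being written can be read off the register dump. It assumes the faulting
instruction is the WRMSR marked <0f> 30 in the Code: line, which takes the
MSR index from ECX and the 64-bit value from EDX:EAX. The helper name
wrmsr_operands is made up for illustration and is not a kernel function.

```python
# Register values copied from the register dump in the quoted trace.
RCX = 0x0000000000000188
RAX = 0x0000000000530000
RDX = 0x0000000000000002


def wrmsr_operands(rcx, rax, rdx):
    """Return (msr_index, value) as WRMSR would interpret the registers.

    Hypothetical helper for illustration only: WRMSR uses the low 32 bits
    of RCX as the MSR index and EDX:EAX as the 64-bit value to write.
    """
    msr_index = rcx & 0xFFFFFFFF
    value = ((rdx & 0xFFFFFFFF) << 32) | (rax & 0xFFFFFFFF)
    return msr_index, value


msr_index, value = wrmsr_operands(RCX, RAX, RDX)
print(f"MSR index: {msr_index:#x}")
print(f"value:     {value:#018x}")
```

The combined value matches R12 in the dump. If I read Intel's MSR list
correctly, index 0x188 is IA32_PERFEVTSEL2 (0x186 + 2), which would fit the
suggestion that the VM does not accept this write; I have not verified that
on the affected setup.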