[peterz-queue:sched/hrtick] [entry,hrtimer,x86] c07c4e0c01: BUG:soft_lockup-CPU##stuck_for#s![schbench:#]

From: kernel test robot
Date: Thu Mar 27 2025 - 21:25:22 EST

Hello,

kernel test robot noticed "BUG:soft_lockup-CPU##stuck_for#s![schbench:#]" on:

commit: c07c4e0c013dc11dd466fa63a4af12ef8282b27b ("entry,hrtimer,x86: Push reprogramming timers into the interrupt return path")
https://git.kernel.org/cgit/linux/kernel/git/peterz/queue.git sched/hrtick

in testcase: schbench
version: schbench-x86_64-48aed1d-1_20241103
with the following parameters:

iterations: 3x
message_threads: 10%
worker_threads: 128
runtime: 300s
cpufreq_governor: performance

config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory

(please refer to the attached dmesg/kmsg for the entire log/backtrace)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
| Closes: https://lore.kernel.org/oe-lkp/202503280925.27fefb28-lkp@xxxxxxxxx


[ 120.056174][ C17] watchdog: BUG: soft lockup - CPU#17 stuck for 22s! [schbench:4939]
[ 120.056179][ C17] Modules linked in: kmem intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common device_dax nd_pmem nd_btt dax_pmem i10nm_edac skx_edac_common x86_pkg_temp_thermal intel_powerclamp coretemp btrfs blake2b_generic xor raid6_pq sd_mod kvm_intel sg kvm snd_pcm ast snd_timer dax_hmem ghash_clmulni_intel rapl drm_client_lib ahci cxl_acpi snd ipmi_ssif drm_shmem_helper intel_cstate isst_if_mmio isst_if_mbox_pci acpi_power_meter cxl_port libahci binfmt_misc intel_th_gth cxl_core mei_me soundcore ipmi_si ioatdma i2c_i801 intel_th_pci intel_uncore einj acpi_ipmi pcspkr libata mei isst_if_common drm_kms_helper i2c_smbus intel_pch_thermal intel_vsec intel_th dca wmi nfit ipmi_devintf libnvdimm ipmi_msghandler acpi_pad joydev drm fuse dm_mod loop ip_tables
[ 120.056218][ C17] CPU: 17 UID: 0 PID: 4939 Comm: schbench Tainted: G S 6.14.0-01502-gc07c4e0c013d #1 VOLUNTARY
[ 120.056221][ C17] Tainted: [S]=CPU_OUT_OF_SPEC
[ 120.056222][ C17] Hardware name: Intel Corporation M50CYP2SB1U/M50CYP2SB1U, BIOS SE5C620.86B.01.01.0003.2104260124 04/26/2021
[ 120.056223][ C17] RIP: 0010:native_queued_spin_lock_slowpath (kernel/locking/qspinlock.c:474)
[ 120.056234][ C17] Code: c1 e9 12 83 e0 03 83 e9 01 48 c1 e0 05 48 63 c9 48 05 80 2b e5 83 48 03 04 cd e0 cc bc 82 48 89 10 8b 42 08 85 c0 75 09 f3 90 <8b> 42 08 85 c0 74 f7 48 8b 0a 48 85 c9 74 90 0f 0d 09 eb 91 8b 03
All code
========
0: c1 e9 12 shr $0x12,%ecx
3: 83 e0 03 and $0x3,%eax
6: 83 e9 01 sub $0x1,%ecx
9: 48 c1 e0 05 shl $0x5,%rax
d: 48 63 c9 movslq %ecx,%rcx
10: 48 05 80 2b e5 83 add $0xffffffff83e52b80,%rax
16: 48 03 04 cd e0 cc bc add -0x7d433320(,%rcx,8),%rax
1d: 82
1e: 48 89 10 mov %rdx,(%rax)
21: 8b 42 08 mov 0x8(%rdx),%eax
24: 85 c0 test %eax,%eax
26: 75 09 jne 0x31
28: f3 90 pause
2a:* 8b 42 08 mov 0x8(%rdx),%eax <-- trapping instruction
2d: 85 c0 test %eax,%eax
2f: 74 f7 je 0x28
31: 48 8b 0a mov (%rdx),%rcx
34: 48 85 c9 test %rcx,%rcx
37: 74 90 je 0xffffffffffffffc9
39: 0f 0d 09 prefetchw (%rcx)
3c: eb 91 jmp 0xffffffffffffffcf
3e: 8b 03 mov (%rbx),%eax

Code starting with the faulting instruction
===========================================
0: 8b 42 08 mov 0x8(%rdx),%eax
3: 85 c0 test %eax,%eax
5: 74 f7 je 0xfffffffffffffffe
7: 48 8b 0a mov (%rdx),%rcx
a: 48 85 c9 test %rcx,%rcx
d: 74 90 je 0xffffffffffffff9f
f: 0f 0d 09 prefetchw (%rcx)
12: eb 91 jmp 0xffffffffffffffa5
14: 8b 03 mov (%rbx),%eax
[ 120.056236][ C17] RSP: 0000:ffa00000222dfd68 EFLAGS: 00000246
[ 120.056238][ C17] RAX: 0000000000000000 RBX: ffd40000055f6568 RCX: 000000000000002a
[ 120.056239][ C17] RDX: ff1100103f671b80 RSI: 0000000000ac0101 RDI: ffd40000055f6568
[ 120.056241][ C17] RBP: ff1100103f671b80 R08: 0000000000000000 R09: 0000000000000000
[ 120.056242][ C17] R10: 0000000055555554 R11: ff11000240ff850c R12: 0000000000480000
[ 120.056242][ C17] R13: 0000000000480000 R14: 0200000000000000 R15: 0000000000000000
[ 120.056243][ C17] FS: 00007f75844266c0(0000) GS:ff110010bb81f000(0000) knlGS:0000000000000000
[ 120.056245][ C17] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 120.056246][ C17] CR2: 00007f76e0415c70 CR3: 00000001f83fc002 CR4: 0000000000773ef0
[ 120.056247][ C17] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 120.056247][ C17] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 120.056248][ C17] PKRU: 55555554
[ 120.056249][ C17] Call Trace:
[ 120.056250][ C17] <TASK>
[ 120.056252][ C17] _raw_spin_lock (arch/x86/include/asm/paravirt.h:572 arch/x86/include/asm/qspinlock.h:51 include/asm-generic/qspinlock.h:114 include/linux/spinlock.h:187 include/linux/spinlock_api_smp.h:134 kernel/locking/spinlock.c:154)
[ 120.056254][ C17] do_huge_pmd_numa_page (mm/huge_memory.c:1976)
[ 120.056259][ C17] __handle_mm_fault (mm/memory.c:6014)
[ 120.056264][ C17] handle_mm_fault (mm/memory.c:6197)
[ 120.056266][ C17] do_user_addr_fault (arch/x86/mm/fault.c:1338)
[ 120.056272][ C17] exc_page_fault (arch/x86/include/asm/irqflags.h:37 arch/x86/include/asm/irqflags.h:92 arch/x86/mm/fault.c:1488 arch/x86/mm/fault.c:1538)
[ 120.056275][ C17] asm_exc_page_fault (arch/x86/include/asm/idtentry.h:623)
[ 120.056278][ C17] RIP: 0033:0x55f6cc692d8b
[ 120.056280][ C17] Code: e3 ff ff 8b 05 86 82 00 00 85 c0 0f 84 f7 02 00 00 48 8b 15 b7 33 00 00 31 db 48 85 d2 0f 84 30 01 00 00 4c 8b 15 55 82 00 00 <4d> 8b b7 70 98 10 00 4d 89 d5 4e 8d 1c d5 00 00 00 00 4d 0f af ea
All code
========
0: e3 ff jrcxz 0x1
2: ff 8b 05 86 82 00 decl 0x828605(%rbx)
8: 00 85 c0 0f 84 f7 add %al,-0x87bf040(%rbp)
e: 02 00 add (%rax),%al
10: 00 48 8b add %cl,-0x75(%rax)
13: 15 b7 33 00 00 adc $0x33b7,%eax
18: 31 db xor %ebx,%ebx
1a: 48 85 d2 test %rdx,%rdx
1d: 0f 84 30 01 00 00 je 0x153
23: 4c 8b 15 55 82 00 00 mov 0x8255(%rip),%r10 # 0x827f
2a:* 4d 8b b7 70 98 10 00 mov 0x109870(%r15),%r14 <-- trapping instruction
31: 4d 89 d5 mov %r10,%r13
34: 4e 8d 1c d5 00 00 00 lea 0x0(,%r10,8),%r11
3b: 00
3c: 4d 0f af ea imul %r10,%r13

Code starting with the faulting instruction
===========================================
0: 4d 8b b7 70 98 10 00 mov 0x109870(%r15),%r14
7: 4d 89 d5 mov %r10,%r13
a: 4e 8d 1c d5 00 00 00 lea 0x0(,%r10,8),%r11
11: 00
12: 4d 0f af ea imul %r10,%r13
[ 120.056281][ C17] RSP: 002b:00007f7584425df0 EFLAGS: 00010206
[ 120.056282][ C17] RAX: 0000000000000000 RBX: 000055f6e61615d0 RCX: 0000000000000000
[ 120.056283][ C17] RDX: 0000000000000005 RSI: 0000000000000000 RDI: 000055f6e61615d0
[ 120.056284][ C17] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 120.056284][ C17] R10: 0000000000000068 R11: 0000000000000293 R12: 00007f76e0315c70
[ 120.056285][ C17] R13: 0000000000000011 R14: 00007f76e030c420 R15: 00007f76e030c400
[ 120.056287][ C17] </TASK>
[ 120.056288][ C17] Kernel panic - not syncing: softlockup: hung tasks
[ 120.410327][ C17] CPU: 17 UID: 0 PID: 4939 Comm: schbench Tainted: G S L 6.14.0-01502-gc07c4e0c013d #1 VOLUNTARY
[ 120.422640][ C17] Tainted: [S]=CPU_OUT_OF_SPEC, [L]=SOFTLOCKUP
[ 120.428974][ C17] Hardware name: Intel Corporation M50CYP2SB1U/M50CYP2SB1U, BIOS SE5C620.86B.01.01.0003.2104260124 04/26/2021
[ 120.441111][ C17] Call Trace:
[ 120.444577][ C17] <IRQ>
[ 120.447593][ C17] panic (kernel/panic.c:354)
[ 120.451654][ C17] watchdog_timer_fn (kernel/watchdog.c:733)
[ 120.456739][ C17] ? __pfx_watchdog_timer_fn (kernel/watchdog.c:683)
[ 120.462344][ C17] __hrtimer_run_queues (kernel/time/hrtimer.c:1799 kernel/time/hrtimer.c:1863)
[ 120.467684][ C17] hrtimer_interrupt (kernel/time/hrtimer.c:1960)
[ 120.472753][ C17] __sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1038 arch/x86/kernel/apic/apic.c:1055)
[ 120.478688][ C17] sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1049 arch/x86/kernel/apic/apic.c:1049)
[ 120.484437][ C17] </IRQ>
[ 120.487494][ C17] <TASK>
[ 120.490535][ C17] asm_sysvec_apic_timer_interrupt (arch/x86/include/asm/idtentry.h:702)
[ 120.496622][ C17] RIP: 0010:native_queued_spin_lock_slowpath (kernel/locking/qspinlock.c:474)
[ 120.503754][ C17] Code: c1 e9 12 83 e0 03 83 e9 01 48 c1 e0 05 48 63 c9 48 05 80 2b e5 83 48 03 04 cd e0 cc bc 82 48 89 10 8b 42 08 85 c0 75 09 f3 90 <8b> 42 08 85 c0 74 f7 48 8b 0a 48 85 c9 74 90 0f 0d 09 eb 91 8b 03
All code
========
0: c1 e9 12 shr $0x12,%ecx
3: 83 e0 03 and $0x3,%eax
6: 83 e9 01 sub $0x1,%ecx
9: 48 c1 e0 05 shl $0x5,%rax
d: 48 63 c9 movslq %ecx,%rcx
10: 48 05 80 2b e5 83 add $0xffffffff83e52b80,%rax
16: 48 03 04 cd e0 cc bc add -0x7d433320(,%rcx,8),%rax
1d: 82
1e: 48 89 10 mov %rdx,(%rax)
21: 8b 42 08 mov 0x8(%rdx),%eax
24: 85 c0 test %eax,%eax
26: 75 09 jne 0x31
28: f3 90 pause
2a:* 8b 42 08 mov 0x8(%rdx),%eax <-- trapping instruction
2d: 85 c0 test %eax,%eax
2f: 74 f7 je 0x28
31: 48 8b 0a mov (%rdx),%rcx
34: 48 85 c9 test %rcx,%rcx
37: 74 90 je 0xffffffffffffffc9
39: 0f 0d 09 prefetchw (%rcx)
3c: eb 91 jmp 0xffffffffffffffcf
3e: 8b 03 mov (%rbx),%eax

Code starting with the faulting instruction
===========================================
0: 8b 42 08 mov 0x8(%rdx),%eax
3: 85 c0 test %eax,%eax
5: 74 f7 je 0xfffffffffffffffe
7: 48 8b 0a mov (%rdx),%rcx
a: 48 85 c9 test %rcx,%rcx
d: 74 90 je 0xffffffffffffff9f
f: 0f 0d 09 prefetchw (%rcx)
12: eb 91 jmp 0xffffffffffffffa5
14: 8b 03 mov (%rbx),%eax


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250328/202503280925.27fefb28-lkp@xxxxxxxxx

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki