Re: [PATCH] clocksource: disable irq when holding watchdog_lock.

From: kernel test robot
Date: Wed Oct 25 2023 - 22:33:36 EST




Hello,

kernel test robot noticed "WARNING:at_kernel/locking/irqflag-debug.c:#warn_bogus_irq_restore" on:

commit: e0e3e69931bce73960e6583cd6385ebbd77fa12f ("[PATCH] clocksource: disable irq when holding watchdog_lock.")
url: https://github.com/intel-lab-lkp/linux/commits/Tetsuo-Handa/clocksource-disable-irq-when-holding-watchdog_lock/20231017-143723
base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git 8ceea12d183cf29f28072dede218a04eda2a789c
patch link: https://lore.kernel.org/all/80ff5036-8449-44a6-ba2f-0130d3be6b57@xxxxxxxxxxxxxxxxxxx/
patch subject: [PATCH] clocksource: disable irq when holding watchdog_lock.

in testcase: boot

compiler: gcc-12
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

(please refer to attached dmesg/kmsg for entire log/backtrace)


+-------------------------------------------------------------------+------------+------------+
| | 8ceea12d18 | e0e3e69931 |
+-------------------------------------------------------------------+------------+------------+
| WARNING:at_kernel/locking/irqflag-debug.c:#warn_bogus_irq_restore | 0 | 13 |
| RIP:warn_bogus_irq_restore | 0 | 13 |
+-------------------------------------------------------------------+------------+------------+


If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
| Closes: https://lore.kernel.org/oe-lkp/202310261029.ea8e81e8-oliver.sang@xxxxxxxxx


[ 33.794687][ C0] ------------[ cut here ]------------
[ 33.795935][ C0] raw_local_irq_restore() called with IRQs enabled
[ 33.797202][ C0] WARNING: CPU: 0 PID: 10 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore (kernel/locking/irqflag-debug.c:10 (discriminator 3))
[ 33.798562][ C0] Modules linked in:
[ 33.799208][ C0] CPU: 0 PID: 10 Comm: kworker/u4:0 Not tainted 6.6.0-rc1-00009-ge0e3e69931bc #1 1a5570c26ece064894076fe3d70a31d4a0268b39
[ 33.801328][ C0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 33.802561][ C0] Workqueue: events_unbound padata_mt_helper
[ 33.803751][ C0] RIP: 0010:warn_bogus_irq_restore (kernel/locking/irqflag-debug.c:10 (discriminator 3))
[ 33.804801][ C0] Code: 0f 1f 44 00 00 f3 0f 1e fa 90 80 3d 92 d0 f9 00 00 74 02 90 c3 c6 05 87 d0 f9 00 01 90 48 c7 c7 68 ad 59 82 e8 0c a4 29 ff 90 <0f> 0b 90 90 eb e3 0f 1f 44 00 00 f3 0f 1e fa 44 8b 05 39 1a fc 00
All code
========
0: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
5: f3 0f 1e fa endbr64
9: 90 nop
a: 80 3d 92 d0 f9 00 00 cmpb $0x0,0xf9d092(%rip) # 0xf9d0a3
11: 74 02 je 0x15
13: 90 nop
14: c3 ret
15: c6 05 87 d0 f9 00 01 movb $0x1,0xf9d087(%rip) # 0xf9d0a3
1c: 90 nop
1d: 48 c7 c7 68 ad 59 82 mov $0xffffffff8259ad68,%rdi
24: e8 0c a4 29 ff call 0xffffffffff29a435
29: 90 nop
2a:* 0f 0b ud2 <-- trapping instruction
2c: 90 nop
2d: 90 nop
2e: eb e3 jmp 0x13
30: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
35: f3 0f 1e fa endbr64
39: 44 8b 05 39 1a fc 00 mov 0xfc1a39(%rip),%r8d # 0xfc1a79

Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: 90 nop
3: 90 nop
4: eb e3 jmp 0xffffffffffffffe9
6: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
b: f3 0f 1e fa endbr64
f: 44 8b 05 39 1a fc 00 mov 0xfc1a39(%rip),%r8d # 0xfc1a4f
[ 33.806560][ C0] RSP: 0000:ffff88842fa05e40 EFLAGS: 00010286
[ 33.807965][ C0] RAX: 0000000000000000 RBX: ffffffff82c290c0 RCX: 0000000000000100
[ 33.809247][ C0] RDX: ffff888100578000 RSI: ffffffff810c1d4f RDI: 0000000000000001
[ 33.810559][ C0] RBP: 0000000000000246 R08: 0000000000000000 R09: 0000000000000001
[ 33.812116][ C0] R10: 0000000000000001 R11: ffffffff82c10bf8 R12: ffffffff811c03c0
[ 33.813603][ C0] R13: 00000000fffedb90 R14: 00000000fffedb90 R15: ffff88842fa05f08
[ 33.814564][ C0] FS: 0000000000000000(0000) GS:ffff88842fa00000(0000) knlGS:0000000000000000
[ 33.816171][ C0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 33.817290][ C0] CR2: ffff88843ffff000 CR3: 0000000002853000 CR4: 00000000000406f0
[ 33.818566][ C0] Call Trace:
[ 33.819066][ C0] <IRQ>
[ 33.819497][ C0] ? show_regs (arch/x86/kernel/dumpstack.c:479)
[ 33.820147][ C0] ? __warn (kernel/panic.c:673)
[ 33.820756][ C0] ? warn_bogus_irq_restore (kernel/locking/irqflag-debug.c:10 (discriminator 3))
[ 33.821557][ C0] ? report_bug (lib/bug.c:180 lib/bug.c:219)
[ 33.822432][ C0] ? handle_bug (arch/x86/kernel/traps.c:237)
[ 33.822560][ C0] ? exc_invalid_op (arch/x86/kernel/traps.c:258 (discriminator 1))
[ 33.823248][ C0] ? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:568)
[ 33.823984][ C0] ? clocksource_watchdog_kthread (kernel/time/clocksource.c:400)
[ 33.824914][ C0] ? __warn_printk (kernel/panic.c:722)
[ 33.825736][ C0] ? warn_bogus_irq_restore (kernel/locking/irqflag-debug.c:10 (discriminator 3))
[ 33.826566][ C0] _raw_spin_unlock_irqrestore (arch/x86/include/asm/irqflags.h:134 include/linux/spinlock_api_smp.h:151 kernel/locking/spinlock.c:194)
[ 33.827489][ C0] call_timer_fn (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 include/trace/events/timer.h:127 kernel/time/timer.c:1701)
[ 33.828252][ C0] ? clocksource_watchdog_kthread (kernel/time/clocksource.c:400)
[ 33.829277][ C0] __run_timers (kernel/time/timer.c:1752 kernel/time/timer.c:2022)
[ 33.830126][ C0] __do_softirq (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 include/trace/events/irq.h:142 kernel/softirq.c:554)
[ 33.830563][ C0] ? sched_clock_cpu (kernel/sched/clock.c:270 kernel/sched/clock.c:405)
[ 33.831364][ C0] irq_exit_rcu (kernel/softirq.c:427 kernel/softirq.c:632 kernel/softirq.c:644)
[ 33.832124][ C0] sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1074 (discriminator 14))
[ 33.833107][ C0] </IRQ>
[ 33.833608][ C0] <TASK>
[ 33.834109][ C0] asm_sysvec_apic_timer_interrupt (arch/x86/include/asm/idtentry.h:645)
[ 33.834562][ C0] RIP: 0010:clear_page_rep (arch/x86/lib/clear_page_64.S:20)
[ 33.835452][ C0] Code: 5b 08 48 83 c7 18 e8 24 ab fe ff 48 89 5d 20 48 85 db 75 e2 5b 5d e9 34 e8 3f ff 0f 1f 40 00 f3 0f 1e fa b9 00 02 00 00 31 c0 <f3> 48 ab c3 90 f3 0f 1e fa 31 c0 b9 40 00 00 00 0f 1f 44 00 00 ff
All code
========
0: 5b pop %rbx
1: 08 48 83 or %cl,-0x7d(%rax)
4: c7 (bad)
5: 18 e8 sbb %ch,%al
7: 24 ab and $0xab,%al
9: fe (bad)
a: ff 48 89 decl -0x77(%rax)
d: 5d pop %rbp
e: 20 48 85 and %cl,-0x7b(%rax)
11: db 75 e2 (bad) -0x1e(%rbp)
14: 5b pop %rbx
15: 5d pop %rbp
16: e9 34 e8 3f ff jmp 0xffffffffff3fe84f
1b: 0f 1f 40 00 nopl 0x0(%rax)
1f: f3 0f 1e fa endbr64
23: b9 00 02 00 00 mov $0x200,%ecx
28: 31 c0 xor %eax,%eax
2a:* f3 48 ab rep stos %rax,%es:(%rdi) <-- trapping instruction
2d: c3 ret
2e: 90 nop
2f: f3 0f 1e fa endbr64
33: 31 c0 xor %eax,%eax
35: b9 40 00 00 00 mov $0x40,%ecx
3a: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
3f: ff .byte 0xff

Code starting with the faulting instruction
===========================================
0: f3 48 ab rep stos %rax,%es:(%rdi)
3: c3 ret
4: 90 nop
5: f3 0f 1e fa endbr64
9: 31 c0 xor %eax,%eax
b: b9 40 00 00 00 mov $0x40,%ecx
10: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
15: ff .byte 0xff
[ 33.838561][ C0] RSP: 0000:ffff88810057fc98 EFLAGS: 00010246
[ 33.839599][ C0] RAX: 0000000000000000 RBX: ffffea0005140000 RCX: 0000000000000200
[ 33.840946][ C0] RDX: 0000000000000200 RSI: ffffea0005148000 RDI: ffff888145200000
[ 33.842280][ C0] RBP: 0000000000000700 R08: 0000000000000001 R09: 0000000000000001
[ 33.842562][ C0] R10: 0000000000000001 R11: ffffffff84799a30 R12: 0000000000145000
[ 33.843895][ C0] R13: 000000000000000a R14: 0000000000000001 R15: 0000000000000400
[ 33.845247][ C0] __free_pages_ok (mm/page_alloc.c:1070 mm/page_alloc.c:1164 mm/page_alloc.c:1266)
[ 33.846083][ C0] deferred_free_range (mm/mm_init.c:1990)
[ 33.846565][ C0] deferred_init_maxorder (mm/mm_init.c:2145)
[ 33.847517][ C0] deferred_init_memmap_chunk (mm/mm_init.c:2168)
[ 33.848497][ C0] padata_mt_helper (include/linux/spinlock.h:351 kernel/padata.c:448)
[ 33.849347][ C0] process_one_work (kernel/workqueue.c:2635)
[ 33.850240][ C0] worker_thread (kernel/workqueue.c:2697 kernel/workqueue.c:2784)
[ 33.850569][ C0] ? process_one_work (kernel/workqueue.c:2730)
[ 33.851504][ C0] kthread (kernel/kthread.c:388)
[ 33.852237][ C0] ? kthread_complete_and_exit (kernel/kthread.c:341)
[ 33.853263][ C0] ret_from_fork (arch/x86/kernel/process.c:153)
[ 33.854093][ C0] ? kthread_complete_and_exit (kernel/kthread.c:341)
[ 33.854566][ C0] ret_from_fork_asm (arch/x86/entry/entry_64.S:312)
[ 33.855469][ C0] </TASK>
[ 33.856020][ C0] irq event stamp: 110440
[ 33.856786][ C0] hardirqs last enabled at (110448): console_unlock (arch/x86/include/asm/irqflags.h:26 arch/x86/include/asm/irqflags.h:67 arch/x86/include/asm/irqflags.h:127 kernel/printk/printk.c:347 kernel/printk/printk.c:2718 kernel/printk/printk.c:3037)
[ 33.858488][ C0] hardirqs last disabled at (110455): console_unlock (kernel/printk/printk.c:345 kernel/printk/printk.c:2718 kernel/printk/printk.c:3037)
[ 33.858561][ C0] softirqs last enabled at (105760): __do_softirq (arch/x86/include/asm/preempt.h:27 kernel/softirq.c:400 kernel/softirq.c:582)
[ 33.860220][ C0] softirqs last disabled at (109873): irq_exit_rcu (kernel/softirq.c:427 kernel/softirq.c:632 kernel/softirq.c:644)
[ 33.861864][ C0] ---[ end trace 0000000000000000 ]---
[ 43.589614][ T12] rcu_tasks_wait_gp: rcu_tasks grace period number 5 (since boot) is 2542 jiffies old.
[ 47.973598][ T26] node 0 deferred pages initialised in 14620ms
[ 47.974563][ T26] pgdatinit0 (26) used greatest stack depth: 14104 bytes left
[ 48.006348][ T1] allocated 268435456 bytes of page_ext
[ 48.007874][ T1] Node 0, zone DMA32: page owner found early allocated 9 pages
[ 48.013919][ T12] Callback from call_rcu_tasks() invoked.
[ 48.059811][ T1] Node 0, zone Normal: page owner found early allocated 67566 pages
[ 48.063845][ T1] devtmpfs: initialized
[ 48.063942][ T1] x86/mm: Memory block size: 128MB
[ 48.105976][ T1] Running RCU synchronous self tests
[ 48.106615][ T1] Running RCU synchronous self tests
[ 48.106615][ T1] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[ 48.106615][ T1] futex hash table entries: 512 (order: 4, 65536 bytes, linear)
[ 48.110571][ T1] prandom: seed boundary self test passed
[ 48.115313][ T1] prandom: 100 self tests passed


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20231026/202310261029.ea8e81e8-oliver.sang@xxxxxxxxx



--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki