[qais-yousef:generalized-misfit-lb] [sched/fair] 64f5eb4cce: BUG:kernel_NULL_pointer_dereference,address

From: kernel test robot
Date: Wed Sep 06 2023 - 03:04:09 EST




Hello,

kernel test robot noticed "BUG:kernel_NULL_pointer_dereference,address" on:

commit: 64f5eb4cce71383f75074bf14ef47668098d5218 ("sched/fair: Implement new type of misfit MISFIT_POWER")
https://github.com/qais-yousef/linux generalized-misfit-lb

in testcase: boot

compiler: gcc-12
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

(please refer to the attached dmesg/kmsg for the entire log/backtrace)


+---------------------------------------------+------------+------------+
|                                             | 7023f08515 | 64f5eb4cce |
+---------------------------------------------+------------+------------+
| boot_successes                              | 24         | 0          |
| boot_failures                               | 0          | 24         |
| BUG:kernel_NULL_pointer_dereference,address | 0          | 24         |
| Oops:#[##]                                  | 0          | 24         |
| RIP:load_balance                            | 0          | 24         |
| Kernel_panic-not_syncing:Fatal_exception    | 0          | 24         |
+---------------------------------------------+------------+------------+


If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags:
| Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
| Closes: https://lore.kernel.org/oe-lkp/202309061448.29151a04-oliver.sang@xxxxxxxxx


[ 0.567014][ T2] BUG: kernel NULL pointer dereference, address: 0000000000000d10
[ 0.567066][ T2] #PF: supervisor read access in kernel mode
[ 0.567066][ T2] #PF: error_code(0x0000) - not-present page
[ 0.567066][ T2] PGD 0 P4D 0
[ 0.567066][ T2] Oops: 0000 [#1] SMP PTI
[ 0.567066][ T2] CPU: 0 PID: 2 Comm: kthreadd Not tainted 6.5.0-rc2-00048-g64f5eb4cce71 #1
[ 0.567066][ T2] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 0.567066][ T2] RIP: 0010:load_balance (kernel/sched/fair.c:11002 kernel/sched/fair.c:11077)
[ 0.567066][ T2] Code: 83 c4 02 00 00 48 c7 c0 c0 d9 02 00 48 89 1c 24 48 8b 5c 24 18 49 89 f8 48 89 44 24 10 4c 89 6c 24 20 45 89 fd 48 8b 44 24 60 <83> b8 10 0d 00 00 01 0f 84 20 03 00 00 31 d2 83 bc 24 84 00 00 00
All code
========
0: 83 c4 02 add $0x2,%esp
3: 00 00 add %al,(%rax)
5: 48 c7 c0 c0 d9 02 00 mov $0x2d9c0,%rax
c: 48 89 1c 24 mov %rbx,(%rsp)
10: 48 8b 5c 24 18 mov 0x18(%rsp),%rbx
15: 49 89 f8 mov %rdi,%r8
18: 48 89 44 24 10 mov %rax,0x10(%rsp)
1d: 4c 89 6c 24 20 mov %r13,0x20(%rsp)
22: 45 89 fd mov %r15d,%r13d
25: 48 8b 44 24 60 mov 0x60(%rsp),%rax
2a:* 83 b8 10 0d 00 00 01 cmpl $0x1,0xd10(%rax) <-- trapping instruction
31: 0f 84 20 03 00 00 je 0x357
37: 31 d2 xor %edx,%edx
39: 83 .byte 0x83
3a: bc 24 84 00 00 mov $0x8424,%esp
...

Code starting with the faulting instruction
===========================================
0: 83 b8 10 0d 00 00 01 cmpl $0x1,0xd10(%rax)
7: 0f 84 20 03 00 00 je 0x32d
d: 31 d2 xor %edx,%edx
f: 83 .byte 0x83
10: bc 24 84 00 00 mov $0x8424,%esp
...
[ 0.567066][ T2] RSP: 0000:ffffaeabc001bc78 EFLAGS: 00010007
[ 0.567066][ T2] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000fffffffe
[ 0.567066][ T2] RDX: 0000000000000000 RSI: ffff892280221f28 RDI: ffff8922803a9bc0
[ 0.567066][ T2] RBP: ffffaeabc001bd68 R08: ffff8922803a9bc0 R09: ffff892280221f28
[ 0.567066][ T2] R10: 00000000eac0c6e6 R11: 0000000000000000 R12: 0000000000000000
[ 0.567066][ T2] R13: 0000000000000000 R14: ffffaeabc001bd28 R15: 0000000000000000
[ 0.567066][ T2] FS: 0000000000000000(0000) GS:ffff8925afc00000(0000) knlGS:0000000000000000
[ 0.567066][ T2] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.567066][ T2] CR2: 0000000000000d10 CR3: 00000000a7e18000 CR4: 00000000000406f0
[ 0.567066][ T2] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 0.567066][ T2] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 0.567066][ T2] Call Trace:
[ 0.567066][ T2] <TASK>
[ 0.567066][ T2] ? __die (arch/x86/kernel/dumpstack.c:421 arch/x86/kernel/dumpstack.c:434)
[ 0.567066][ T2] ? page_fault_oops (arch/x86/mm/fault.c:707)
[ 0.567066][ T2] ? exc_page_fault (arch/x86/include/asm/irqflags.h:37 arch/x86/include/asm/irqflags.h:72 arch/x86/mm/fault.c:1494 arch/x86/mm/fault.c:1542)
[ 0.567066][ T2] ? asm_exc_page_fault (arch/x86/include/asm/idtentry.h:570)
[ 0.567066][ T2] ? load_balance (kernel/sched/fair.c:11002 kernel/sched/fair.c:11077)
[ 0.567066][ T2] ? load_balance (arch/x86/include/asm/jump_label.h:27 kernel/sched/fair.c:11074)
[ 0.567066][ T2] newidle_balance (kernel/sched/fair.c:12146)
[ 0.567066][ T2] pick_next_task_fair (kernel/sched/fair.c:8274)
[ 0.567066][ T2] __schedule (kernel/sched/core.c:6005 kernel/sched/core.c:6515 kernel/sched/core.c:6660)
[ 0.567066][ T2] schedule (arch/x86/include/asm/bitops.h:207 (discriminator 1) arch/x86/include/asm/bitops.h:239 (discriminator 1) include/linux/thread_info.h:184 (discriminator 1) include/linux/sched.h:2245 (discriminator 1) kernel/sched/core.c:6774 (discriminator 1))
[ 0.567066][ T2] kthreadd (kernel/kthread.c:735)
[ 0.567066][ T2] ? __pfx_kthreadd (kernel/kthread.c:720)
[ 0.567066][ T2] ret_from_fork (arch/x86/kernel/process.c:151)
[ 0.567066][ T2] ? __pfx_kthreadd (kernel/kthread.c:720)
[ 0.567066][ T2] ret_from_fork_asm (arch/x86/entry/entry_64.S:298)
[ 0.567066][ T2] RIP: 0000:0x0
[ 0.567066][ T2] Code: Unable to access opcode bytes at 0xffffffffffffffd6.

[ 0.567066][ T2] RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 0000000000000000
[ 0.567066][ T2] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 0.567066][ T2] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 0.567066][ T2] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 0.567066][ T2] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 0.567066][ T2] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 0.567066][ T2] </TASK>
[ 0.567066][ T2] Modules linked in:
[ 0.567066][ T2] CR2: 0000000000000d10
[ 0.567066][ T2] ---[ end trace 0000000000000000 ]---
[ 0.567066][ T2] RIP: 0010:load_balance (kernel/sched/fair.c:11002 kernel/sched/fair.c:11077)
[ 0.567066][ T2] Code: 83 c4 02 00 00 48 c7 c0 c0 d9 02 00 48 89 1c 24 48 8b 5c 24 18 49 89 f8 48 89 44 24 10 4c 89 6c 24 20 45 89 fd 48 8b 44 24 60 <83> b8 10 0d 00 00 01 0f 84 20 03 00 00 31 d2 83 bc 24 84 00 00 00
All code
========
0: 83 c4 02 add $0x2,%esp
3: 00 00 add %al,(%rax)
5: 48 c7 c0 c0 d9 02 00 mov $0x2d9c0,%rax
c: 48 89 1c 24 mov %rbx,(%rsp)
10: 48 8b 5c 24 18 mov 0x18(%rsp),%rbx
15: 49 89 f8 mov %rdi,%r8
18: 48 89 44 24 10 mov %rax,0x10(%rsp)
1d: 4c 89 6c 24 20 mov %r13,0x20(%rsp)
22: 45 89 fd mov %r15d,%r13d
25: 48 8b 44 24 60 mov 0x60(%rsp),%rax
2a:* 83 b8 10 0d 00 00 01 cmpl $0x1,0xd10(%rax) <-- trapping instruction
31: 0f 84 20 03 00 00 je 0x357
37: 31 d2 xor %edx,%edx
39: 83 .byte 0x83
3a: bc 24 84 00 00 mov $0x8424,%esp
...

Code starting with the faulting instruction
===========================================
0: 83 b8 10 0d 00 00 01 cmpl $0x1,0xd10(%rax)
7: 0f 84 20 03 00 00 je 0x32d
d: 31 d2 xor %edx,%edx
f: 83 .byte 0x83
10: bc 24 84 00 00 mov $0x8424,%esp
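
A note on reading the oops above: the trapping instruction bytes, the RAX value and CR2 can be cross-checked against each other. The sketch below is only an illustration, not part of the test robot's tooling. It decodes the seven bytes marked in the Code: line and adds the displacement to RAX as reported in the register dump. The function name decode_cmpl_disp32 and the constants are hypothetical, and the decoder handles only this one instruction form (opcode 0x83 /7 with a 32-bit displacement and no SIB byte).

```python
import struct

# Values copied from the oops: the trapping instruction bytes, RAX and CR2.
INSN = bytes.fromhex("83 b8 10 0d 00 00 01")
RAX = 0x0000000000000000
CR2 = 0x0000000000000D10

# 64-bit names for ModRM.rm values 0..7 (no REX prefix is present here).
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_cmpl_disp32(insn):
    """Decode `cmpl $imm8, disp32(%reg)`; hypothetical helper for this one form."""
    opcode, modrm = insn[0], insn[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if opcode != 0x83 or reg != 7:
        raise ValueError("not an 83 /7 (cmp r/m32, imm8) instruction")
    if mod != 0b10 or rm == 0b100:
        raise ValueError("expected the disp32 form without a SIB byte")
    (disp,) = struct.unpack_from("<i", insn, 2)  # little-endian 32-bit displacement
    (imm,) = struct.unpack_from("<b", insn, 6)   # sign-extended 8-bit immediate
    return REGS[rm], disp, imm


if __name__ == "__main__":
    base, disp, imm = decode_cmpl_disp32(INSN)
    print(f"cmpl ${imm:#x},{disp:#x}(%{base}) -> fault address {RAX + disp:#x}")
```

With RAX equal to zero, the computed address is the displacement itself, which matches CR2. One plausible reading is that the pointer loaded from 0x60(%rsp) was NULL and the code then read a 32-bit field at offset 0xd10 from it. Which structure and field that is cannot be determined from this report alone; the source lines given for load_balance (kernel/sched/fair.c:11002 and :11077) are the place to look.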


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20230906/202309061448.29151a04-oliver.sang@xxxxxxxxx



--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki