[Syzkaller & bisect] There is kernel BUG in __jump_label_patch in v6.11-rc1

From: Pengfei Xu
Date: Wed Jul 31 2024 - 22:31:21 EST


Hi Thomas,

Greetings!

There is a kernel BUG in __jump_label_patch in v6.11-rc1.
Bisection found that the first bad commit is:
83ab38ef0a0b jump_label: Fix concurrency issues in static_key_slow_dec()

All detailed info: https://github.com/xupengfe/syzkaller_logs/tree/main/240731_164621___jump_label_patch
Syzkaller repro code: https://github.com/xupengfe/syzkaller_logs/blob/main/240731_164621___jump_label_patch/repro.c
Syzkaller syscall repro steps: https://github.com/xupengfe/syzkaller_logs/blob/main/240731_164621___jump_label_patch/repro.prog
Syzkaller analysis report: https://github.com/xupengfe/syzkaller_logs/blob/main/240731_164621___jump_label_patch/repro.report
Kconfig(make olddefconfig): https://github.com/xupengfe/syzkaller_logs/blob/main/240731_164621___jump_label_patch/kconfig_origin
Bisect info: https://github.com/xupengfe/syzkaller_logs/blob/main/240731_164621___jump_label_patch/bisect_info.log
v6.11-rc1 bzImage: https://github.com/xupengfe/syzkaller_logs/raw/main/240731_164621___jump_label_patch/bzImage_8400291e289ee6b2bf9779ff1c83a291501f017b.tar.gz
Issue dmesg: https://github.com/xupengfe/syzkaller_logs/blob/main/240731_164621___jump_label_patch/8400291e289ee6b2bf9779ff1c83a291501f017b_dmesg.log

"
[ 26.685632] jump_label: Fatal kernel bug, unexpected op at udp_destroy_sock+0xc8/0x280 [00000000ca56fe49] (eb 35 e8 c1 73 != 66 90 0f 1f 00)) size:2 type:1
[ 26.686361] ------------[ cut here ]------------
[ 26.686558] kernel BUG at arch/x86/kernel/jump_label.c:73!
[ 26.686805] Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 26.687086] CPU: 1 UID: 0 PID: 2414 Comm: repro Tainted: G W 6.11.0-rc1-8400291e289e #1
[ 26.687477] Tainted: [W]=WARN
[ 26.687610] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[ 26.688074] RIP: 0010:__jump_label_patch+0x38f/0x400
[ 26.688327] Code: 0b 48 c7 c3 00 69 97 88 e8 7e bb 56 00 45 89 e1 49 89 d8 4c 89 f1 41 55 4c 89 f2 4c 89 f6 48 c7 c7 40 b9 a2 85 e8 31 e8 35 00 <0f> 0b be 04 00 00 00 48 89 45 c8 e8 91 f7 bb 00 48 8b 45 c8 e9 f7
[ 26.689095] RSP: 0018:ffff88800fab79f8 EFLAGS: 00010286
[ 26.689326] RAX: 000000000000008f RBX: ffffffff85a2fb01 RCX: ffffffff814521c6
[ 26.689634] RDX: 0000000000000000 RSI: ffffffff8145d208 RDI: 0000000000000005
[ 26.689942] RBP: ffff88800fab7a40 R08: 0000000000000001 R09: ffffed1001f56ef0
[ 26.690247] R10: 0000000080000000 R11: 0000000000000001 R12: 0000000000000002
[ 26.690544] R13: 0000000000000001 R14: ffffffff85167688 R15: 0000000000000085
[ 26.690837] FS: 0000000000000000(0000) GS:ffff88806c700000(0000) knlGS:0000000000000000
[ 26.691169] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 26.691412] CR2: 00007ffd56dda4d8 CR3: 0000000013efc004 CR4: 0000000000770ef0
[ 26.691706] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 26.692002] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400
[ 26.692299] PKRU: 55555554
[ 26.692428] Call Trace:
[ 26.692555] <TASK>
[ 26.692662] ? show_regs+0xa8/0xc0
[ 26.692836] ? die+0x42/0xc0
[ 26.692994] ? do_trap+0x230/0x410
[ 26.693161] ? do_error_trap+0xf2/0x210
[ 26.693338] ? __jump_label_patch+0x38f/0x400
[ 26.693533] ? handle_invalid_op+0x39/0x50
[ 26.693714] ? __jump_label_patch+0x38f/0x400
[ 26.693907] ? exc_invalid_op+0x63/0x80
[ 26.694082] ? asm_exc_invalid_op+0x1f/0x30
[ 26.694269] ? udp_destroy_sock+0xc8/0x280
[ 26.694455] ? __wake_up_klogd.part.0+0xa6/0x110
[ 26.694659] ? vprintk+0xd8/0x170
[ 26.694810] ? __jump_label_patch+0x38f/0x400
[ 26.695008] ? __jump_label_patch+0x38f/0x400
[ 26.695203] arch_jump_label_transform_queue+0x80/0x120
[ 26.695432] __jump_label_update+0x13a/0x430
[ 26.695626] jump_label_update+0x34a/0x440
[ 26.695807] ? __pfx_udpv6_destroy_sock+0x10/0x10
[ 26.696014] __static_key_slow_dec_cpuslocked.part.0+0x5f/0xb0
[ 26.696266] static_key_slow_dec+0x86/0xd0
[ 26.696446] udp_encap_disable+0x1e/0x30
[ 26.696621] udpv6_destroy_sock+0x16b/0x250
[ 26.696806] sk_common_release+0x74/0x460
[ 26.696985] udp_lib_close+0x1a/0x30
[ 26.697149] inet_release+0x14c/0x290
[ 26.697315] inet6_release+0x5c/0x80
[ 26.697474] __sock_release+0xb6/0x280
[ 26.697642] ? __pfx_sock_close+0x10/0x10
[ 26.697819] sock_close+0x27/0x40
[ 26.697969] __fput+0x426/0xbc0
[ 26.698117] ____fput+0x1f/0x30
[ 26.698263] task_work_run+0x19c/0x2b0
[ 26.698433] ? __pfx_task_work_run+0x10/0x10
[ 26.698622] ? __sanitizer_cov_trace_const_cmp4+0x1a/0x20
[ 26.698853] ? switch_task_namespaces+0xd8/0x130
[ 26.699058] do_exit+0xafa/0x29f0
[ 26.699210] ? lock_release+0x441/0x870
[ 26.699384] ? __pfx_do_exit+0x10/0x10
[ 26.699552] ? __this_cpu_preempt_check+0x21/0x30
[ 26.699757] ? _raw_spin_unlock_irq+0x2c/0x60
[ 26.699948] ? lockdep_hardirqs_on+0x89/0x110
[ 26.700140] do_group_exit+0xe4/0x2c0
[ 26.700307] __x64_sys_exit_group+0x4d/0x60
[ 26.700491] x64_sys_call+0x20c4/0x20d0
[ 26.700660] do_syscall_64+0x6d/0x140
[ 26.700822] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 26.701044] RIP: 0033:0x7fb60ab18a4d
[ 26.701208] Code: Unable to access opcode bytes at 0x7fb60ab18a23.
[ 26.701472] RSP: 002b:00007ffd56dda568 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
[ 26.701785] RAX: ffffffffffffffda RBX: 00007fb60abf69e0 RCX: 00007fb60ab18a4d
[ 26.702078] RDX: 00000000000000e7 RSI: fffffffffffffeb0 RDI: 0000000000000000
[ 26.702371] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000020
[ 26.702664] R10: 00007ffd56dda410 R11: 0000000000000246 R12: 00007fb60abf69e0
[ 26.702958] R13: 00007fb60abfbf00 R14: 0000000000000001 R15: 00007fb60abfbee8
[ 26.703258] </TASK>
[ 26.703356] Modules linked in:
[ 26.703525] ---[ end trace 0000000000000000 ]---
[ 26.703818] RIP: 0010:__jump_label_patch+0x38f/0x400
[ 26.704070] Code: 0b 48 c7 c3 00 69 97 88 e8 7e bb 56 00 45 89 e1 49 89 d8 4c 89 f1 41 55 4c 89 f2 4c 89 f6 48 c7 c7 40 b9 a2 85 e8 31 e8 35 00 <0f> 0b be 04 00 00 00 48 89 45 c8 e8 91 f7 bb 00 48 8b 45 c8 e9 f7
[ 26.704876] RSP: 0018:ffff88800fab79f8 EFLAGS: 00010286
[ 26.705100] RAX: 000000000000008f RBX: ffffffff85a2fb01 RCX: ffffffff814521c6
[ 26.705399] RDX: 0000000000000000 RSI: ffffffff8145d208 RDI: 0000000000000005
[ 26.705695] RBP: ffff88800fab7a40 R08: 0000000000000001 R09: ffffed1001f56ef0
[ 26.705990] R10: 0000000080000000 R11: 0000000000000001 R12: 0000000000000002
[ 26.706295] R13: 0000000000000001 R14: ffffffff85167688 R15: 0000000000000085
[ 26.706601] FS: 0000000000000000(0000) GS:ffff88806c700000(0000) knlGS:0000000000000000
[ 26.706938] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 26.707181] CR2: 00007ffd56dda4d8 CR3: 0000000013efc004 CR4: 0000000000770ef0
[ 26.707481] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 26.707872] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400
[ 26.708172] PKRU: 55555554
[ 26.708296] Fixing recursive fault but reboot is needed!
[ 26.708517] BUG: using smp_processor_id() in preemptible [00000000] code: repro/2414
[ 26.708838] caller is debug_smp_processor_id+0x20/0x30
[ 26.709061] CPU: 1 UID: 0 PID: 2414 Comm: repro Tainted: G D W 6.11.0-rc1-8400291e289e #1
[ 26.709451] Tainted: [D]=DIE, [W]=WARN
[ 26.709613] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[ 26.710075] Call Trace:
[ 26.710198] <TASK>
[ 26.710325] dump_stack_lvl+0x121/0x150
[ 26.710509] dump_stack+0x19/0x20
[ 26.710680] check_preemption_disabled+0x168/0x180
[ 26.710908] debug_smp_processor_id+0x20/0x30
[ 26.711103] __schedule+0x9a/0x2eb0
[ 26.711263] ? rcu_is_watching+0x19/0xc0
[ 26.711439] ? lock_release+0x592/0x870
[ 26.711614] ? __pfx___schedule+0x10/0x10
[ 26.711795] ? debug_smp_processor_id+0x20/0x30
[ 26.712001] ? rcu_is_watching+0x19/0xc0
[ 26.712183] ? trace_irq_enable+0xe1/0x120
[ 26.712373] ? do_task_dead+0x4a/0x110
[ 26.712541] do_task_dead+0xe0/0x110
[ 26.712702] make_task_dead+0x384/0x3c0
[ 26.712876] rewind_stack_and_make_dead+0x16/0x20
[ 26.713082] RIP: 0033:0x7fb60ab18a4d
[ 26.713238] Code: Unable to access opcode bytes at 0x7fb60ab18a23.
[ 26.713495] RSP: 002b:00007ffd56dda568 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
[ 26.713808] RAX: ffffffffffffffda RBX: 00007fb60abf69e0 RCX: 00007fb60ab18a4d
[ 26.714103] RDX: 00000000000000e7 RSI: fffffffffffffeb0 RDI: 0000000000000000
[ 26.714399] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000020
[ 26.714698] R10: 00007ffd56dda410 R11: 0000000000000246 R12: 00007fb60abf69e0
[ 26.715005] R13: 00007fb60abfbf00 R14: 0000000000000001 R15: 00007fb60abfbee8
[ 26.715309] </TASK>
[ 26.715476] BUG: scheduling while atomic: repro/2414/0x00000000
[ 26.715787] INFO: lockdep is turned off.
[ 26.715959] Modules linked in:
[ 26.716107] Preemption disabled at:
[ 26.716110] [<ffffffff81354268>] do_task_dead+0x28/0x110
[ 26.716529] CPU: 1 UID: 0 PID: 2414 Comm: repro Tainted: G D W 6.11.0-rc1-8400291e289e #1
[ 26.716943] Tainted: [D]=DIE, [W]=WARN
[ 26.717108] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[ 26.717572] Call Trace:
[ 26.717684] <TASK>
[ 26.717986] dump_stack_lvl+0x121/0x150
[ 26.718159] ? do_task_dead+0x28/0x110
[ 26.718328] dump_stack+0x19/0x20
[ 26.718480] __schedule_bug+0x12d/0x180
[ 26.718654] __schedule+0x210c/0x2eb0
[ 26.718821] ? rcu_is_watching+0x19/0xc0
[ 26.718995] ? lock_release+0x592/0x870
[ 26.719167] ? __pfx___schedule+0x10/0x10
[ 26.719350] ? debug_smp_processor_id+0x20/0x30
[ 26.719550] ? rcu_is_watching+0x19/0xc0
[ 26.719723] ? trace_irq_enable+0xe1/0x120
[ 26.719903] ? do_task_dead+0x4a/0x110
[ 26.720068] do_task_dead+0xe0/0x110
[ 26.720227] make_task_dead+0x384/0x3c0
[ 26.720401] rewind_stack_and_make_dead+0x16/0x20
[ 26.720605] RIP: 0033:0x7fb60ab18a4d
[ 26.720761] Code: Unable to access opcode bytes at 0x7fb60ab18a23.
[ 26.721016] RSP: 002b:00007ffd56dda568 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
[ 26.721333] RAX: ffffffffffffffda RBX: 00007fb60abf69e0 RCX: 00007fb60ab18a4d
[ 26.721628] RDX: 00000000000000e7 RSI: fffffffffffffeb0 RDI: 0000000000000000
[ 26.721923] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000020
[ 26.722219] R10: 00007ffd56dda410 R11: 0000000000000246 R12: 00007fb60abf69e0
[ 26.722517] R13: 00007fb60abfbf00 R14: 0000000000000001 R15: 00007fb60abfbee8
[ 26.722818] </TASK>
"
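
A note on the first log line above: as far as I can tell from the pr_crit() format in arch/x86/kernel/jump_label.c, the bytes on the left of "!=" are what was found at the patch site and the bytes on the right are what the kernel expected, and only the first "size" bytes are compared. The sketch below pulls those bytes out of the line. The helper name decode_jump_label_line is hypothetical and only for illustration; it is not part of any kernel tool.

```shell
# Hypothetical helper: split the "(observed != expected)" pair out of the
# jump_label "unexpected op" line and keep only the first <size> bytes of
# each side, since that is the part that is actually compared.
decode_jump_label_line() {
    line=$1
    # Capture the two hex byte strings around "!=" as "observed|expected".
    pair=$(printf '%s\n' "$line" |
        sed -n 's/.*(\([0-9a-f ]*\) != \([0-9a-f ]*\))).*/\1|\2/p')
    # Capture the number after "size:".
    size=$(printf '%s\n' "$line" | sed -n 's/.* size:\([0-9]*\).*/\1/p')
    observed=$(printf '%s\n' "${pair%%|*}" | cut -d' ' -f1-"$size")
    expected=$(printf '%s\n' "${pair##*|}" | cut -d' ' -f1-"$size")
    printf 'observed=%s expected=%s\n' "$observed" "$expected"
}

decode_jump_label_line 'jump_label: Fatal kernel bug, unexpected op at udp_destroy_sock+0xc8/0x280 [00000000ca56fe49] (eb 35 e8 c1 73 != 66 90 0f 1f 00)) size:2 type:1'
# prints: observed=eb 35 expected=66 90
```

On x86, "eb" is the opcode of a short JMP (rel8) and "66 90" is the 2-byte NOP, so the site at udp_destroy_sock+0xc8 appears to have already contained a jump when a NOP was expected.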

I hope it's helpful.

---

If you don't need the environment below to reproduce the problem, or if you
already have a reproduction environment, please ignore the following information.

How to reproduce:
git clone https://gitlab.com/xupengfe/repro_vm_env.git
cd repro_vm_env
tar -xvf repro_vm_env.tar.gz
cd repro_vm_env; ./start3.sh // it needs qemu-system-x86_64; I used v7.1.0
// start3.sh loads the bzImage_2241ab53cbb5cdb08a6b2d4688feb13971058f65 v6.2-rc5 kernel
// You can change bzImage_xxx to the kernel you want
// With a different qemu version you may need to remove the line "-drive if=pflash,format=raw,readonly=on,file=./OVMF_CODE.fd \"
Use the command below to log in; there is no password for root.
ssh -p 10023 root@localhost

After logging in to the VM (virtual machine) successfully, you can build the
reproducer binary, transfer it to the VM as shown below, and reproduce the problem in the VM:
gcc -pthread -o repro repro.c
scp -P 10023 repro root@localhost:/root/
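
Once the binary is in the VM, one way to confirm the reproduction is to search the guest dmesg for the BUG signature from this report. This is only a sketch: the helper name has_jump_label_bug and the file name dmesg.log are assumptions, not part of repro_vm_env.

```shell
# Hypothetical helper: succeed if a saved dmesg log contains the
# "kernel BUG at arch/x86/kernel/jump_label.c" line seen in this report.
has_jump_label_bug() {
    grep -q 'kernel BUG at arch/x86/kernel/jump_label.c' "$1"
}

# Assumed usage on the host, with the VM from start3.sh listening on port 10023:
#   ssh -p 10023 root@localhost './repro; dmesg' > dmesg.log
#   has_jump_label_bug dmesg.log && echo "reproduced"
```

The ssh step is left as a comment because the guest may become unusable after the BUG, in which case the dmesg output has to be collected another way, for example from the qemu serial console.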

Get the bzImage for the target kernel:
Copy the target kconfig to kernel_src/.config, then run:
make olddefconfig
make -jx bzImage // x should be less than or equal to the number of CPUs your PC has

Put the bzImage file name into start3.sh above to load the target kernel in the VM.


Tips:
If you already have qemu-system-x86_64, please ignore the info below.
To install qemu v7.1.0:
git clone https://github.com/qemu/qemu.git
cd qemu
git checkout -f v7.1.0
mkdir build
cd build
yum install -y ninja-build.x86_64
yum -y install libslirp-devel.x86_64
../configure --target-list=x86_64-softmmu --enable-kvm --enable-vnc --enable-gtk --enable-sdl --enable-usb-redir --enable-slirp
make
make install

Best Regards,
Thanks!