[Syzkaller & bisect] There is WARNING: refcount bug in sock_map_free in v6.3-rc1

From: Pengfei Xu
Date: Tue Mar 07 2023 - 09:41:48 EST


Hi Eric Dumazet,

Greeting!

Platform: x86 platform
All detailed log link: https://github.com/xupengfe/syzkaller_logs/tree/main/230306_123510_sock_map_free
v6.3-rc1 problem dmesg: https://github.com/xupengfe/syzkaller_logs/blob/main/230306_123510_sock_map_free/v6.3-rc1_problem_dmesg.log
Bisect info: https://github.com/xupengfe/syzkaller_logs/blob/main/230306_123510_sock_map_free/bisect_info.log
Kconfig: https://github.com/xupengfe/syzkaller_logs/blob/main/230306_123510_sock_map_free/bisect_info.log

There is WARNING: refcount bug in sock_map_free in v6.3-rc1 kernel.


[ 58.955232] ------------[ cut here ]------------
[ 58.955976] refcount_t: addition on 0; use-after-free.
[ 58.956752] WARNING: CPU: 0 PID: 9 at lib/refcount.c:25 refcount_warn_saturate+0xe6/0x1c0
[ 58.957846] Modules linked in:
[ 58.958278] CPU: 0 PID: 9 Comm: kworker/u4:0 Not tainted 6.3.0-rc1-fe15c26ee26e+ #1
[ 58.959340] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[ 58.960854] Workqueue: events_unbound bpf_map_free_deferred
[ 58.961630] RIP: 0010:refcount_warn_saturate+0xe6/0x1c0
[ 58.962344] Code: 1d b8 c2 43 02 31 ff 89 de e8 c6 ab 52 ff 84 db 75 97 e8 5d aa 52 ff 48 c7 c7 d8 b6 be 83 c6 05 98 c2 43 02 01 e8 1a 55 36 ff <0f> 0b e9 78 9
[ 58.964836] RSP: 0018:ffffc90000053da0 EFLAGS: 00010286
[ 58.965582] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff8111b6fb
[ 58.966570] RDX: 0000000000000000 RSI: ffff888007372340 RDI: 0000000000000002
[ 58.967569] RBP: ffffc90000053db0 R08: 0000000000000000 R09: 0000000000000001
[ 58.968547] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88800ec16200
[ 58.969532] R13: 0000000000000000 R14: ffff888015808000 R15: ffff88800ed6fc00
[ 58.970500] FS: 0000000000000000(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
[ 58.971647] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 58.972447] CR2: 0000000020000000 CR3: 0000000003e3a006 CR4: 0000000000770ef0
[ 58.973461] PKRU: 55555554
[ 58.973851] Call Trace:
[ 58.974206] <TASK>
[ 58.974520] sock_map_free+0x387/0x390
[ 58.975077] bpf_map_free_deferred+0x124/0x2d0
[ 58.975753] process_one_work+0x3b1/0x9e0
[ 58.976350] worker_thread+0x52/0x660
[ 58.976895] ? __pfx_worker_thread+0x10/0x10
[ 58.977514] kthread+0x161/0x1a0
[ 58.977994] ? __pfx_kthread+0x10/0x10
[ 58.978529] ret_from_fork+0x29/0x50
[ 58.979073] </TASK>
[ 58.979448] irq event stamp: 25237
[ 58.979940] hardirqs last enabled at (25245): [<ffffffff811eac21>] __up_console_sem+0x91/0xb0
[ 58.981151] hardirqs last disabled at (25252): [<ffffffff811eac06>] __up_console_sem+0x76/0xb0
[ 58.982466] softirqs last enabled at (25234): [<ffffffff83023a3c>] __do_softirq+0x31c/0x49c
[ 58.983769] softirqs last disabled at (25229): [<ffffffff8112b5a4>] irq_exit_rcu+0xc4/0x100
[ 58.985055] ---[ end trace 0000000000000000 ]---
[ 58.985813] ------------[ cut here ]------------
[ 58.986530] refcount_t: underflow; use-after-free.
[ 58.987460] WARNING: CPU: 0 PID: 9 at lib/refcount.c:28 refcount_warn_saturate+0x15e/0x1c0
[ 58.988714] Modules linked in:
[ 58.989207] CPU: 0 PID: 9 Comm: kworker/u4:0 Tainted: G W 6.3.0-rc1-fe15c26ee26e+ #1
[ 58.990565] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[ 58.992351] Workqueue: events_unbound bpf_map_free_deferred
[ 58.993213] RIP: 0010:refcount_warn_saturate+0x15e/0x1c0
[ 58.994119] Code: 02 31 ff 89 de e8 52 ab 52 ff 84 db 0f 85 1f ff ff ff e8 e5 a9 52 ff 48 c7 c7 08 b7 be 83 c6 05 1f c2 43 02 01 e8 a2 54 36 ff <0f> 0b e9 00 9
[ 58.997238] RSP: 0018:ffffc90000053da0 EFLAGS: 00010286
[ 58.998124] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff8111b6fb
[ 58.999358] RDX: 0000000000000000 RSI: ffff888007372340 RDI: 0000000000000002
[ 59.000558] RBP: ffffc90000053db0 R08: 0000000000000000 R09: 0000000000000001
[ 59.001737] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88800ec16200
[ 59.002928] R13: 0000000000000000 R14: ffff888015808000 R15: ffff88800ed6fc00
[ 59.004172] FS: 0000000000000000(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
[ 59.005541] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 59.006521] CR2: 0000000020000000 CR3: 0000000003e3a006 CR4: 0000000000770ef0
[ 59.007770] PKRU: 55555554
[ 59.008260] Call Trace:
[ 59.008709] <TASK>
[ 59.009094] sock_map_free+0x36c/0x390
[ 59.009745] bpf_map_free_deferred+0x124/0x2d0
[ 59.010543] process_one_work+0x3b1/0x9e0
[ 59.011288] worker_thread+0x52/0x660
[ 59.011947] ? __pfx_worker_thread+0x10/0x10
[ 59.012694] kthread+0x161/0x1a0
[ 59.013270] ? __pfx_kthread+0x10/0x10
[ 59.013925] ret_from_fork+0x29/0x50
[ 59.014570] </TASK>
[ 59.014959] irq event stamp: 25629
[ 59.015589] hardirqs last enabled at (25639): [<ffffffff811eac21>] __up_console_sem+0x91/0xb0
[ 59.017047] hardirqs last disabled at (25646): [<ffffffff811eac06>] __up_console_sem+0x76/0xb0
[ 59.018450] softirqs last enabled at (25412): [<ffffffff83023a3c>] __do_softirq+0x31c/0x49c
[ 59.019851] softirqs last disabled at (25367): [<ffffffff8112b5a4>] irq_exit_rcu+0xc4/0x100
[ 59.021282] ---[ end trace 0000000000000000 ]---

Found this issue existed in v6.2 kernel also.
And bisected from v6.2 to v5.11 and found the bad commit:
"
0a182f8d6074
bpf. sockmap: fix race in sock_map_free()
"
Reverted the bad commit on top of v6.2, above issue was gone.


There was "use after free" in dmesg info.
And I found that GPIO kselftest triggered the similar Call Trace also.
"
cd linux/tools/testing/selftests

1. ./kselftest_install.sh


2. cd linux/tools/testing/selftests/kselftest_install/gpio

# ./gpio-mockup.sh
1. Module load tests
1.1. dynamic allocation of gpio
2. Module load error tests
2.1 gpio overflow
test failed: unexpected chip - gpiochip1
GPIO gpio-mockup test FAIL
"
Related dmesg:
"
[ 9729.144086] ------------[ cut here ]------------
[ 9729.149680] refcount_t: underflow; use-after-free.
[ 9729.155478] WARNING: CPU: 75 PID: 809778 at lib/refcount.c:28 refcount_warn_saturate+0xd3/0x120
[ 9729.165489] Modules linked in: gpio_mockup(-) xt_state rds quota_v2 quota_tree brd overlay ntfs btrfs blake2b_generic xor raid6_pq ip6t_REJECT nf_reject_ipv6 ip6t_rpfilter xt_tcpudp ipt_REJECT nf_reject_ipv4 xt_conntrack nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bridge stp llc vsock_loopback vhost_vsock vmw_vsock_virtio_transport_common vhost vhost_iotlb ip6_tables vsock ip_tables nft_compat x_tables ip_set nf_tables nfnetlink intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common i10nm_edac nfit x86_pkg_temp_thermal snd_hda_codec_realtek intel_powerclamp snd_hda_codec_generic ledtrig_audio coretemp snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core snd_hwdep snd_seq kmem ofpart snd_seq_device kvm_intel snd_pcm pmt_telemetry pmt_crashlog cmdlinepart snd_timer kvm ipmi_ssif spi_nor device_dax intel_sdsi pmt_class irqbypass mei_me mtd rapl snd i2c_i801 isst_if_mmio pcspkr dax_hmem isst_if_mbox_pci intel_cstate
[ 9729.165539] spi_intel_pci soundcore isst_if_common mei intel_vsec idxd i2c_ismt spi_intel i2c_smbus ipmi_si acpi_power_meter acpi_pad mac_hid sch_fq_codel crct10dif_pclmul crc32_pclmul ghash_clmulni_intel sha512_ssse3 aesni_intel crypto_simd cryptd ast igc pinctrl_emmitsburg pinctrl_intel pwm_lpss [last unloaded: init_module(O)]
[ 9729.297472] CPU: 75 PID: 809778 Comm: modprobe Tainted: G S IO 6.2.0-lkml #18
[ 9729.306942] Hardware name: Intel Corporation ArcherCity/ArcherCity, BIOS EGSDCRB1.SYS.0090.D03.2210040200 10/04/2022
[ 9729.318926] RIP: 0010:refcount_warn_saturate+0xd3/0x120
[ 9729.325020] Code: 9f 00 0f 0b 5d c3 cc cc cc cc 80 3d c5 72 b7 01 00 0f 85 74 ff ff ff 48 c7 c7 10 b9 2b bc c6 05 b1 72 b7 01 01 e8 7d 39 9f 00 <0f> 0b 5d c3 cc cc cc cc 80 3d 9b 72 b7 01 00 0f 85 4c ff ff ff 48
[ 9729.346438] RSP: 0018:ff38111b0f3b3e48 EFLAGS: 00010286
[ 9729.352524] RAX: 0000000000000000 RBX: ff1e4ed2436ee100 RCX: 0000000000000001
[ 9729.360745] RDX: 0000000000000001 RSI: ffffffffbc1ea3d1 RDI: 00000000ffffffff
[ 9729.368960] RBP: ff38111b0f3b3e48 R08: 0000000000000000 R09: c0000000ff7fffff
[ 9729.377188] R10: 0000000000000001 R11: ff38111b0f3b3cb8 R12: ff1e4ed24b90fc00
[ 9729.385415] R13: ff1e4ed2436ee140 R14: 0000000000000000 R15: 0000000000000000
[ 9729.393639] FS: 00007f76415d5740(0000) GS:ff1e4ef0972c0000(0000) knlGS:0000000000000000
[ 9729.402941] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9729.409618] CR2: 00007f764154fd96 CR3: 000000108c20c006 CR4: 0000000000f71ee0
[ 9729.417860] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 9729.426087] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 9729.434328] PKRU: 55555554
[ 9729.437633] Call Trace:
[ 9729.440650] <TASK>
[ 9729.443266] kobject_put+0x10f/0x1b0
[ 9729.447520] fwnode_remove_software_node+0x35/0x50
[ 9729.453158] gpio_mockup_unregister_pdevs+0x3a/0x4e [gpio_mockup]
[ 9729.460233] gpio_mockup_exit+0xd/0x3b3 [gpio_mockup]
[ 9729.466169] __x64_sys_delete_module+0x140/0x240
[ 9729.471619] do_syscall_64+0x3b/0x90
[ 9729.475909] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 9729.481830] RIP: 0033:0x7f7641954f7b
[ 9729.486100] Code: 73 01 c3 48 8b 0d 15 8f 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e5 8e 0c 00 f7 d8 64 89 01 48
[ 9729.507517] RSP: 002b:00007ffef02671e8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[ 9729.516270] RAX: ffffffffffffffda RBX: 0000562e93269ee0 RCX: 00007f7641954f7b
[ 9729.524519] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 0000562e93269f48
[ 9729.532752] RBP: 0000562e93269f48 R08: 00007ffef0266191 R09: 0000000000000000
[ 9729.541004] R10: 00007f76419d3ae0 R11: 0000000000000206 R12: 0000000000000000
[ 9729.549266] R13: 0000000000000000 R14: 0000562e93269f48 R15: 0000000000000000
[ 9729.557518] </TASK>
[ 9729.560246] ---[ end trace 0000000000000000 ]---
"

I hope it's helpful.


---

If you don't need the following environment to reproduce the problem or if you
already have one, please ignore the following information.

How to reproduce:
git clone https://gitlab.com/xupengfe/repro_vm_env.git
cd repro_vm_env
tar -xvf repro_vm_env.tar.gz
cd repro_vm_env; ./start3.sh // it needs qemu-system-x86_64 and I used v7.1.0
// start3.sh will load bzImage_2241ab53cbb5cdb08a6b2d4688feb13971058f65 v6.2-rc5 kernel
// You could change the bzImage_xxx as you want
You could use below command to log in, there is no password for root.
ssh -p 10023 root@localhost

After login vm(virtual machine) successfully, you could transfer reproduced
binary to the vm by below way, and reproduce the problem in vm:
gcc -pthread -o repro repro.c
scp -P 10023 repro root@localhost:/root/

Get the bzImage for target kernel:
Please use target kconfig and copy it to kernel_src/.config
make olddefconfig
make -jx bzImage //x should equal or less than cpu num your pc has

Fill the bzImage file into above start3.sh to load the target kernel vm.


Tips:
If you already have qemu-system-x86_64, please ignore below info.
If you want to install qemu v7.1.0 version:
git clone https://github.com/qemu/qemu.git
cd qemu
git checkout -f v7.1.0
mkdir build
cd build
yum install -y ninja-build.x86_64
../configure --target-list=x86_64-softmmu --enable-kvm --enable-vnc --enable-gtk --enable-sdl
make
make install
---

Thanks!
BR.