Re: [Syzkaller & bisect] There is WARNING: refcount bug in sock_map_free in v6.3-rc1

From: Pengfei Xu
Date: Tue Apr 04 2023 - 05:31:12 EST


++ GPIO and kself-test mailing list.

Hi kernel experts,

It's a soft remind.

My colleague Lai Yi found that similar "refcount_t: underflow; use-after-free"
issue still existed in v6.3-rc5 kernel on x86 platforms.

We could reproduce issue from kself-test: gpio-mockup.sh easily:
kernel/tools/testing/selftests/gpio/gpio-mockup.sh:

"
[ 5781.338917] -----------[ cut here ]-----------
[ 5781.344192] refcount_t: underflow; use-after-free.
[ 5781.349666] WARNING: CPU: 250 PID: 82496 at lib/refcount.c:25 refcount_warn_saturate+0xbe/0x110
[ 5781.359550] Modules linked in: gpio_mockup isst_if_mmio isst_if_mbox_pci intel_th_sth stm_core intel_th_pti intel_th_pci intel_th_gth pmt_telemetry pmt_class intel_vsec intel_rapl_msr intel_rapl_common nfsv3 rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace bridge stp llc sunrpc intel_uncore_frequency intel_uncore_frequency_common i10nm_edac nfit x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt ofpart kvm_intel intel_pmc_bxt iTCO_vendor_support spi_nor mtd intel_sdsi kvm spdm irqbypass dax_hmem joydev asn1_encoder snd_pcm mei_me i2c_i801 spi_intel_pci isst_if_common idxd snd_timer intel_th i2c_smbus spi_intel mei i2c_ismt ipmi_ssif cxl_acpi ipmi_si cxl_core acpi_power_meter crc32c_intel i40e igb dca igc pinctrl_emmitsburg pinctrl_intel pwm_lpss fuse [last unloaded: isst_if_mmio]
[ 5781.438080] CPU: 250 PID: 82496 Comm: modprobe Not tainted 6.3.0-rc5 #1
[ 5781.449711] Hardware name: Intel Corporation, BIOS IFWI 03/12/2023
[ 5781.461615] RIP: 0010:refcount_warn_saturate+0xbe/0x110
[ 5781.467585] Code: 01 01 e8 75 56 8e ff 0f 0b c3 cc cc cc cc 80 3d 4c 67 ac 01 00 75 85 48 c7 c7 b0 31 cd a9 c6 05 3c 67 ac 01 01 e8 52 56 8e ff <0f> 0b c3 cc cc cc cc 80 3d 27 67 ac 01 00 0f 85 5e ff ff ff 48 c7
[ 5781.488761] RSP: 0018:ff45a7f44d39feb0 EFLAGS: 00010286
[ 5781.494745] RAX: 0000000000000000 RBX: ffffffffc0b36540 RCX: 0000000000000000
[ 5781.502871] RDX: 0000000000000002 RSI: ffffffffa9c065c8 RDI: 00000000ffffffff
[ 5781.510984] RBP: ff31c1afa78cb800 R08: 0000000000000001 R09: 0000000000000003
[ 5781.519100] R10: ff31c1b6fc000000 R11: ff31c1b6fc000000 R12: ff31c1afa78c4f40
[ 5781.527215] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 5781.535337] FS: 00007f9bc705a740(0000) GS:ff31c1b700280000(0000) knlGS:0000000000000000
[ 5781.544529] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5781.551063] CR2: 00007f9bc5e50dc0 CR3: 000000093b36c003 CR4: 0000000000f71ee0
[ 5781.559180] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 5781.567307] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 5781.575413] PKRU: 55555554
[ 5781.578551] Call Trace:
[ 5781.581394] <TASK>
[ 5781.583868] gpio_mockup_exit+0x33/0x420 [gpio_mockup]
[ 5781.589756] __do_sys_delete_module.constprop.0+0x180/0x270
[ 5781.596112] ? syscall_trace_enter.constprop.0+0x17f/0x1b0
[ 5781.602354] do_syscall_64+0x43/0x90
[ 5781.606488] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 5781.612274] RIP: 0033:0x7f9bc5f2ff7b
[ 5781.616376] Code: 73 01 c3 48 8b 0d 15 8f 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e5 8e 0c 00 f7 d8 64 89 01 48
[ 5781.637543] RSP: 002b:00007ffcac9c5e48 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[ 5781.646146] RAX: ffffffffffffffda RBX: 0000557113827e40 RCX: 00007f9bc5f2ff7b
[ 5781.654267] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 0000557113827ea8
[ 5781.662382] RBP: 0000557113827ea8 R08: 00007ffcac9c4df1 R09: 0000000000000000
[ 5781.670491] R10: 00007f9bc5faeae0 R11: 0000000000000206 R12: 0000000000000000
[ 5781.678601] R13: 0000000000000000 R14: 0000557113827ea8 R15: 0000000000000000
[ 5781.686770] </TASK>
[ 5781.689343] --[ end trace 0000000000000000 ]--
"

And we could reproduce reproduce this issue as follow way also in vm:
https://github.com/xupengfe/syzkaller_logs/blob/main/230306_123510_sock_map_free/repro.c
I newly updated the syzkaller analysis repro.status repro.prog info in all
detailed info: https://github.com/xupengfe/syzkaller_logs/tree/main/230306_123510_sock_map_free

Bisected and suspected bad commit is:
"
0a182f8d6074
bpf. sockmap: fix race in sock_map_free()
"

I hope the info is helpful.

Thanks!
BR.
-Pengfei

On 2023-03-07 at 23:18:23 +0800, Pengfei Xu wrote:
> Sorry to update the missed reproduced code from syzkaller:
> https://github.com/xupengfe/syzkaller_logs/blob/main/230306_123510_sock_map_free/repro.c
>
> Thanks!
> BR.
>
> On 2023-03-07 at 22:36:55 +0800, Pengfei Xu wrote:
> > Hi Eric Dumazet,
> >
> > Greeting!
> >
> > Platform: x86 platform
> > All detailed log link: https://github.com/xupengfe/syzkaller_logs/tree/main/230306_123510_sock_map_free
> > v6.3-rc1 problem dmesg: https://github.com/xupengfe/syzkaller_logs/blob/main/230306_123510_sock_map_free/v6.3-rc1_problem_dmesg.log
> > Bisect info: https://github.com/xupengfe/syzkaller_logs/blob/main/230306_123510_sock_map_free/bisect_info.log
> > Kconfig: https://github.com/xupengfe/syzkaller_logs/blob/main/230306_123510_sock_map_free/bisect_info.log
> >
> > There is WARNING: refcount bug in sock_map_free in v6.3-rc1 kernel.
> >
> >
> > [ 58.955232] ------------[ cut here ]------------
> > [ 58.955976] refcount_t: addition on 0; use-after-free.
> > [ 58.956752] WARNING: CPU: 0 PID: 9 at lib/refcount.c:25 refcount_warn_saturate+0xe6/0x1c0
> > [ 58.957846] Modules linked in:
> > [ 58.958278] CPU: 0 PID: 9 Comm: kworker/u4:0 Not tainted 6.3.0-rc1-fe15c26ee26e+ #1
> > [ 58.959340] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
> > [ 58.960854] Workqueue: events_unbound bpf_map_free_deferred
> > [ 58.961630] RIP: 0010:refcount_warn_saturate+0xe6/0x1c0
> > [ 58.962344] Code: 1d b8 c2 43 02 31 ff 89 de e8 c6 ab 52 ff 84 db 75 97 e8 5d aa 52 ff 48 c7 c7 d8 b6 be 83 c6 05 98 c2 43 02 01 e8 1a 55 36 ff <0f> 0b e9 78 9
> > [ 58.964836] RSP: 0018:ffffc90000053da0 EFLAGS: 00010286
> > [ 58.965582] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff8111b6fb
> > [ 58.966570] RDX: 0000000000000000 RSI: ffff888007372340 RDI: 0000000000000002
> > [ 58.967569] RBP: ffffc90000053db0 R08: 0000000000000000 R09: 0000000000000001
> > [ 58.968547] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88800ec16200
> > [ 58.969532] R13: 0000000000000000 R14: ffff888015808000 R15: ffff88800ed6fc00
> > [ 58.970500] FS: 0000000000000000(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
> > [ 58.971647] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 58.972447] CR2: 0000000020000000 CR3: 0000000003e3a006 CR4: 0000000000770ef0
> > [ 58.973461] PKRU: 55555554
> > [ 58.973851] Call Trace:
> > [ 58.974206] <TASK>
> > [ 58.974520] sock_map_free+0x387/0x390
> > [ 58.975077] bpf_map_free_deferred+0x124/0x2d0
> > [ 58.975753] process_one_work+0x3b1/0x9e0
> > [ 58.976350] worker_thread+0x52/0x660
> > [ 58.976895] ? __pfx_worker_thread+0x10/0x10
> > [ 58.977514] kthread+0x161/0x1a0
> > [ 58.977994] ? __pfx_kthread+0x10/0x10
> > [ 58.978529] ret_from_fork+0x29/0x50
> > [ 58.979073] </TASK>
> > [ 58.979448] irq event stamp: 25237
> > [ 58.979940] hardirqs last enabled at (25245): [<ffffffff811eac21>] __up_console_sem+0x91/0xb0
> > [ 58.981151] hardirqs last disabled at (25252): [<ffffffff811eac06>] __up_console_sem+0x76/0xb0
> > [ 58.982466] softirqs last enabled at (25234): [<ffffffff83023a3c>] __do_softirq+0x31c/0x49c
> > [ 58.983769] softirqs last disabled at (25229): [<ffffffff8112b5a4>] irq_exit_rcu+0xc4/0x100
> > [ 58.985055] ---[ end trace 0000000000000000 ]---
> > [ 58.985813] ------------[ cut here ]------------
> > [ 58.986530] refcount_t: underflow; use-after-free.
> > [ 58.987460] WARNING: CPU: 0 PID: 9 at lib/refcount.c:28 refcount_warn_saturate+0x15e/0x1c0
> > [ 58.988714] Modules linked in:
> > [ 58.989207] CPU: 0 PID: 9 Comm: kworker/u4:0 Tainted: G W 6.3.0-rc1-fe15c26ee26e+ #1
> > [ 58.990565] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
> > [ 58.992351] Workqueue: events_unbound bpf_map_free_deferred
> > [ 58.993213] RIP: 0010:refcount_warn_saturate+0x15e/0x1c0
> > [ 58.994119] Code: 02 31 ff 89 de e8 52 ab 52 ff 84 db 0f 85 1f ff ff ff e8 e5 a9 52 ff 48 c7 c7 08 b7 be 83 c6 05 1f c2 43 02 01 e8 a2 54 36 ff <0f> 0b e9 00 9
> > [ 58.997238] RSP: 0018:ffffc90000053da0 EFLAGS: 00010286
> > [ 58.998124] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff8111b6fb
> > [ 58.999358] RDX: 0000000000000000 RSI: ffff888007372340 RDI: 0000000000000002
> > [ 59.000558] RBP: ffffc90000053db0 R08: 0000000000000000 R09: 0000000000000001
> > [ 59.001737] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88800ec16200
> > [ 59.002928] R13: 0000000000000000 R14: ffff888015808000 R15: ffff88800ed6fc00
> > [ 59.004172] FS: 0000000000000000(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
> > [ 59.005541] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 59.006521] CR2: 0000000020000000 CR3: 0000000003e3a006 CR4: 0000000000770ef0
> > [ 59.007770] PKRU: 55555554
> > [ 59.008260] Call Trace:
> > [ 59.008709] <TASK>
> > [ 59.009094] sock_map_free+0x36c/0x390
> > [ 59.009745] bpf_map_free_deferred+0x124/0x2d0
> > [ 59.010543] process_one_work+0x3b1/0x9e0
> > [ 59.011288] worker_thread+0x52/0x660
> > [ 59.011947] ? __pfx_worker_thread+0x10/0x10
> > [ 59.012694] kthread+0x161/0x1a0
> > [ 59.013270] ? __pfx_kthread+0x10/0x10
> > [ 59.013925] ret_from_fork+0x29/0x50
> > [ 59.014570] </TASK>
> > [ 59.014959] irq event stamp: 25629
> > [ 59.015589] hardirqs last enabled at (25639): [<ffffffff811eac21>] __up_console_sem+0x91/0xb0
> > [ 59.017047] hardirqs last disabled at (25646): [<ffffffff811eac06>] __up_console_sem+0x76/0xb0
> > [ 59.018450] softirqs last enabled at (25412): [<ffffffff83023a3c>] __do_softirq+0x31c/0x49c
> > [ 59.019851] softirqs last disabled at (25367): [<ffffffff8112b5a4>] irq_exit_rcu+0xc4/0x100
> > [ 59.021282] ---[ end trace 0000000000000000 ]---
> >
> > Found this issue existed in v6.2 kernel also.
> > And bisected from v6.2 to v5.11 and found the bad commit:
> > "
> > 0a182f8d6074
> > bpf. sockmap: fix race in sock_map_free()
> > "
> > Reverted the bad commit on top of v6.2, above issue was gone.
> >
> >
> > There was "use after free" in dmesg info.
> > And I found that GPIO kselftest triggered the similar Call Trace also.
> > "
> > cd linux/tools/testing/selftests
> >
> > 1. ./kselftest_install.sh
> >
> >
> > 2. cd linux/tools/testing/selftests/kselftest_install/gpio
> >
> > # ./gpio-mockup.sh
> > 1. Module load tests
> > 1.1. dynamic allocation of gpio
> > 2. Module load error tests
> > 2.1 gpio overflow
> > test failed: unexpected chip - gpiochip1
> > GPIO gpio-mockup test FAIL
> > "
> > Related dmesg:
> > "
> > [ 9729.144086] ------------[ cut here ]------------
> > [ 9729.149680] refcount_t: underflow; use-after-free.
> > [ 9729.155478] WARNING: CPU: 75 PID: 809778 at lib/refcount.c:28 refcount_warn_saturate+0xd3/0x120
> > [ 9729.165489] Modules linked in: gpio_mockup(-) xt_state rds quota_v2 quota_tree brd overlay ntfs btrfs blake2b_generic xor raid6_pq ip6t_REJECT nf_reject_ipv6 ip6t_rpfilter xt_tcpudp ipt_REJECT nf_reject_ipv4 xt_conntrack nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bridge stp llc vsock_loopback vhost_vsock vmw_vsock_virtio_transport_common vhost vhost_iotlb ip6_tables vsock ip_tables nft_compat x_tables ip_set nf_tables nfnetlink intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common i10nm_edac nfit x86_pkg_temp_thermal snd_hda_codec_realtek intel_powerclamp snd_hda_codec_generic ledtrig_audio coretemp snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core snd_hwdep snd_seq kmem ofpart snd_seq_device kvm_intel snd_pcm pmt_telemetry pmt_crashlog cmdlinepart snd_timer kvm ipmi_ssif spi_nor device_dax intel_sdsi pmt_class irqbypass mei_me mtd rapl snd i2c_i801 isst_if_mmio pcspkr dax_hmem isst_if_mbox_pci intel_cstate
> > [ 9729.165539] spi_intel_pci soundcore isst_if_common mei intel_vsec idxd i2c_ismt spi_intel i2c_smbus ipmi_si acpi_power_meter acpi_pad mac_hid sch_fq_codel crct10dif_pclmul crc32_pclmul ghash_clmulni_intel sha512_ssse3 aesni_intel crypto_simd cryptd ast igc pinctrl_emmitsburg pinctrl_intel pwm_lpss [last unloaded: init_module(O)]
> > [ 9729.297472] CPU: 75 PID: 809778 Comm: modprobe Tainted: G S IO 6.2.0-lkml #18
> > [ 9729.306942] Hardware name: Intel Corporation ArcherCity/ArcherCity, BIOS EGSDCRB1.SYS.0090.D03.2210040200 10/04/2022
> > [ 9729.318926] RIP: 0010:refcount_warn_saturate+0xd3/0x120
> > [ 9729.325020] Code: 9f 00 0f 0b 5d c3 cc cc cc cc 80 3d c5 72 b7 01 00 0f 85 74 ff ff ff 48 c7 c7 10 b9 2b bc c6 05 b1 72 b7 01 01 e8 7d 39 9f 00 <0f> 0b 5d c3 cc cc cc cc 80 3d 9b 72 b7 01 00 0f 85 4c ff ff ff 48
> > [ 9729.346438] RSP: 0018:ff38111b0f3b3e48 EFLAGS: 00010286
> > [ 9729.352524] RAX: 0000000000000000 RBX: ff1e4ed2436ee100 RCX: 0000000000000001
> > [ 9729.360745] RDX: 0000000000000001 RSI: ffffffffbc1ea3d1 RDI: 00000000ffffffff
> > [ 9729.368960] RBP: ff38111b0f3b3e48 R08: 0000000000000000 R09: c0000000ff7fffff
> > [ 9729.377188] R10: 0000000000000001 R11: ff38111b0f3b3cb8 R12: ff1e4ed24b90fc00
> > [ 9729.385415] R13: ff1e4ed2436ee140 R14: 0000000000000000 R15: 0000000000000000
> > [ 9729.393639] FS: 00007f76415d5740(0000) GS:ff1e4ef0972c0000(0000) knlGS:0000000000000000
> > [ 9729.402941] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 9729.409618] CR2: 00007f764154fd96 CR3: 000000108c20c006 CR4: 0000000000f71ee0
> > [ 9729.417860] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [ 9729.426087] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
> > [ 9729.434328] PKRU: 55555554
> > [ 9729.437633] Call Trace:
> > [ 9729.440650] <TASK>
> > [ 9729.443266] kobject_put+0x10f/0x1b0
> > [ 9729.447520] fwnode_remove_software_node+0x35/0x50
> > [ 9729.453158] gpio_mockup_unregister_pdevs+0x3a/0x4e [gpio_mockup]
> > [ 9729.460233] gpio_mockup_exit+0xd/0x3b3 [gpio_mockup]
> > [ 9729.466169] __x64_sys_delete_module+0x140/0x240
> > [ 9729.471619] do_syscall_64+0x3b/0x90
> > [ 9729.475909] entry_SYSCALL_64_after_hwframe+0x72/0xdc
> > [ 9729.481830] RIP: 0033:0x7f7641954f7b
> > [ 9729.486100] Code: 73 01 c3 48 8b 0d 15 8f 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e5 8e 0c 00 f7 d8 64 89 01 48
> > [ 9729.507517] RSP: 002b:00007ffef02671e8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
> > [ 9729.516270] RAX: ffffffffffffffda RBX: 0000562e93269ee0 RCX: 00007f7641954f7b
> > [ 9729.524519] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 0000562e93269f48
> > [ 9729.532752] RBP: 0000562e93269f48 R08: 00007ffef0266191 R09: 0000000000000000
> > [ 9729.541004] R10: 00007f76419d3ae0 R11: 0000000000000206 R12: 0000000000000000
> > [ 9729.549266] R13: 0000000000000000 R14: 0000562e93269f48 R15: 0000000000000000
> > [ 9729.557518] </TASK>
> > [ 9729.560246] ---[ end trace 0000000000000000 ]---
> > "
> >
> > I hope it's helpful.
> >
> >
> > ---
> >
> > If you don't need the following environment to reproduce the problem or if you
> > already have one, please ignore the following information.
> >
> > How to reproduce:
> > git clone https://gitlab.com/xupengfe/repro_vm_env.git
> > cd repro_vm_env
> > tar -xvf repro_vm_env.tar.gz
> > cd repro_vm_env; ./start3.sh // it needs qemu-system-x86_64 and I used v7.1.0
> > // start3.sh will load bzImage_2241ab53cbb5cdb08a6b2d4688feb13971058f65 v6.2-rc5 kernel
> > // You could change the bzImage_xxx as you want
> > You could use below command to log in, there is no password for root.
> > ssh -p 10023 root@localhost
> >
> > After login vm(virtual machine) successfully, you could transfer reproduced
> > binary to the vm by below way, and reproduce the problem in vm:
> > gcc -pthread -o repro repro.c
> > scp -P 10023 repro root@localhost:/root/
> >
> > Get the bzImage for target kernel:
> > Please use target kconfig and copy it to kernel_src/.config
> > make olddefconfig
> > make -jx bzImage //x should equal or less than cpu num your pc has
> >
> > Fill the bzImage file into above start3.sh to load the target kernel vm.
> >
> >
> > Tips:
> > If you already have qemu-system-x86_64, please ignore below info.
> > If you want to install qemu v7.1.0 version:
> > git clone https://github.com/qemu/qemu.git
> > cd qemu
> > git checkout -f v7.1.0
> > mkdir build
> > cd build
> > yum install -y ninja-build.x86_64
> > ../configure --target-list=x86_64-softmmu --enable-kvm --enable-vnc --enable-gtk --enable-sdl
> > make
> > make install
> > ---
> >
> > Thanks!
> > BR.