Re: [PATCH net 0/2] Fix NPE discovered by running bpf kselftest
From: Levi Zim
Date: Wed Dec 04 2024 - 01:49:35 EST
On 2024-12-04 09:01, Cong Wang wrote:
On Sun, Dec 01, 2024 at 09:42:08AM +0800, Levi Zim wrote:
On 2024-11-30 21:38, Levi Zim via B4 Relay wrote:
I found that bpf kselftest sockhash::test_txmsg_cork_hangs in
test_sockmap.c triggers a kernel NULL pointer dereference:
Interesting, I also ran this test recently and I didn't see such a
crash.
I am also curious why other people and the CI didn't hit this crash.
I just did a search and found only one mention of this bug:
https://lore.kernel.org/bpf/20241020110345.1468595-1-zijianzhang@xxxxxxxxxxxxx/
Personally, when I run test_sockmap on the Arch Linux 6.12.1 kernel,
I get an RCU stall instead of this NPE:
rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu: Tasks blocked on level-0 rcu_node (CPUs 0-11): P3378
rcu: (detected by 0, t=18002 jiffies, g=9525, q=28619 ncpus=12)
task:test_sockmap state:R running task stack:0 pid:3378
tgid:3378 ppid:1168 flags:0x00004006
Call Trace:
<TASK>
? __schedule+0x3b8/0x12b0
? get_page_from_freelist+0x366/0x1730
? sysvec_apic_timer_interrupt+0xe/0x90
? asm_sysvec_apic_timer_interrupt+0x1a/0x20
? bpf_msg_pop_data+0x41e/0x690
? mem_cgroup_charge_skmem+0x40/0x60
? bpf_prog_1fca1a523ce93f38_bpf_prog4+0x23d/0x248
? sk_psock_msg_verdict+0x99/0x1e0
? tcp_bpf_sendmsg+0x42d/0x9f0
? sock_sendmsg+0x109/0x130
? splice_to_socket+0x359/0x4f0
? shmem_file_splice_read+0x2cd/0x300
? direct_splice_actor+0x51/0x130
? splice_direct_to_actor+0xf0/0x260
? __pfx_direct_splice_actor+0x10/0x10
? do_splice_direct+0x77/0xc0
? __pfx_direct_file_splice_eof+0x10/0x10
? do_sendfile+0x382/0x440
? __x64_sys_sendfile64+0xb3/0xd0
? do_syscall_64+0x82/0x190
? find_next_iomem_res+0xbe/0x130
? __pfx_pagerange_is_ram_callback+0x10/0x10
? walk_system_ram_range+0xa6/0x100
? __pte_offset_map+0x1b/0x180
? __pte_offset_map_lock+0x9e/0x130
? set_ptes.isra.0+0x41/0x90
? insert_pfn+0xba/0x210
? vmf_insert_pfn_prot+0x85/0xd0
? __do_fault+0x30/0x170
? do_fault+0x303/0x4c0
? __handle_mm_fault+0x7c2/0xfa0
? shmem_file_write_iter+0x5b/0x90
? __count_memcg_events+0x53/0xf0
? count_memcg_events.constprop.0+0x1a/0x30
? handle_mm_fault+0x1bb/0x2c0
? do_user_addr_fault+0x17f/0x620
? clear_bhb_loop+0x25/0x80
? clear_bhb_loop+0x25/0x80
? clear_bhb_loop+0x25/0x80
? entry_SYSCALL_64_after_hwframe+0x76/0x7e
</TASK>
BUG: kernel NULL pointer dereference, address: 0000000000000008
? __die_body+0x6e/0xb0
? __die+0x8b/0xa0
? page_fault_oops+0x358/0x3c0
? local_clock+0x19/0x30
? lock_release+0x11b/0x440
? kernelmode_fixup_or_oops+0x54/0x60
? __bad_area_nosemaphore+0x4f/0x210
? mmap_read_unlock+0x13/0x30
? bad_area_nosemaphore+0x16/0x20
? do_user_addr_fault+0x6fd/0x740
? prb_read_valid+0x1d/0x30
? exc_page_fault+0x55/0xd0
? asm_exc_page_fault+0x2b/0x30
? splice_to_socket+0x52e/0x630
? shmem_file_splice_read+0x2b1/0x310
direct_splice_actor+0x47/0x70
splice_direct_to_actor+0x133/0x300
? do_splice_direct+0x90/0x90
do_splice_direct+0x64/0x90
? __ia32_sys_tee+0x30/0x30
do_sendfile+0x214/0x300
__se_sys_sendfile64+0x8e/0xb0
__x64_sys_sendfile64+0x25/0x30
x64_sys_call+0xb82/0x2840
do_syscall_64+0x75/0x110
entry_SYSCALL_64_after_hwframe+0x4b/0x53
This is caused by tcp_bpf_sendmsg() returning a larger value (12289) than
size (8192), which causes the while loop in splice_to_socket() to release
an uninitialized pipe buf.
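
As a rough userspace illustration of why an over-reported byte count is
dangerous for a release loop like the one described above, the sketch
below models a pipe as a small array of slots. struct model_buf, its
filled flag, and drain_hits_unfilled() are hypothetical names invented
for this example. It is not the kernel's splice_to_socket() code, only a
simplified model that assumes never-filled slots have length 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one pipe slot; not the kernel's struct. */
struct model_buf {
	size_t len;	/* bytes still held in this slot */
	int filled;	/* 1 if the slot was populated before sending */
};

/*
 * Consume `ret` bytes from the slots, starting at slot 0.
 * Returns 1 if the loop reaches a slot that was never filled
 * (the point where a real release would touch uninitialized state),
 * 0 if it drains cleanly.
 */
static int drain_hits_unfilled(struct model_buf *bufs, size_t nbufs,
			       size_t ret)
{
	size_t tail = 0;

	while (ret > 0) {
		assert(tail < nbufs);	/* keep the model itself in bounds */
		struct model_buf *buf = &bufs[tail];
		size_t seg = ret < buf->len ? ret : buf->len;

		buf->len -= seg;
		ret -= seg;
		if (buf->len == 0) {
			if (!buf->filled)
				return 1;	/* would release an unfilled slot */
			tail++;
		}
	}
	return 0;
}
```

With two filled 4096-byte slots, a reported count of 8192 drains cleanly,
while 12289 makes the loop reach a slot that was never filled.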
The underlying cause is that this code assumes sk_msg_memcopy_from_iter()
copies all requested bytes on success, but it may actually copy only part
of them.
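
The accounting problem can be sketched in userspace as follows. This is
a minimal model under stated assumptions, not the tcp_bpf_sendmsg() code:
copy_some() and send_model() are hypothetical helpers, and the partial
copy is simulated by capping each copy at `room` bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_BUF 64

/* Hypothetical copy helper: succeeds, but copies at most `room` bytes. */
static size_t copy_some(char *dst, size_t room, const char *src, size_t want)
{
	size_t n = want < room ? want : room;

	memcpy(dst, src, n);
	return n;
}

/*
 * Model of a send loop. Returns the byte count reported to the caller.
 * trust_request != 0: credit the requested length (the flawed assumption).
 * trust_request == 0: credit only what copy_some() actually copied.
 */
static size_t send_model(size_t size, size_t chunk, size_t room,
			 int trust_request)
{
	static const char src[MODEL_BUF];
	char dst[MODEL_BUF];
	size_t off = 0, reported = 0;

	assert(size <= MODEL_BUF && chunk > 0);
	assert(room > 0 && room <= MODEL_BUF);

	while (off < size) {
		size_t left = size - off;
		size_t want = left < chunk ? left : chunk;
		size_t got = copy_some(dst, room, src + off, want);

		reported += trust_request ? want : got;
		off += got;	/* the source only advances by what was copied */
	}
	return reported;
}
```

For size 8, chunk 4 and room 3, the first variant reports 10 while the
second reports 8, so only the second stays within the caller's size.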
I am not sure what Fixes tag I should put. Git blame leads me to a
refactor commit, and I am not familiar with this part of the code base.
Any suggestions?
I think it is the following commit which introduced memcopy_from_iter()
(which was renamed to sk_msg_memcopy_from_iter() later):
commit 4f738adba30a7cfc006f605707e7aee847ffefa0
Author: John Fastabend <john.fastabend@xxxxxxxxx>
Date: Sun Mar 18 12:57:10 2018 -0700
bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data
Please double check.
Thanks.
Thanks for your help. I will double check it.