Re: [syzbot] [rdma?] WARNING in rxe_skb_tx_dtor

From: Zhu Yanjun
Date: Fri May 02 2025 - 05:54:38 EST


On 01.05.25 18:45, syzbot wrote:
Hello,

syzbot found the following issue on:

HEAD commit: 8bac8898fe39 Merge tag 'mmc-v6.15-rc1' of git://git.kernel..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=16b6d774580000
kernel config: https://syzkaller.appspot.com/x/.config?x=a9a25b7a36123454
dashboard link: https://syzkaller.appspot.com/bug?extid=8425ccfb599521edb153
compiler: Debian clang version 20.1.2 (++20250402124445+58df0ef89dd6-1~exp1~20250402004600.97), Debian LLD 20.1.2

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/7feb34a89c2a/non_bootable_disk-8bac8898.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/2a76d594c0f5/vmlinux-8bac8898.xz
kernel image: https://storage.googleapis.com/syzbot-assets/dae09c25780d/bzImage-8bac8898.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+8425ccfb599521edb153@xxxxxxxxxxxxxxxxxxxxxxxxx

------------[ cut here ]------------
WARNING: CPU: 0 PID: 1046 at drivers/infiniband/sw/rxe/rxe_net.c:357 rxe_skb_tx_dtor+0x8b/0x2a0 drivers/infiniband/sw/rxe/rxe_net.c:357

This is a known problem. It seems to be related with the following commit.

commit 1a633bdc8fd9e9e4a9f9a668ae122edfc5aacc86
Author: Bob Pearson <rpearsonhpe@xxxxxxxxx>
Date: Fri Mar 29 09:55:15 2024 -0500

RDMA/rxe: Let destroy qp succeed with stuck packet

In some situations a sent packet may get queued in the NIC longer than
than timeout of a ULP. Currently if this happens the ULP may try to reset
the link by destroying the qp and setting up an alternate connection but
will fail because the rxe driver is waiting for the packet to finish
getting sent and be returned to the skb destructor function where the qp
reference holding things up will be dropped. This patch modifies the way
that the qp is passed to the destructor to pass the qp index and not a qp
pointer. Then the destructor will attempt to lookup the qp from its index
and if it fails exit early. This requires taking a reference on the struct
sock rather than the qp allowing the qp to be destroyed while the sk is
still around waiting for the packet to finish.

Link: https://lore.kernel.org/r/20240329145513.35381-15-rpearsonhpe@xxxxxxxxx
Signed-off-by: Bob Pearson <rpearsonhpe@xxxxxxxxx>
Signed-off-by: Jason Gunthorpe <jgg@xxxxxxxxxx>

Zhu Yanjun

Modules linked in:
CPU: 0 UID: 0 PID: 1046 Comm: kworker/u4:9 Not tainted 6.15.0-rc4-syzkaller-00040-g8bac8898fe39 #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
Workqueue: rxe_wq do_work
RIP: 0010:rxe_skb_tx_dtor+0x8b/0x2a0 drivers/infiniband/sw/rxe/rxe_net.c:357
Code: 80 3c 20 00 74 08 4c 89 ff e8 41 ee 8c f9 4d 8b 37 44 89 f6 83 e6 01 31 ff e8 11 fe 2a f9 41 f6 c6 01 75 0e e8 26 f9 2a f9 90 <0f> 0b 90 e9 b4 01 00 00 4c 89 ff e8 35 c4 fa 01 48 89 c7 be 0e 00
RSP: 0018:ffffc90000007a08 EFLAGS: 00010246
RAX: ffffffff8894c5aa RBX: ffff88803ec8d280 RCX: ffff888035088000
RDX: 0000000000000100 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: ffffffff886e3f04 R12: dffffc0000000000
R13: 1ffff11007d91a5b R14: 0000000000025820 R15: ffff888034060000
FS: 0000000000000000(0000) GS:ffff88808d6cc000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f7c6d874fc8 CR3: 00000000428c8000 CR4: 0000000000352ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<IRQ>
skb_release_head_state+0xfe/0x250 net/core/skbuff.c:1149
napi_consume_skb+0xd2/0x1e0 net/core/skbuff.c:-1
e1000_unmap_and_free_tx_resource drivers/net/ethernet/intel/e1000/e1000_main.c:1972 [inline]
e1000_clean_tx_irq drivers/net/ethernet/intel/e1000/e1000_main.c:3864 [inline]
e1000_clean+0x49d/0x2b00 drivers/net/ethernet/intel/e1000/e1000_main.c:3805
__napi_poll+0xc4/0x480 net/core/dev.c:7324
napi_poll net/core/dev.c:7388 [inline]
net_rx_action+0x6ea/0xdf0 net/core/dev.c:7510
handle_softirqs+0x283/0x870 kernel/softirq.c:579
do_softirq+0xec/0x180 kernel/softirq.c:480
</IRQ>
<TASK>
__local_bh_enable_ip+0x17d/0x1c0 kernel/softirq.c:407
local_bh_enable include/linux/bottom_half.h:33 [inline]
rcu_read_unlock_bh include/linux/rcupdate.h:910 [inline]
__dev_queue_xmit+0x1cd7/0x3a70 net/core/dev.c:4656
neigh_output include/net/neighbour.h:539 [inline]
ip6_finish_output2+0x11fb/0x16a0 net/ipv6/ip6_output.c:141
__ip6_finish_output net/ipv6/ip6_output.c:-1 [inline]
ip6_finish_output+0x234/0x7d0 net/ipv6/ip6_output.c:226
rxe_send drivers/infiniband/sw/rxe/rxe_net.c:391 [inline]
rxe_xmit_packet+0x79e/0xa30 drivers/infiniband/sw/rxe/rxe_net.c:450
rxe_requester+0x1fea/0x3d20 drivers/infiniband/sw/rxe/rxe_req.c:805
rxe_sender+0x16/0x50 drivers/infiniband/sw/rxe/rxe_req.c:839
do_task+0x1ad/0x6b0 drivers/infiniband/sw/rxe/rxe_task.c:127
process_one_work kernel/workqueue.c:3238 [inline]
process_scheduled_works+0xadb/0x17a0 kernel/workqueue.c:3319
worker_thread+0x8a0/0xda0 kernel/workqueue.c:3400
kthread+0x70e/0x8a0 kernel/kthread.c:464
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:153
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
</TASK>


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@xxxxxxxxxxxxxxxx.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup