Re: [syzbot] [rdma?] WARNING in gid_table_release_one (3)

From: Yanjun.Zhu

Date: Wed Nov 05 2025 - 15:10:48 EST



On 11/5/25 10:50 AM, Leon Romanovsky wrote:

On Wed, Nov 5, 2025, at 19:14, Jason Gunthorpe wrote:
On Wed, Nov 05, 2025 at 09:06:04AM -0800, syzbot wrote:
Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
WARNING in gid_table_release_one

------------[ cut here ]------------
GID entry ref leak for dev syz1 index 2 ref=363, state: 3
WARNING: CPU: 1 PID: 50 at drivers/infiniband/core/cache.c:827 release_gid_table drivers/infiniband/core/cache.c:824 [inline]
WARNING: CPU: 1 PID: 50 at drivers/infiniband/core/cache.c:827 gid_table_release_one+0x5ae/0x6c0 drivers/infiniband/core/cache.c:904
Modules linked in:
CPU: 1 UID: 0 PID: 50 Comm: kworker/u8:3 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/02/2025
Workqueue: ib-unreg-wq ib_unregister_work
RIP: 0010:release_gid_table drivers/infiniband/core/cache.c:824 [inline]
RIP: 0010:gid_table_release_one+0x5ae/0x6c0 drivers/infiniband/core/cache.c:904
Code: e8 03 0f b6 04 28 84 c0 0f 85 cc 00 00 00 44 8b 03 48 c7 c7 60 7c 2b 8c 48 8b 74 24 28 44 89 fa 8b 4c 24 50 e8 73 e7 35 f9 90 <0f> 0b 90 90 44 8b 74 24 04 4c 8b 7c 24 20 4c 8b 64 24 48 e9 15 fe
RSP: 0018:ffffc90000bb78f8 EFLAGS: 00010246
RAX: 124fa0acf3bf2700 RBX: ffff8880268c1990 RCX: ffff888020289e40
RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000002
RBP: dffffc0000000000 R08: 0000000000000003 R09: 0000000000000004
R10: dffffc0000000000 R11: fffffbfff1b7a678 R12: ffff88802ed4e2d8
R13: 00000000000001a8 R14: ffff88806a158010 R15: 0000000000000002
FS: 0000000000000000(0000) GS:ffff88812646a000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005555712ce808 CR3: 000000006b6c8000 CR4: 00000000003526f0
Call Trace:
<TASK>
ib_device_release+0xd2/0x1c0 drivers/infiniband/core/device.c:509
device_release+0x9c/0x1c0 drivers/base/core.c:-1
kobject_cleanup lib/kobject.c:689 [inline]
kobject_release lib/kobject.c:720 [inline]
kref_put include/linux/kref.h:65 [inline]
kobject_put+0x22b/0x480 lib/kobject.c:737
process_one_work kernel/workqueue.c:3263 [inline]
process_scheduled_works+0xae1/0x17b0 kernel/workqueue.c:3346
worker_thread+0x8a0/0xda0 kernel/workqueue.c:3427
kthread+0x711/0x8a0 kernel/kthread.c:463
ret_from_fork+0x4bc/0x870 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
</TASK>


Tested on:

commit: ad2cc78b RDMA/core: Fix WARNING in gid_table_release_one
git tree: https://github.com/zhuyj/linux.git v6.17_fix_gid_table_release_one
console output: https://syzkaller.appspot.com/x/log.txt?x=11dfa17c580000
kernel config: https://syzkaller.appspot.com/x/.config?x=2c614fa9e6f5bdc1
dashboard link: https://syzkaller.appspot.com/bug?extid=b0da83a6c0e2e2bddbd4
compiler: Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
I think this disproves the theory that the gid is sitting in a
work queue waiting to be cleaned up.
Yes, this makes more sense to me, given that ib_wq is flushed multiple times.
So we still need to find out what is holding on to the reference...

It’s still unclear what is holding the reference. In my tests, if we wait here for a short time, all the references are eventually released, which is quite strange.

Yanjun.Zhu


Jason