Re: RCU stalls and GPFs in ceph/netfs

From: Max Kellermann
Date: Mon Jul 29 2024 - 06:17:34 EST


On Mon, Jul 29, 2024 at 11:18 AM Max Kellermann
<max.kellermann@xxxxxxxxx> wrote:
> I posted two candidate patches which both fix this bug;
>
> Minimal fix: https://lore.kernel.org/lkml/20240729090639.852732-1-max.kellermann@xxxxxxxxx/
> Fix which removes a bunch of obsolete code:
> https://lore.kernel.org/lkml/20240729091532.855688-1-max.kellermann@xxxxxxxxx/

These patches do fix the RCU stall bug (and should be merged), but
after running one cluster with my patch for a while, I found more Ceph
crashes:

------------[ cut here ]------------
WARNING: CPU: 3 PID: 1925 at fs/ceph/caps.c:3386 ceph_put_wrbuffer_cap_refs+0x1bb/0x1f0
Modules linked in:
CPU: 3 PID: 1925 Comm: kworker/3:2 Not tainted 6.10.2-cm4all1-vm+ #168
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
Workqueue: ceph-cap ceph_cap_reclaim_work
RIP: 0010:ceph_put_wrbuffer_cap_refs+0x1bb/0x1f0
Code: 30 45 89 f5 bd 01 00 00 00 41 83 c6 01 31 d2 e9 fa fe ff ff 45 8d 6e ff 31 ed 31 d2 48 83 bb 18 04 00 00 00 0f 84 e4 fe ff ff <0f> 0b e9 dd fe ff ff 45 8d 6e ff bd 01 00 00 00 ba 01 00 00 00 48
RSP: 0018:ffffb9a7406cba78 EFLAGS: 00010282
RAX: ffff9a2d42b7eb20 RBX: ffff9a2d42b7e688 RCX: ffffdbdadc446d80
RDX: 0000000000000000 RSI: ffff9a2d42b7eb18 RDI: ffff9a2d42b7e940
RBP: 0000000000000000 R08: ffffffffffffffc0 R09: ffff9a2d4254fc40
R10: 0000000000000020 R11: fefefefefefefeff R12: ffff9a2d42b7e940
R13: 00000000ffffffff R14: 0000000000000000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff9a384eec0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055b248c82657 CR3: 000000010d31c002 CR4: 00000000001706b0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
? __warn+0x7c/0x110
? ceph_put_wrbuffer_cap_refs+0x1bb/0x1f0
? report_bug+0x14c/0x170
? handle_bug+0x3c/0x70
? exc_invalid_op+0x13/0x60
? asm_exc_invalid_op+0x16/0x20
? ceph_put_wrbuffer_cap_refs+0x1bb/0x1f0
? ceph_put_wrbuffer_cap_refs+0x27/0x1f0
ceph_invalidate_folio+0x9a/0xc0
truncate_cleanup_folio+0x52/0x90
truncate_inode_pages_range+0xfe/0x400
ceph_evict_inode+0x40/0x200
evict+0xc5/0x170
__dentry_kill+0x6e/0x160
dput+0xcb/0x180
__dentry_leases_walk+0x28d/0x430
ceph_trim_dentries+0xac/0x100
ceph_cap_reclaim_work+0x15/0x50
process_one_work+0x138/0x2e0
worker_thread+0x2b9/0x3d0
? __pfx_worker_thread+0x10/0x10
kthread+0xba/0xe0
? __pfx_kthread+0x10/0x10
ret_from_fork+0x30/0x50
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
---[ end trace 0000000000000000 ]---
BUG: kernel NULL pointer dereference, address: 0000000000000356
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 0 P4D 0
Oops: Oops: 0002 [#1] SMP PTI
CPU: 3 PID: 1925 Comm: kworker/3:2 Tainted: G W 6.10.2-cm4all1-vm+ #168
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
Workqueue: ceph-cap ceph_cap_reclaim_work
RIP: 0010:ceph_put_snap_context+0xf/0x30
Code: 0f 1f 84 00 00 00 00 00 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 48 85 ff 74 12 b8 ff ff ff ff <f0> 0f c1 07 83 f8 01 74 09 85 c0 7e 0a c3 cc cc cc cc e9 3a 62 70
RSP: 0018:ffffb9a7406cbaa8 EFLAGS: 00010206
RAX: 00000000ffffffff RBX: ffffdbdadc4465c0 RCX: ffffdbdadc446d80
RDX: 0000000000000000 RSI: ffff9a2d42b7eb18 RDI: 0000000000000356
RBP: 0000000000001000 R08: ffffffffffffffc0 R09: ffff9a2d4254fc40
R10: 0000000000000020 R11: fefefefefefefeff R12: 0000000000000356
R13: ffff9a2d42b7e688 R14: ffffffffffffffff R15: 0000000000000001
FS: 0000000000000000(0000) GS:ffff9a384eec0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000356 CR3: 000000010d31c002 CR4: 00000000001706b0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
? __die+0x1f/0x60
? page_fault_oops+0x158/0x450
? search_extable+0x22/0x30
? ceph_put_snap_context+0xf/0x30
? search_module_extables+0xe/0x40
? exc_page_fault+0x62/0x120
? asm_exc_page_fault+0x22/0x30
? ceph_put_snap_context+0xf/0x30
ceph_invalidate_folio+0xa2/0xc0
truncate_cleanup_folio+0x52/0x90
truncate_inode_pages_range+0xfe/0x400
ceph_evict_inode+0x40/0x200
evict+0xc5/0x170
__dentry_kill+0x6e/0x160
dput+0xcb/0x180
__dentry_leases_walk+0x28d/0x430
ceph_trim_dentries+0xac/0x100
ceph_cap_reclaim_work+0x15/0x50
process_one_work+0x138/0x2e0
worker_thread+0x2b9/0x3d0
? __pfx_worker_thread+0x10/0x10
kthread+0xba/0xe0
? __pfx_kthread+0x10/0x10
ret_from_fork+0x30/0x50
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
Modules linked in:
CR2: 0000000000000356
---[ end trace 0000000000000000 ]---
RIP: 0010:ceph_put_snap_context+0xf/0x30
Code: 0f 1f 84 00 00 00 00 00 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 48 85 ff 74 12 b8 ff ff ff ff <f0> 0f c1 07 83 f8 01 74 09 85 c0 7e 0a c3 cc cc cc cc e9 3a 62 70
RSP: 0018:ffffb9a7406cbaa8 EFLAGS: 00010206
RAX: 00000000ffffffff RBX: ffffdbdadc4465c0 RCX: ffffdbdadc446d80
RDX: 0000000000000000 RSI: ffff9a2d42b7eb18 RDI: 0000000000000356
RBP: 0000000000001000 R08: ffffffffffffffc0 R09: ffff9a2d4254fc40
R10: 0000000000000020 R11: fefefefefefefeff R12: 0000000000000356
R13: ffff9a2d42b7e688 R14: ffffffffffffffff R15: 0000000000000001
FS: 0000000000000000(0000) GS:ffff9a384eec0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000356 CR3: 000000010d31c002 CR4: 00000000001706b0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
note: kworker/3:2[1925] exited with irqs disabled
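
For readers following the two "Code:" lines above: the byte in angle brackets marks the instruction at RIP. The sketch below is not part of the original report; it is a small illustrative helper (the names faulting_bytes, describe and KNOWN are hypothetical) that extracts those bytes and matches them against two x86-64 encodings. The first trace stops on ud2, which is how the WARNING traps. The second stops on a locked xadd through %rdi, and RDI (0x356) equals CR2, so the preceding "test %rdi,%rdi" NULL check (48 85 ff) passed on a non-NULL but invalid pointer. Which source line each instruction corresponds to is not shown here and would need the matching vmlinux to confirm.

```python
import re

# Byte patterns starting at the <xx> marker, mapped to their x86-64 meaning.
# Only the two encodings seen in the traces above are listed; this is an
# illustrative table, not a disassembler.
KNOWN = {
    ("0f", "0b"): "ud2",
    ("f0", "0f", "c1", "07"): "lock xadd %eax,(%rdi)",
}


def faulting_bytes(code_line, count=4):
    """Return up to `count` opcode bytes, starting at the <xx> marker."""
    tokens = code_line.split()
    if tokens and tokens[0] == "Code:":
        tokens = tokens[1:]
    for i, tok in enumerate(tokens):
        m = re.fullmatch(r"<([0-9a-fA-F]{2})>", tok)
        if m:
            rest = [t.lower() for t in tokens[i + 1:i + count]]
            return [m.group(1).lower()] + rest
    return []


def describe(code_line):
    """Name the instruction at RIP if it is in the KNOWN table."""
    tail = faulting_bytes(code_line, 4)
    for pattern, name in KNOWN.items():
        if tuple(tail[:len(pattern)]) == pattern:
            return name
    return "unknown"


# The two "Code:" lines from the traces, with the mail wrapping undone.
WARN_CODE = (
    "Code: 30 45 89 f5 bd 01 00 00 00 41 83 c6 01 31 d2 e9 fa fe ff ff 45 "
    "8d 6e ff 31 ed 31 d2 48 83 bb 18 04 00 00 00 0f 84 e4 fe ff ff <0f> 0b "
    "e9 dd fe ff ff 45 8d 6e ff bd 01 00 00 00 ba 01 00 00 00 48"
)
OOPS_CODE = (
    "Code: 0f 1f 84 00 00 00 00 00 0f 1f 00 90 90 90 90 90 90 90 90 90 90 "
    "90 90 90 90 90 90 0f 1f 44 00 00 48 85 ff 74 12 b8 ff ff ff ff <f0> 0f "
    "c1 07 83 f8 01 74 09 85 c0 7e 0a c3 cc cc cc cc e9 3a 62 70"
)

if __name__ == "__main__":
    print("WARNING trace:", describe(WARN_CODE))
    print("Oops trace:   ", describe(OOPS_CODE))
```

The kernel tree's scripts/decodecode does a full disassembly of such a line; the helper above only pulls out the marked bytes.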

The bug hunt continues.

Max