Re: possible deadlock in ocfs2_wipe_inode
From: Joseph Qi
Date: Thu Mar 12 2026 - 04:23:51 EST
On 3/11/26 3:51 PM, Jianzhou Zhao wrote:
>
>
> Subject: [BUG] ocfs2: WARNING: possible circular locking dependency in ocfs2_evict_inode
>
> Dear Maintainers,
>
> We are writing to report a possible circular locking dependency in the `ocfs2` subsystem, detected by the lockdep validator as well as our custom fuzzing tool, RacePilot. The bug is an ABBA deadlock involving the system-inode locks, `fs_reclaim`, and `osb->nfs_sync_rwlock`. We observed this on Linux kernel version 6.18.0-08691-g2061f18ad76e-dirty.
>
> Call Trace & Context
> ==================================================================
> WARNING: possible circular locking dependency detected
> kswapd1/95 is trying to acquire lock:
> ffff8880005889c0 (&ocfs2_sysfile_lock_key[ORPHAN_DIR_SYSTEM_INODE]){+.+.}-{4:4}, at: inode_lock include/linux/fs.h:1027 [inline]
> ffff8880005889c0 (&ocfs2_sysfile_lock_key[ORPHAN_DIR_SYSTEM_INODE]){+.+.}-{4:4}, at: ocfs2_wipe_inode+0x2df/0x1380 fs/ocfs2/inode.c:852
>
> but task is already holding lock:
> ffff8880533acbd0 (&osb->nfs_sync_rwlock){.+.+}-{4:4}, at: ocfs2_nfs_sync_lock+0xe9/0x2f0 fs/ocfs2/dlmglue.c:2875
>
> which lock already depends on the new lock.
>
> the existing dependency chain (in reverse order) is:
>
> -> #3 (&osb->nfs_sync_rwlock){.+.+}-{4:4}:
> down_read+0x9b/0x460 kernel/locking/rwsem.c:1537
> ocfs2_nfs_sync_lock+0xe9/0x2f0 fs/ocfs2/dlmglue.c:2875
> ocfs2_delete_inode fs/ocfs2/inode.c:1106 [inline]
> ocfs2_evict_inode+0x2c7/0x1430 fs/ocfs2/inode.c:1297
> evict+0x3b3/0xaa0 fs/inode.c:850
> ...
> balance_pgdat+0xb75/0x1a20 mm/vmscan.c:7270
> kswapd+0x576/0xac0 mm/vmscan.c:7537
>
> -> #2 (fs_reclaim){+.+.}-{0:0}:
> __fs_reclaim_acquire mm/page_alloc.c:4264 [inline]
> fs_reclaim_acquire+0x102/0x150 mm/page_alloc.c:4278
> ...
> slab_alloc_node mm/slub.c:5234 [inline]
> kmalloc_noprof include/linux/slab.h:957 [inline]
> ocfs2_reserve_new_metadata_blocks+0xed/0xb50 fs/ocfs2/suballoc.c:968
> ocfs2_mknod+0xa65/0x24e0 fs/ocfs2/namei.c:350
> ocfs2_create+0x180/0x430 fs/ocfs2/namei.c:676
> ...
> do_sys_open fs/open.c:1436 [inline]
> __x64_sys_openat+0x13f/0x1f0 fs/open.c:1447
>
> -> #1 (&ocfs2_sysfile_lock_key[INODE_ALLOC_SYSTEM_INODE]){+.+.}-{4:4}:
> down_write+0x91/0x200 kernel/locking/rwsem.c:1590
> inode_lock include/linux/fs.h:1027 [inline]
> ocfs2_remove_inode+0x15e/0x8e0 fs/ocfs2/inode.c:731
> ocfs2_wipe_inode+0x652/0x1380 fs/ocfs2/inode.c:894
> ocfs2_delete_inode fs/ocfs2/inode.c:1155 [inline]
> ocfs2_evict_inode+0x69e/0x1430 fs/ocfs2/inode.c:1297
> ...
>
> -> #0 (&ocfs2_sysfile_lock_key[ORPHAN_DIR_SYSTEM_INODE]){+.+.}-{4:4}:
> lock_acquire+0x17b/0x330 kernel/locking/lockdep.c:5825
> down_write+0x91/0x200 kernel/locking/rwsem.c:1590
> inode_lock include/linux/fs.h:1027 [inline]
> ocfs2_wipe_inode+0x2df/0x1380 fs/ocfs2/inode.c:852
> ocfs2_delete_inode fs/ocfs2/inode.c:1155 [inline]
> ocfs2_evict_inode+0x69e/0x1430 fs/ocfs2/inode.c:1297
> ...
> kswapd+0x576/0xac0 mm/vmscan.c:7537
>
> other info that might help us debug this:
>
> Chain exists of:
> &ocfs2_sysfile_lock_key[ORPHAN_DIR_SYSTEM_INODE] --> fs_reclaim --> &osb->nfs_sync_rwlock
>
> Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   rlock(&osb->nfs_sync_rwlock);
>                                lock(fs_reclaim);
>                                lock(&osb->nfs_sync_rwlock);
>   lock(&ocfs2_sysfile_lock_key[ORPHAN_DIR_SYSTEM_INODE]);
>
> *** DEADLOCK ***
> ==================================================================
>
> Execution Flow & Code Context
> The `kswapd` thread evicts unused inodes while holding the `fs_reclaim` lock context. When an unlinked inode reaches `ocfs2_evict_inode()`, it calls into `ocfs2_delete_inode()`. The deletion path wipes the inode's on-disk traces and, in doing so, takes further locks, including `nfs_sync_rwlock` and the `ORPHAN_DIR_SYSTEM_INODE` inode lock:
> ```c
> // fs/ocfs2/inode.c
> static void ocfs2_delete_inode(struct inode *inode)
> {
> ...
> status = ocfs2_nfs_sync_lock(OCFS2_SB(inode->i_sb), 0); // <-- Takes nfs_sync_rwlock
> ...
> status = ocfs2_wipe_inode(inode, di_bh);
> ...
> }
>
> static int ocfs2_wipe_inode(struct inode *inode, struct buffer_head *di_bh)
> {
> ...
> inode_lock(orphan_dir_inode); // <-- Takes ocfs2_sysfile_lock_key[ORPHAN_DIR]
> status = ocfs2_inode_lock(orphan_dir_inode, &orphan_dir_bh, 1);
> ...
> }
> ```
> Concurrently, another thread can be creating a file (`ocfs2_create` -> ... -> `ocfs2_reserve_new_metadata_blocks`). In that path, `kmalloc()` is called while a system-file inode lock is held; if the allocation enters direct reclaim via `fs_reclaim_acquire`, this establishes the `&ocfs2_sysfile_lock_key` -> `fs_reclaim` dependency, since metadata modifications naturally take system-file cluster locks to protect the allocator structures.
>
> Root Cause Analysis
> A circular locking dependency arises because `ocfs2_evict_inode` can be invoked directly from the memory-reclaim path (e.g., from `kswapd`). When `kswapd` frees dentry structures, `evict` runs synchronously in reclaim context, and in OCFS2 this takes subsystem locks such as the `nfs_sync_rwlock` DLM lock and the various `sysfile_lock_key` classes that orchestrate global orphan cleanup. If another thread on a creation path enters reclaim while allocating memory (taking `fs_reclaim`) with a system-file lock held, the two paths intertwine into a lock cycle.
> Unfortunately, we were unable to generate a reproducer for this bug.
>
> Potential Impact
> This circular dependency can stall the `kswapd` daemon, halting memory-reclaim progress and leading to OOM (Out of Memory) conditions and filesystem lock-ups. It constitutes a persistent local Denial of Service (DoS) that can be triggered under low-free-memory workloads with orphaned inodes.
>
> Proposed Fix
> To mitigate the deadlock, `ocfs2_evict_inode` must defer the heavy `ocfs2_delete_inode` work to a workqueue, outside the memory-reclaim context, when the current process is a memory reclaimer (such as `kswapd`, indicated by `current->flags & PF_MEMALLOC`).
>
> ```diff
> --- a/fs/ocfs2/inode.c
> +++ b/fs/ocfs2/inode.c
> @@ -1292,8 +1292,16 @@ void ocfs2_evict_inode(struct inode *inode)
>  		write_inode_now(inode, 1);
>  
>  	if (!inode->i_nlink ||
>  	    (OCFS2_I(inode)->ip_flags & OCFS2_INODE_MAYBE_ORPHANED)) {
> -		ocfs2_delete_inode(inode);
> +		if (current->flags & PF_MEMALLOC) {
> +			/*
> +			 * Defer deleting orphan inodes if doing memory reclaim
> +			 * to avoid lockdep circular dependencies.
> +			 */
> +			ocfs2_queue_orphan_scan(OCFS2_SB(inode->i_sb));
> +		} else {
> +			ocfs2_delete_inode(inode);
> +		}
>  	} else {
>  		truncate_inode_pages_final(&inode->i_data);
>  	}
> ```
>
> We hope this report is of some help.
>
Take a look at this report, I think it could *theoretically* happen.
CPU0                              CPU1
fs_reclaim                        ocfs2_reserve_suballoc_bits
ocfs2_evict_inode                   inode_lock(INODE_ALLOC)
  down_read(nfs_sync_rwlock)      ocfs2_block_group_alloc
  ocfs2_wipe_inode                  ocfs2_reserve_clusters_with_limit
    inode_lock(ORPHAN_DIR)            kzalloc_obj
    ocfs2_remove_inode
      inode_lock(INODE_ALLOC)
Your proposed fix looks incorrect, and it would be much more complicated than necessary.
Maybe use memalloc_nofs_[save|restore] when allocating.
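i.e., something like the below around the allocation in
ocfs2_reserve_new_metadata_blocks() (untested sketch, exact placement and the
`ac` variable are only illustrative):

```c
	unsigned int nofs_flags;

	/*
	 * Allocation under system-file cluster/inode locks must not
	 * recurse into fs_reclaim, or reclaim can re-enter eviction
	 * and take these locks in the reverse order.
	 */
	nofs_flags = memalloc_nofs_save();
	ac = kzalloc(sizeof(*ac), GFP_KERNEL);
	memalloc_nofs_restore(nofs_flags);
```

That scopes GFP_NOFS semantics to the locked region instead of restructuring
the eviction path.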
Thanks,
Joseph