Re: [PATCH] ocfs2: kill osb->system_file_mutex lock
From: Tetsuo Handa
Date: Thu May 14 2026 - 03:09:42 EST
Hi Heming,
I would like to clarify why the expectation of "being called only once" is logically
incorrect, reply to your concern regarding the reference count leak and explain why
this patch is completely safe and sufficient.
1. get_local_system_inode() can fail under memory pressure:
get_local_system_inode() allocates memory internally. Under heavy memory pressure,
this allocation can fail and return NULL. When this happens, the caller
ocfs2_get_system_file_inode() must fall back to calling _ocfs2_get_system_file_inode()
again to read the inode from disk. Therefore, the filesystem design must inherently
support multiple calls to _ocfs2_get_system_file_inode().
2. Why cmpxchg() is sufficient and safe without the mutex:
The only thing the system_file_mutex is needed was to prevent a race where two
threads concurrently execute _ocfs2_get_system_file_inode(), obtain the SAME inode
pointer (since the underlying VFS iget_locked() returns the identical address for
the same slot), and both mistakenly invoke igrab() on it, leading to a reference
count leak.
This patch perfectly solves that race condition by using cmpxchg() on the target
pointer array slot:
* The thread that wins the cmpxchg() successfully initializes the slot with the
fetched inode and get the extra refcount because it is the first time to store
into the slot.
* The thread that loses the cmpxchg() detects that another thread has already
initialized the slot with the exact same inode. The loser thread returns
without getting the extra refcount because it is not the first time to store
into the slot.
Therefore, the reference counting contract is strictly and atomically maintained.
No references are leaked, and the array slot is never corrupted.
3. Standard filesystems do not use a global mutex for this:
Standard filesystems (like Ext4's ext4_get_journal_inode or XFS's
xfs_qm_init_quotainos) rely entirely on the VFS layer's internal hashing/locking (e.g.,
iget_locked) to serialize metadata/system inode lookups. OCFS2's system_file_mutex is a
redundant global lock that heavily pollutes the lock dependency graph, triggering
possible deadlock warnings that block us from testing and fixing genuine deadlocks.
Since the cmpxchg() approach guarantees atomic slot initialization + igrab(), the global
mutex is completely redundant and should be removed.
Regards.