Re: [PATCH] xfs: annotate lockless b_flags read in xfs_buf_lock

Next message: Baoquan He: "Re: [PATCH v2 0/3] mm/swap: use swap_ops to register swap device's methods"
Previous message: Ping-Ke Shih: "RE: [PATCH] wifi: rtw89: fix typo &quot;frome&quot; -&gt; &quot;from&quot; in rx_freq_frome_ie"
In reply to: Cen Zhang: "[PATCH] xfs: annotate lockless b_flags read in xfs_buf_lock"
Next in thread: Cen Zhang: "Re: [PATCH] xfs: annotate lockless b_flags read in xfs_buf_lock"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

From: Dave Chinner

Date: Sun Mar 29 2026 - 20:52:18 EST

On Fri, Mar 27, 2026 at 09:11:52PM +0800, Cen Zhang wrote:
> xfs_buf_lock() reads bp->b_flags before acquiring the buffer semaphore
> to check whether a stale, pinned buffer needs a log force:
>
> if (atomic_read(&bp->b_pin_count) && (bp->b_flags & XBF_STALE))
>
> This races with xfs_trans_dirty_buf(), which modifies b_flags while
> the buffer is locked by a transaction on another CPU.
>
> The pre-semaphore check is a performance hint: if a stale pinned
> buffer is detected, forcing the log avoids a long wait on the
> semaphore. Either outcome of the race is benign -- a false positive
> triggers a harmless log force, and a false negative simply means the
> caller blocks on the semaphore and the log force happens later.
>
> Annotate the lockless read with READ_ONCE().

No. READ_ONCE should not be used to annotate a benign data access
race. data_race() should be used because all it does is turn off
KASAN for that access, and unlike READ_ONCE(), there is no code
change when KASAN is not enabled.

But, in reality, the race condition here is more than just the
b_flags access. There is a big comment above the function explaining what
the check does, and that the buffer pinned check that precedes the
b_flags check can race with journal completion, too. SO, if we lose
the race and trigger the log force, then nothing bad happens, and
latency is no worse than if we won the race and triggered a log
force.

IOWs, adding READ_ONCE() to these check doesn't actually "annotate"
anything useful or informative - it's not paired with WRITE_ONCE()
anywhere (nor would we want to be doing that), and so it's just a
random, uncommented macro in the code that makes things harder to
read.

-Dave.
--
Dave Chinner
dgc@xxxxxxxxxx