Re: [PATCH 3/3] ocfs2: use READ_ONCE for lockless jinode reads

From: Jan Kara

Date: Mon Feb 02 2026 - 12:21:41 EST

On Fri 30-01-26 16:36:28, Matthew Wilcox wrote:
> On Fri, Jan 30, 2026 at 08:26:40PM +0800, Li Chen wrote:
> > Hi Matthew,
> >
> > > On Fri, Jan 30, 2026 at 11:12:32AM +0800, Li Chen wrote:
> > > > ocfs2 journal commit callback reads jbd2_inode dirty range fields without
> > > > holding journal->j_list_lock.
> > > >
> > > > Use READ_ONCE() for these reads to correct the concurrency assumptions.
> > >
> > > I don't think this is the right solution to the problem. If it is,
> > > there needs to be much better argumentation in the commit message.
> > >
> > > As I understand it, jbd2_journal_file_inode() initialises jinode,
> > > then adds it to the t_inode_list, then drops the j_list_lock. So the
> > > actual problem we need to address is that there's no memory barrier
> > > between the store to i_dirty_start and the list_add(). Once that's
> > > added, there's no need for a READ_ONCE here.
> > >
> > > Or have I misunderstood the problem?
> >
> > Thanks for the review.
> >
> > My understanding of your point is that you're worried about a missing
> > "publish" ordering in jbd2_journal_file_inode(): we store
> > jinode->i_dirty_start/end and then list_add() the jinode to
> > t_inode_list, and a core which observes the list entry might miss the prior
> > i_dirty_* stores. Is that the issue you had in mind?
>
> I think that's the only issue that exists ...
>
> > If so, for the normal commit path where the list is walked under
> > journal->j_list_lock (e.g. journal_submit_data_buffers() in
> > fs/jbd2/commit.c), spin_lock()/spin_unlock() should already provide the
> > necessary ordering, since both the i_dirty_* updates and the list_add()
> > happen inside the same critical section.
>
> I don't think that's true. I think what you're asserting is that:
>
> int *pi;
> int **ppi;
>
> spin_lock(&lock);
> *pi = 1;
> *ppi = pi;
> spin_unlock(&lock);
>
> that the store to *pi must be observed before the store to *ppi, and
> that's not true for a reader which doesn't read the value of lock.
> The store to *ppi needs a store barrier before it.

Well, the above reasonably accurately describes the code making jinode
visible. The reader code is like:

spin_lock(&lock);
pi = *ppi;
spin_unlock(&lock);

work with pi

so it is guaranteed to see pi properly initialized. The problem is that
"work with pi" can race with other code updating the content of pi which is
what this patch is trying to deal with.

> > The ocfs2 case I was aiming at is different: the filesystem callback is
> > invoked after unlocking journal->j_list_lock and may sleep, so it can't hold
> > j_list_lock but it still reads jinode->i_dirty_start/end while other
> > threads update these fields under the lock. Adding a barrier between the
> > stores and list_add() would not address that concurrent update window.
>
> I don't think that race exists. If it does exist, the READ_ONCE will
> not help (on 32 bit platforms) because it's a 64-bit quantity and 32-bit
> platforms do not, in general, have a way to do an atomic 64-bit load
> (look at the implementation of i_size_read() for the gyrations we go
> through to assure a non-torn read of that value).

Sadly the race does exist - journal_submit_data_buffers() on the committing
transaction can run in parallel with jbd2_journal_file_inode() in the
running transaction. There's nothing preventing that. The problems arising
out of that are mostly theoretical but they do exist. In particular you're
correct that on 32-bit platforms this will be racy even with READ_ONCE /
WRITE_ONCE which I didn't realize.

Li, the best way to address this concern would be to modify jbd2_inode to
switch i_dirty_start / i_dirty_end to account in PAGE_SIZE units instead of
bytes and be of type pgoff_t. jbd2_journal_file_inode() just needs to round
the passed ranges properly...

Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR