Re: [RFC PATCH] jbd2: avoid __GFP_ZERO with SLAB_TYPESAFE_BY_RCU

From: Jan Kara
Date: Thu Feb 10 2022 - 04:16:59 EST


On Wed 09-02-22 12:11:37, Paul E. McKenney wrote:
> On Wed, Feb 09, 2022 at 07:10:10PM +0100, Jan Kara wrote:
> > On Wed 09-02-22 11:57:42, Qian Cai wrote:
> > > Since the linux-next commit 120aa5e57479 (mm: Check for
> > > SLAB_TYPESAFE_BY_RCU and __GFP_ZERO slab allocation), we will get a
> > > boot warning. Avoid it by calling synchronize_rcu() before the zeroing.
> > >
> > > Signed-off-by: Qian Cai <quic_qiancai@xxxxxxxxxxx>
> >
> > No, the performance impact of this would be just horrible. Can you
> > ellaborate a bit why SLAB_TYPESAFE_BY_RCU + __GFP_ZERO is a problem and why
> > synchronize_rcu() would be needed here before the memset() please? I mean
> > how is zeroing here any different from the memory just being used?
>
> Suppose a reader picks up a pointer to a memory block, then that memory
> is freed. No problem, given that this is a SLAB_TYPESAFE_BY_RCU slab,
> so the memory won't be freed while the reader is accessing it. But while
> the reader is in the process of validating the block, it is zeroed.
>
> How does the validation step handle this in all cases?
>
> If you have a way of handling this, I will of course drop the patch.
> And learn something new, which is always a good thing. ;-)

So I maybe missed something when implementing the usage of journal_heads
under RCU but let's have a look. An example of RCU user of journal heads
is fs/jbd2/transaction.c:jbd2_write_access_granted(). It does:

rcu_read_lock();

// This part fetches journal_head from buffer_head - not related to
// our slab RCU discussion

if (!buffer_jbd(bh))
goto out;
/* This should be bh2jh() but that doesn't work with inline functions */
jh = READ_ONCE(bh->b_private);
if (!jh)
goto out;

// The validation comes here

/* For undo access buffer must have data copied */
if (undo && !jh->b_committed_data)
goto out;
if (READ_ONCE(jh->b_transaction) != handle->h_transaction &&
READ_ONCE(jh->b_next_transaction) != handle->h_transaction)
goto out;

// Then some more checks unrelated to the slab itself.

/*
* There are two reasons for the barrier here:
* 1) Make sure to fetch b_bh after we did previous checks so that we
* detect when jh went through free, realloc, attach to transaction
* while we were checking. Paired with implicit barrier in that path.
* 2) So that access to bh done after jbd2_write_access_granted()
* doesn't get reordered and see inconsistent state of concurrent
* do_get_write_access().
*/
smp_mb();
if (unlikely(jh->b_bh != bh))
goto out;

// If all passed

rcu_read_unlock();
return true;

So if we are going to return true from the function, we know that 'jh' was
attached to handle->h_transaction at some point. And when 'jh' was attached
to handle->h_transaction, the transaction was holding reference to the 'jh'
and our 'handle' holds reference to the transaction so 'jh' could not be
freed since that moment. I.e., we are sure our reference to the handle keeps
'jh' alive and we can safely use it.

I don't see how any amount of scribbling over 'jh' could break this
validation. But maybe it is just a lack of my imagination :).

Honza

--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR