Re: [RFC PATCH] jbd2: avoid __GFP_ZERO with SLAB_TYPESAFE_BY_RCU

From: Jan Kara
Date: Thu Feb 10 2022 - 14:13:14 EST

Next message: Paul E. McKenney: "Re: [RFC 00/12] locking: Separate lock tracepoints from lockdep/lock_stat (v1)"
Previous message: Parav Pandit: "RE: [PATCH net-next 0/4] net/mlx5: Introduce devlink param to disable SF aux dev probe"
In reply to: Paul E. McKenney: "Re: [RFC PATCH] jbd2: avoid __GFP_ZERO with SLAB_TYPESAFE_BY_RCU"
Next in thread: Paul E. McKenney: "Re: [RFC PATCH] jbd2: avoid __GFP_ZERO with SLAB_TYPESAFE_BY_RCU"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Thu 10-02-22 06:57:13, Paul E. McKenney wrote:
> On Thu, Feb 10, 2022 at 10:16:48AM +0100, Jan Kara wrote:
> > On Wed 09-02-22 12:11:37, Paul E. McKenney wrote:
> > > On Wed, Feb 09, 2022 at 07:10:10PM +0100, Jan Kara wrote:
> > > > On Wed 09-02-22 11:57:42, Qian Cai wrote:
> > > > > Since the linux-next commit 120aa5e57479 (mm: Check for
> > > > > SLAB_TYPESAFE_BY_RCU and __GFP_ZERO slab allocation), we will get a
> > > > > boot warning. Avoid it by calling synchronize_rcu() before the zeroing.
> > > > >
> > > > > Signed-off-by: Qian Cai <quic_qiancai@xxxxxxxxxxx>
> > > >
> > > > No, the performance impact of this would be just horrible. Can you
> > > > ellaborate a bit why SLAB_TYPESAFE_BY_RCU + __GFP_ZERO is a problem and why
> > > > synchronize_rcu() would be needed here before the memset() please? I mean
> > > > how is zeroing here any different from the memory just being used?
> > >
> > > Suppose a reader picks up a pointer to a memory block, then that memory
> > > is freed. No problem, given that this is a SLAB_TYPESAFE_BY_RCU slab,
> > > so the memory won't be freed while the reader is accessing it. But while
> > > the reader is in the process of validating the block, it is zeroed.
> > >
> > > How does the validation step handle this in all cases?
> > >
> > > If you have a way of handling this, I will of course drop the patch.
> > > And learn something new, which is always a good thing. ;-)
> >
> > So I maybe missed something when implementing the usage of journal_heads
> > under RCU but let's have a look. An example of RCU user of journal heads
> > is fs/jbd2/transaction.c:jbd2_write_access_granted(). It does:
> >
> > rcu_read_lock();
> >
> > // This part fetches journal_head from buffer_head - not related to
> > // our slab RCU discussion
> >
> > if (!buffer_jbd(bh))
> > goto out;
> > /* This should be bh2jh() but that doesn't work with inline functions */
> > jh = READ_ONCE(bh->b_private);
> > if (!jh)
> > goto out;
> >
> > // The validation comes here
> >
> > /* For undo access buffer must have data copied */
> > if (undo && !jh->b_committed_data)
> > goto out;
>
> OK, so if *jh was freed and re-zallocated in the meantime, this test
> should fail. One concern would be if the zeroing was not at least eight
> bytes at a time, maybe due to overly eager use of fancy SIMD hardware.
> Though perhaps you also do something about ->b_committed_data on
> the free path, the commit-done path, or whatever? (I do see a
> "jh->b_committed_data = NULL" on what might well be the commit-done path.)
>
> > if (READ_ONCE(jh->b_transaction) != handle->h_transaction &&
> > READ_ONCE(jh->b_next_transaction) != handle->h_transaction)
> > goto out;
>
> And same with these guys.

Yes, on commit-done path we zero out jh->b_transaction (or set
jh->b_transaction = jh->b_next_transaction; jh->b_next_transaction = NULL).
So these fields are actually guaranteed to be zero on free.

> > // Then some more checks unrelated to the slab itself.
> >
> > /*
> > * There are two reasons for the barrier here:
> > * 1) Make sure to fetch b_bh after we did previous checks so that we
> > * detect when jh went through free, realloc, attach to transaction
> > * while we were checking. Paired with implicit barrier in that path.
> > * 2) So that access to bh done after jbd2_write_access_granted()
> > * doesn't get reordered and see inconsistent state of concurrent
> > * do_get_write_access().
> > */
> > smp_mb();
> > if (unlikely(jh->b_bh != bh))
> > goto out;
> >
> > // If all passed
> >
> > rcu_read_unlock();
> > return true;
> >
> > So if we are going to return true from the function, we know that 'jh' was
> > attached to handle->h_transaction at some point. And when 'jh' was attached
> > to handle->h_transaction, the transaction was holding reference to the 'jh'
> > and our 'handle' holds reference to the transaction so 'jh' could not be
> > freed since that moment. I.e., we are sure our reference to the handle keeps
> > 'jh' alive and we can safely use it.
> >
> > I don't see how any amount of scribbling over 'jh' could break this
> > validation. But maybe it is just a lack of my imagination :).
>
> Regardless of whether you are suffering a lack of imagination, you
> have clearly demonstrated that it is possible to correctly use the
> SLAB_TYPESAFE_BY_RCU flag in conjunction with kmem_cache_alloc(), thus
> demonstrating that I was suffering from a lack of imagination. ;-)
>
> I have therefore reverted my commit. Please accept my apologies for
> the hassle!

No problem. Thanks for reverting the patch. I can imagine that jbd2's use
of SLAB_TYPESAFE_BY_RCU is an unusual one...

Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR

Next message: Paul E. McKenney: "Re: [RFC 00/12] locking: Separate lock tracepoints from lockdep/lock_stat (v1)"
Previous message: Parav Pandit: "RE: [PATCH net-next 0/4] net/mlx5: Introduce devlink param to disable SF aux dev probe"
In reply to: Paul E. McKenney: "Re: [RFC PATCH] jbd2: avoid __GFP_ZERO with SLAB_TYPESAFE_BY_RCU"
Next in thread: Paul E. McKenney: "Re: [RFC PATCH] jbd2: avoid __GFP_ZERO with SLAB_TYPESAFE_BY_RCU"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]