Re: [PATCH tip/core/rcu 03/17] rcu/tree: Skip entry into the page allocator for PREEMPT_RT
From: Paul E. McKenney
Date: Tue Jun 30 2020 - 14:35:37 EST
On Tue, Jun 30, 2020 at 06:45:43PM +0200, Sebastian Andrzej Siewior wrote:
> On 2020-06-24 13:12:12 [-0700], paulmck@xxxxxxxxxx wrote:
> > From: "Joel Fernandes (Google)" <joel@xxxxxxxxxxxxxxxxx>
> >
> > To keep the kfree_rcu() code working in purely atomic sections on RT,
> > such as non-threaded IRQ handlers and raw spinlock sections, avoid
> > calling into the page allocator which uses sleeping locks on RT.
> >
> > In fact, even if the caller is preemptible, the kfree_rcu() code is
> > not, as the krcp->lock is a raw spinlock.
> >
> > Calling into the page allocator is optional and avoiding it should be
> > Ok, especially with the page pre-allocation support in future patches.
> > Such pre-allocation would further avoid the a need for a dynamically
> > allocated page in the first place.
> >
> > Cc: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
> > Reviewed-by: Uladzislau Rezki <urezki@xxxxxxxxx>
> > Co-developed-by: Uladzislau Rezki <urezki@xxxxxxxxx>
> > Signed-off-by: Uladzislau Rezki <urezki@xxxxxxxxx>
> > Signed-off-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx>
> > Signed-off-by: Uladzislau Rezki (Sony) <urezki@xxxxxxxxx>
> > Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
> > ---
> > kernel/rcu/tree.c | 12 ++++++++++++
> > 1 file changed, 12 insertions(+)
> >
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 64592b4..dbdd509 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -3184,6 +3184,18 @@ kfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp,
> > if (!bnode) {
> > WARN_ON_ONCE(sizeof(struct kfree_rcu_bulk_data) > PAGE_SIZE);
> >
> > + /*
> > + * To keep this path working on raw non-preemptible
> > + * sections, prevent the optional entry into the
> > + * allocator as it uses sleeping locks. In fact, even
> > + * if the caller of kfree_rcu() is preemptible, this
> > + * path still is not, as krcp->lock is a raw spinlock.
> > + * With additional page pre-allocation in the works,
> > + * hitting this return is going to be much less likely.
> > + */
> > + if (IS_ENABLED(CONFIG_PREEMPT_RT))
> > + return false;
>
> This is not going to work together with the "wait context validator"
> (CONFIG_PROVE_RAW_LOCK_NESTING). As of -rc3 it should complain about
> printk() which is why it is still disabled by default.
Fixing that should be "interesting". In particular, RCU CPU stall
warnings rely on the raw spin lock to reduce false positives due
to race conditions. Some thought will be required here.
> So assume that this is fixed and enabled then on !PREEMPT_RT it will
> complain that you have a raw_spinlock_t acquired (the one from patch
> 02/17) and attempt to acquire a spinlock_t in the memory allocator.
Given that the slab allocator doesn't acquire any locks until it gets
a fair way in, wouldn't it make sense to allow a "shallow" allocation
while a raw spinlock is held? This would require yet another GFP_ flag,
but that won't make all that much of a difference in the total. ;-)
Thanx, Paul
> > bnode = (struct kfree_rcu_bulk_data *)
> > __get_free_page(GFP_NOWAIT | __GFP_NOWARN);
> > }
>
> Sebastian