Re: [PATCH tip/core/rcu 03/17] rcu/tree: Skip entry into the page allocator for PREEMPT_RT
From: Paul E. McKenney
Date: Thu Jul 02 2020 - 12:48:30 EST
On Thu, Jul 02, 2020 at 04:12:16PM +0200, Sebastian Andrzej Siewior wrote:
> On 2020-06-30 11:35:34 [-0700], Paul E. McKenney wrote:
> > > This is not going to work together with the "wait context validator"
> > > (CONFIG_PROVE_RAW_LOCK_NESTING). As of -rc3 it should complain about
> > > printk() which is why it is still disabled by default.
> >
> > Fixing that should be "interesting". In particular, RCU CPU stall
> > warnings rely on the raw spin lock to reduce false positives due
> > to race conditions. Some thought will be required here.
>
> I don't get this part. Can you explain/give me an example where to look
> at?
Starting from the scheduler-clock interrupt's call into RCU,
we have rcu_sched_clock_irq() which calls rcu_pending() which
calls check_cpu_stall() which calls either print_cpu_stall() or
print_other_cpu_stall(), depending on whether the stall is happening on
the current CPU or on some other CPU, respectively.
Both of these last functions acquire the rcu_node structure's raw ->lock
and expect to do printk()s while holding it.
Not that the lock matters all that much given that all of this code is
in hardirq context. Which is necessary if RCU CPU stall warnings are
to be at all reliable. Yeah, yeah, they would be even more reliable if
invoked in NMI context but that would prevent them from acquiring the
->lock that is used to resolve races between multiple concurrent stall
warnings and between a stall warning and the end of that stall. ;-)
I could imagine accumulating the data in hardirq context and then
using a workqueue or some such to print it all out, but that
requires that workqueues be fully operational for RCU CPU stall
warnings to appear. :-(
Thoughts?
> > > So assume that this is fixed and enabled then on !PREEMPT_RT it will
> > > complain that you have a raw_spinlock_t acquired (the one from patch
> > > 02/17) and attempt to acquire a spinlock_t in the memory allocator.
> >
> > Given that the slab allocator doesn't acquire any locks until it gets
> > a fair way in, wouldn't it make sense to allow a "shallow" allocation
> > while a raw spinlock is held? This would require yet another GFP_ flag,
> > but that won't make all that much of a difference in the total. ;-)
>
> That would be one way of dealing with. But we could go back to
> spinlock_t and keep the memory allocation even for RT as is. I don't see
> a downside of this. And we would worry about kfree_rcu() from real
> IRQ-off region once we get to it.
Once we get to it, your thought would be to do per-CPU queuing of
memory from IRQ-off kfree_rcu(), and have IRQ work or some such clean
up after it? Or did you have some other trick in mind?
Thanx, Paul
> > > > bnode = (struct kfree_rcu_bulk_data *)
> > > > __get_free_page(GFP_NOWAIT | __GFP_NOWARN);
> > > > }
>
> Sebastian