Re: [GIT PULL] RCU changes for v5.10

From: Paul E. McKenney
Date: Mon Oct 12 2020 - 22:47:34 EST


On Mon, Oct 12, 2020 at 05:14:42PM -0700, Linus Torvalds wrote:
> On Mon, Oct 12, 2020 at 4:54 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> >
> > In CONFIG_PREEMPT_NONE=y kernels, RCU has no way to tell whether or
> > not its caller holds a raw spinlock, which some callers do.
>
> Only kfree_rcu()? (And apparently eventually call_rcu())?

Yes. The other RCU APIs either only use raw spinlocks themselves on
the one hand or must be called from schedulable contexts on the other.

> And since we have lockdep, and it warns about it, and raw spinlocks
> are really really rare, do we really need to then disable this simple
> optimization for everybody else?
>
> We have been very successful with "don't do that then" rules.
>
> Eg, you cannot do normal memory allocations inside a spinlock (or you
> have to mark them with GFP_ATOMIC, and not all allocations can be
> marked as such), and this has been the case basically forever. And we
> have debug code and tools that will check that.
>
> Why is it impossible to just say "you can't do kfree_rcu() while
> holding a raw spinlock"?
>
> Particularly for something like kfree_rcu() and particularly if it's
> just about raw spinlocks, it would seem to be very natural to just say
> "just delay freeing it until after you've released the raw spinlock".
>
> Because I sure hope that you don't find raw spinlock users in random
> places. It should be stuff like core scheduling, RCU itself, etc.

True enough, but core stuff does use RCU, some of it while holding
raw spinlocks.

And you are right that "just don't do that, defer it instead" is often
very effective. In fact, I defer wakeups within RCU in order to avoid
deadlocks with the scheduler. It is simple in concept, and it does
work, but it is also a disproportionate source of bugs. Most of which
rcutorture finds in the safety and comfort of my own system, thankfully,
but some do escape. Maybe I am overreacting, but I have been burned
often enough that I feel the need to avoid this.

Plus I did oversimplify. CONFIG_PREEMPT_COUNT also allows the call_rcu()
portion to avoid deadlocks with the current non-lockless memory allocator.

So if the answer from you on global CONFIG_PREEMPT_COUNT=y and from
the MM guys on lockless allocation is irrevocably "@#$@#$ NO!" or the
current-day bowdlerized equivalent, I will take the scheduling delays
in the shorts and defer allocation.

> > Making CONFIG_PREEMPT_COUNT unconditional allows
> > RCU to make this determination.
>
> I understand _that_ part, but the thing I find objectionable is how a
> small piece of code seems to want to change the rules we have had in
> the kernel since basically day #1.
>
> (Ok, so the preempt count itself is much more recent than "day #1",
> but the basic non-counting spinlocks do go back to very early in the
> SMP stages).

Understood, a count-free CONFIG_PREEMPT_NONE has been in place in the
Linux kernel for an extremely long time. And I also understand that
adding CONFIG_PREEMPT_COUNT=y everywhere is a pervasive change that is
not to be taken lightly.

Thanx, Paul