Re: [GIT PULL] RCU fix

From: Paul E. McKenney
Date: Tue May 31 2011 - 14:11:25 EST


On Wed, Jun 01, 2011 at 02:52:59AM +0900, Linus Torvalds wrote:
> On Wed, Jun 1, 2011 at 2:44 AM, Paul E. McKenney
> <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
> >
> > The reason for the switch is to allow threads blocked in TREE_PREEMPT_RCU
> > and TINY_PREEMPT_RCU RCU read-side critical sections to have their
> > priority boosted in order to avoid OOM.  People have made these OOMs
> > happen, so this is no longer just a theoretical concern.
>
> Quite frankly, that doesn't make much sense.
>
> First off, the default for priority boosting is off (and you cannot
> even select it unless you have RT_MUTEX and PREEMPT_RCU), so why the
> heck do we still use the threads even when we don't support the
> boosting at all?

I considered using softirq in the !RCU_BOOST case, but that makes the
code larger and only makes the failure cases we saw less likely rather
than eliminating them.  And some of the failure cases could be made to
happen from userspace with real-time threads, not just from RCU priority
boosting.

But I could of course switch to the dual softirq/kthread approach
if needed.

> Secondly, if a process is in danger of exhausting the RCU resources,
> and it is preemptable, why doesn't the rcu_read_unlock() logic just
> try to force a reschedule and thus an rcu idle period? Using processes
> and process priorities for this seems to be just stupid.

This approach does work (and is used) for TINY_RCU and TREE_RCU,
but it unfortunately simply does not work for TINY_PREEMPT_RCU and
TREE_PREEMPT_RCU.  The reason for this is that for the preemptible
variants of RCU, a reschedule is not guaranteed to be an RCU quiescent
state.  Which is why RCU_BOOST depends on PREEMPT_RCU (which is either
TINY_PREEMPT_RCU or TREE_PREEMPT_RCU).

> I dunno. After RCU_TINY showed how fragile it was to use kernel
> threads for this, and after this subtle issue just re-inforced that
> conclusion, I just cannot begin to believe that using a thread was the
> right thing to do. It just seems stupid.

Again, at least some of these were things that could be made to happen
from userspace with the standard APIs, so those at least did need to
be fixed.

Thanx, Paul