Re: [RFC PATCH 48/86] rcu: handle quiescent states for PREEMPT_RCU=n
From: Ankur Arora
Date: Wed Dec 06 2023 - 20:33:57 EST
Paul E. McKenney <paulmck@xxxxxxxxxx> writes:
> On Tue, Nov 28, 2023 at 06:04:33PM +0100, Thomas Gleixner wrote:
>> Paul!
>>
>> On Mon, Nov 20 2023 at 16:38, Paul E. McKenney wrote:
>> > But...
>> >
>> > Suppose we have a long-running loop in the kernel that regularly
>> > enables preemption, but only momentarily. Then the added
>> > rcu_flavor_sched_clock_irq() check would almost always fail, making
>> > for extremely long grace periods. Or did I miss a change that causes
>> > preempt_enable() to help RCU out?
>>
>> So first of all this is not any different from today and even with
>> RCU_PREEMPT=y a tight loop:
>>
>> do {
>> preempt_disable();
>> do_stuff();
>> preempt_enable();
>> }
>>
>> will not allow rcu_flavor_sched_clock_irq() to detect QS reliably. All
>> it can do is to force reschedule/preemption after some time, which in
>> turn ends up in a QS.
>
> True, but we don't run RCU_PREEMPT=y on the fleet. So although this
> argument should offer comfort to those who would like to switch from
> forced preemption to lazy preemption, it doesn't help for those of us
> running NONE/VOLUNTARY.
>
> I can of course compensate if need be by making RCU more aggressive with
> the resched_cpu() hammer, which includes an IPI. For non-nohz_full CPUs,
> it currently waits halfway to the stall-warning timeout.
>
>> The current NONE/VOLUNTARY models, which imply RCU_PRREMPT=n cannot do
>> that at all because the preempt_enable() is a NOOP and there is no
>> preemption point at return from interrupt to kernel.
>>
>> do {
>> do_stuff();
>> }
>>
>> So the only thing which makes that "work" is slapping a cond_resched()
>> into the loop:
>>
>> do {
>> do_stuff();
>> cond_resched();
>> }
>
> Yes, exactly.
>
>> But the whole concept behind LAZY is that the loop will always be:
>>
>> do {
>> preempt_disable();
>> do_stuff();
>> preempt_enable();
>> }
>>
>> and the preempt_enable() will always be a functional preemption point.
>
> Understood. And if preempt_enable() can interact with RCU when requested,
> I would expect that this could make quite a few calls to cond_resched()
> provably unnecessary. There was some discussion of this:
>
> https://lore.kernel.org/all/0d6a8e80-c89b-4ded-8de1-8c946874f787@paulmck-laptop/
>
> There were objections to an earlier version. Is this version OK?
Copying that version here for discussion purposes:
#define preempt_enable() \
do { \
barrier(); \
if (unlikely(preempt_count_dec_and_test())) \
__preempt_schedule(); \
else if (!IS_ENABLED(CONFIG_PREEMPT_RCU) && \
(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK | HARDIRQ_MASK | NMI_MASK) == PREEMPT_OFFSET) && \
!irqs_disabled()) \
) \
rcu_all_qs(); \
} while (0)
(sched_feat is not exposed outside the scheduler so I'm using the
!CONFIG_PREEMPT_RCU version here.)
I have two-fold objections to this: as PeterZ pointed out, this is
quite a bit heavier than the fairly minimal preempt_enable() -- both
conceptually where the preemption logic now needs to know about when
to check for a specific RCU quiescience state, and in terms of code
size (seems to add about a cacheline worth) to every preempt_enable()
site.
If we end up needing this, is it valid to just optimistically check if
a quiescent state needs to be registered (see below)?
Though this version exposes rcu_data.rcu_urgent_qs outside RCU but maybe
we can encapsulate that in linux/rcupdate.h.
For V1 will go with this simple check in rcu_flavor_sched_clock_irq()
and see where that gets us:
> if (this_cpu_read(rcu_data.rcu_urgent_qs))
> set_need_resched();
---
diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index 9aa6358a1a16..d8139cda8814 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -226,9 +226,11 @@ do { \
#ifdef CONFIG_PREEMPTION
#define preempt_enable() \
do { \
barrier(); \
if (unlikely(preempt_count_dec_and_test())) \
__preempt_schedule(); \
+ else if (unlikely(raw_cpu_read(rcu_data.rcu_urgent_qs))) \
+ rcu_all_qs_check();
} while (0)
#define preempt_enable_notrace() \
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 41021080ad25..2ba2743d7ba3 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -887,6 +887,17 @@ void rcu_all_qs(void)
}
EXPORT_SYMBOL_GPL(rcu_all_qs);
+void rcu_all_qs_check(void)
+{
+ if (((preempt_count() &
+ (PREEMPT_MASK | SOFTIRQ_MASK | HARDIRQ_MASK | NMI_MASK)) == PREEMPT_OFFSET) && \
+ !irqs_disabled())
+
+ rcu_all_qs();
+}
+EXPORT_SYMBOL_GP(rcu_all_qs);
+
+
/*
* Note a PREEMPTION=n context switch. The caller must have disabled interrupts.
*/
--
ankur