Re: [RFC PATCH 4/8] rcu/nocb: Trigger self-IPI on late deferred wake up before user resume

From: Peter Zijlstra
Date: Mon Jan 11 2021 - 07:05:25 EST


On Sat, Jan 09, 2021 at 03:05:32AM +0100, Frederic Weisbecker wrote:
> Entering RCU idle mode may cause a deferred wake up of an RCU NOCB_GP
> kthread (rcuog) to be serviced.
>
> Unfortunately the call to rcu_user_enter() is already past the last
> rescheduling opportunity before we resume to userspace or to guest mode.
> We may escape there with the woken task ignored.
>
> The ultimate resort to fix every callsite is to trigger a self-IPI
> (nohz_full depends on IRQ_WORK) that will trigger a reschedule on IRQ
> tail or guest exit.
>
> Eventually every site that wants a saner treatment will need to carefully
> place a call to rcu_nocb_flush_deferred_wakeup() before the last explicit
> need_resched() check upon resume.
>
> Reported-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
> Fixes: 96d3fd0d315a (rcu: Break call_rcu() deadlock involving scheduler and perf)
> Cc: stable@xxxxxxxxxxxxxxx
> Cc: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Signed-off-by: Frederic Weisbecker <frederic@xxxxxxxxxx>
> ---
> kernel/rcu/tree.c | 22 +++++++++++++++++++++-
> kernel/rcu/tree.h | 2 +-
> kernel/rcu/tree_plugin.h | 25 ++++++++++++++++---------
> 3 files changed, 38 insertions(+), 11 deletions(-)
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index b6e1377774e3..2920dfc9f58c 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -676,6 +676,18 @@ void rcu_idle_enter(void)
> EXPORT_SYMBOL_GPL(rcu_idle_enter);
>
> #ifdef CONFIG_NO_HZ_FULL
> +
> +/*
> + * An empty function that will trigger a reschedule on
> + * IRQ tail once IRQs get re-enabled on userspace resume.
> + */
> +static void late_wakeup_func(struct irq_work *work)
> +{
> +}
> +
> +static DEFINE_PER_CPU(struct irq_work, late_wakeup_work) =
> +	IRQ_WORK_INIT(late_wakeup_func);
> +
> /**
> * rcu_user_enter - inform RCU that we are resuming userspace.
> *
> @@ -692,9 +704,17 @@ noinstr void rcu_user_enter(void)
>  	struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
>
>  	lockdep_assert_irqs_disabled();
> -	do_nocb_deferred_wakeup(rdp);
> +	/*
> +	 * We may be past the last rescheduling opportunity in the entry code.
> +	 * Trigger a self IPI that will fire and reschedule once we resume to
> +	 * user/guest mode.
> +	 */
> +	if (do_nocb_deferred_wakeup(rdp) && need_resched())
> +		irq_work_queue(this_cpu_ptr(&late_wakeup_work));
> +
>  	rcu_eqs_enter(true);
>  }

Do we have the guarantee that every architecture that supports NOHZ_FULL
actually implements arch_irq_work_raise()?

Also, can't you do the same thing you did earlier and flush the deferred
wakeup before we complete exit_to_user_mode_prepare()?
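
To make the ordering question concrete, here is a rough user-space model,
not kernel code. Every name in it (struct cpu_model, flush_deferred_wakeup(),
exit_loop(), resume_late_flush(), resume_early_flush()) is hypothetical and
only stands in for the deferred NOCB wakeup and the last need_resched() check
discussed above. It assumes the wakeup does nothing more than set a resched
flag:

```c
#include <stdbool.h>

/* Hypothetical model of one CPU; these are NOT real kernel structures. */
struct cpu_model {
	bool deferred_wakeup;	/* a deferred rcuog wakeup is pending */
	bool need_resched;	/* stands in for TIF_NEED_RESCHED */
	int resched_count;	/* times the modelled scheduler ran */
};

/* Stand-in for the deferred wakeup: waking the kthread requests a resched. */
static bool flush_deferred_wakeup(struct cpu_model *c)
{
	if (!c->deferred_wakeup)
		return false;
	c->deferred_wakeup = false;
	c->need_resched = true;
	return true;
}

/* Stand-in for the exit work loop holding the last explicit resched check. */
static void exit_loop(struct cpu_model *c)
{
	while (c->need_resched) {
		c->need_resched = false;
		c->resched_count++;
	}
}

/*
 * Flush after the last check: the resched request is still pending when we
 * "resume", so something else (the self-IPI in the patch) must handle it.
 * Returns true if a resched request is left unhandled.
 */
static bool resume_late_flush(struct cpu_model *c)
{
	exit_loop(c);
	flush_deferred_wakeup(c);
	return c->need_resched;
}

/* Flush before the last check: the loop itself services the request. */
static bool resume_early_flush(struct cpu_model *c)
{
	flush_deferred_wakeup(c);
	exit_loop(c);
	return c->need_resched;
}
```

With a pending deferred wakeup, the late-flush variant leaves the request
unhandled, while the early-flush variant services it in the loop and needs
no IPI.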