[PATCH] rcuwait: do not enter RCU protection unless a wakeup is needed

From: Paolo Bonzini
Date: Wed Oct 20 2021 - 07:06:59 EST


In some cases, rcuwait_wake_up() is called even though a wakeup is very
unlikely. If CONFIG_PREEMPT_RCU is enabled, the resulting
rcu_read_lock()/rcu_read_unlock() pair can be relatively expensive, and
it is in fact unnecessary when there is no w->task to keep alive: what
matters for avoiding missed wakeups is the memory barrier before the
read of w->task.

Therefore, do an early check of w->task right after the barrier, and skip
rcu_read_lock/rcu_read_unlock unless there is someone waiting for a wakeup.

When running vmexit.flat from kvm-unit-tests with APICv disabled, most
interrupt injection tests (tscdeadline*, self_ipi*, x2apic_self_ipi*)
improve by around 600 CPU cycles.

Cc: Davidlohr Bueso <dave@xxxxxxxxxxxx>
Cc: Oleg Nesterov <oleg@xxxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Reported-by: Wanpeng Li <wanpengli@xxxxxxxxxxx>
Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
---
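Not for the commit message: the ordering argument above can be sketched in
userspace C11 roughly as follows. This is only an illustration under stated
assumptions, not the kernel code. All names (struct waitslot, struct waiter,
waitslot_active(), waitslot_wake_up(), slow_path_entries) are hypothetical
stand-ins, a seq_cst fence plays the role of smp_mb(), and a counter marks
where rcu_read_lock()/rcu_read_unlock() would sit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a sleeping task; not a kernel type. */
struct waiter {
	atomic_int woken;		/* number of wakeups delivered */
};

/* Hypothetical stand-in for struct rcuwait. */
struct waitslot {
	_Atomic(struct waiter *) task;	/* NULL when nobody is waiting */
};

/* Counts entries into the section that would need rcu_read_lock(). */
static int slow_path_entries;

/* Analogue of rcuwait_active(): peek at the pointer, never dereference it. */
static bool waitslot_active(struct waitslot *w)
{
	return atomic_load_explicit(&w->task, memory_order_relaxed) != NULL;
}

/* Returns 1 if a waiter was woken, 0 otherwise. */
static int waitslot_wake_up(struct waitslot *w)
{
	struct waiter *t;
	int ret = 0;

	assert(w != NULL);

	/*
	 * Barrier (B): order the caller's earlier store of the wait
	 * condition before the load of w->task below.
	 */
	atomic_thread_fence(memory_order_seq_cst);

	/* Nobody to keep alive, so skip the protected section entirely. */
	if (!waitslot_active(w))
		return ret;

	slow_path_entries++;		/* rcu_read_lock() would go here */
	t = atomic_load_explicit(&w->task, memory_order_acquire);
	if (t) {
		atomic_fetch_add(&t->woken, 1);
		ret = 1;
	}
	/* rcu_read_unlock() would go here */

	return ret;
}
```

With an empty slot, waitslot_wake_up() returns 0 without touching
slow_path_entries; once a waiter is published in w->task, the call goes
through the protected section and delivers one wakeup. The pointer is
re-read inside the protected section because the early check only says
whether anyone might be waiting; it does not keep the waiter alive.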
kernel/exit.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index 91a43e57a32e..a38a08dbf85e 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -234,8 +234,6 @@ int rcuwait_wake_up(struct rcuwait *w)
 	int ret = 0;
 	struct task_struct *task;

-	rcu_read_lock();
-
 	/*
 	 * Order condition vs @task, such that everything prior to the load
 	 * of @task is visible. This is the condition as to why the user called
@@ -245,10 +243,22 @@ int rcuwait_wake_up(struct rcuwait *w)
 	 *        WAIT                WAKE
 	 *    [S] tsk = current       [S] cond = true
 	 *        MB (A)                  MB (B)
-	 *    [L] cond                [L] tsk
+	 *    [L] cond                [L] rcuwait_active(w)
+	 *                            task = rcu_dereference(w->task)
 	 */
 	smp_mb(); /* (B) */

+#ifdef CONFIG_PREEMPT_RCU
+	/*
+	 * The cost of rcu_read_lock() dominates for preemptible RCU,
+	 * avoid it if possible.
+	 */
+	if (!rcuwait_active(w))
+		return ret;
+#endif
+
+	rcu_read_lock();
+
 	task = rcu_dereference(w->task);
 	if (task)
 		ret = wake_up_process(task);
--
2.27.0