[tip: locking/core] sched/wakeup: Introduce the TASK_RTLOCK_WAIT state bit

From: tip-bot2 for Thomas Gleixner
Date: Tue Aug 17 2021 - 16:17:28 EST


The following commit has been merged into the locking/core branch of tip:

Commit-ID: cd781d0ce8cb4d491910833c5eec90f150432da3
Gitweb: https://git.kernel.org/tip/cd781d0ce8cb4d491910833c5eec90f150432da3
Author: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
AuthorDate: Sun, 15 Aug 2021 23:27:41 +02:00
Committer: Ingo Molnar <mingo@xxxxxxxxxx>
CommitterDate: Tue, 17 Aug 2021 16:45:15 +02:00

sched/wakeup: Introduce the TASK_RTLOCK_WAIT state bit

RT kernels have an extra quirk for try_to_wake_up() to handle task state
preservation across periods of blocking on a 'sleeping' spin/rwlock.

For this to function correctly and under all circumstances try_to_wake_up()
must be able to identify whether the wakeup is lock related or not and
whether the task is waiting for a lock or not.

The original approach was to use a special wake_flag argument for
try_to_wake_up() and just use TASK_UNINTERRUPTIBLE for the tasks wait state
and the try_to_wake_up() state argument.

This works in principle, but due to the fact that try_to_wake_up() cannot
determine whether the task is waiting for an RT lock wakeup or for a regular
wakeup it's suboptimal.

RT kernels save the original task state when blocking on an RT lock and
restore it when the lock has been acquired. Any non lock related wakeup is
checked against the saved state and if it matches the saved state is set to
running so that the wakeup is not lost when the state is restored.

While the necessary logic for the wake_flag based solution is trivial, the
downside is that any regular wakeup with TASK_UNINTERRUPTIBLE in the state
argument set will wake the task despite the fact that it is still blocked
on the lock. That's not a fatal problem as the lock wait has do deal with
spurious wakeups anyway, but it introduces unnecessary latencies.

Introduce the TASK_RTLOCK_WAIT state bit which will be set when a task
blocks on an RT lock.

The lock wakeup will use wake_up_state(TASK_RTLOCK_WAIT), so both the
waiting state and the wakeup state are distinguishable, which avoids
spurious wakeups and allows better analysis.

Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
Link: https://lore.kernel.org/r/20210815211302.144989915@xxxxxxxxxxxxx
Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
---
include/linux/sched.h | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index ec8d07d..9a9f606 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -95,7 +95,9 @@ struct task_group;
#define TASK_WAKING 0x0200
#define TASK_NOLOAD 0x0400
#define TASK_NEW 0x0800
-#define TASK_STATE_MAX 0x1000
+/* RT specific auxilliary flag to mark RT lock waiters */
+#define TASK_RTLOCK_WAIT 0x1000
+#define TASK_STATE_MAX 0x2000

/* Convenience macros for the sake of set_current_state: */
#define TASK_KILLABLE (TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)