[tip: sched/rt] printk: nbcon: Clarify rules of the owner/waiter matching

From: tip-bot2 for John Ogness
Date: Mon Sep 09 2024 - 13:36:55 EST


The following commit has been merged into the sched/rt branch of tip:

Commit-ID: 8c9dab2c55ad74680728c72949971b20d70b56ca
Gitweb: https://git.kernel.org/tip/8c9dab2c55ad74680728c72949971b20d70b56ca
Author: John Ogness <john.ogness@xxxxxxxxxxxxx>
AuthorDate: Tue, 20 Aug 2024 08:35:31 +02:06
Committer: Petr Mladek <pmladek@xxxxxxxx>
CommitterDate: Wed, 21 Aug 2024 14:56:22 +02:00

printk: nbcon: Clarify rules of the owner/waiter matching

The functions nbcon_owner_matches() and nbcon_waiter_matches()
use a minimal set of data to determine if a context matches.
The existing kerneldoc and comments were not clear enough and
caused the printk folks to re-prove that the functions are
indeed reliable in all cases.

Update and expand the explanations so that it is clear that the
implementations are sufficient for all cases.

Signed-off-by: John Ogness <john.ogness@xxxxxxxxxxxxx>
Reviewed-by: Petr Mladek <pmladek@xxxxxxxx>
Link: https://lore.kernel.org/r/20240820063001.36405-6-john.ogness@xxxxxxxxxxxxx
Signed-off-by: Petr Mladek <pmladek@xxxxxxxx>
---
kernel/printk/nbcon.c | 56 ++++++++++++++++++++++++++++++++++--------
1 file changed, 46 insertions(+), 10 deletions(-)

diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
index 776746d..931b8f0 100644
--- a/kernel/printk/nbcon.c
+++ b/kernel/printk/nbcon.c
@@ -228,6 +228,13 @@ static int nbcon_context_try_acquire_direct(struct nbcon_context *ctxt,
struct nbcon_state new;

do {
+ /*
+ * Panic does not imply that the console is owned. However, it
+ * is critical that non-panic CPUs during panic are unable to
+ * acquire ownership in order to satisfy the assumptions of
+ * nbcon_waiter_matches(). In particular, the assumption that
+ * lower priorities are ignored during panic.
+ */
if (other_cpu_in_panic())
return -EPERM;

@@ -259,18 +266,29 @@ static bool nbcon_waiter_matches(struct nbcon_state *cur, int expected_prio)
/*
* The request context is well defined by the @req_prio because:
*
- * - Only a context with a higher priority can take over the request.
+ * - Only a context with a priority higher than the owner can become
+ * a waiter.
+ * - Only a context with a priority higher than the waiter can
+ * directly take over the request.
* - There are only three priorities.
* - Only one CPU is allowed to request PANIC priority.
* - Lower priorities are ignored during panic() until reboot.
*
* As a result, the following scenario is *not* possible:
*
- * 1. Another context with a higher priority directly takes ownership.
- * 2. The higher priority context releases the ownership.
- * 3. A lower priority context takes the ownership.
- * 4. Another context with the same priority as this context
+ * 1. This context is currently a waiter.
+ * 2. Another context with a higher priority than this context
+ * directly takes ownership.
+ * 3. The higher priority context releases the ownership.
+ * 4. Another lower priority context takes the ownership.
+ * 5. Another context with the same priority as this context
* creates a request and starts waiting.
+ *
+ * Event #1 implies this context is EMERGENCY.
+ * Event #2 implies the new context is PANIC.
+ * Event #3 occurs when panic() has flushed the console.
+ * Events #4 and #5 are not possible due to the other_cpu_in_panic()
+ * check in nbcon_context_try_acquire_direct().
*/

return (cur->req_prio == expected_prio);
@@ -578,11 +596,29 @@ static bool nbcon_owner_matches(struct nbcon_state *cur, int expected_cpu,
int expected_prio)
{
/*
- * Since consoles can only be acquired by higher priorities,
- * owning contexts are uniquely identified by @prio. However,
- * since contexts can unexpectedly lose ownership, it is
- * possible that later another owner appears with the same
- * priority. For this reason @cpu is also needed.
+ * A similar function, nbcon_waiter_matches(), only deals with
+ * EMERGENCY and PANIC priorities. However, this function must also
+ * deal with the NORMAL priority, which requires additional checks
+ * and constraints.
+ *
+ * For the case where preemption and interrupts are disabled, it is
+ * enough to also verify that the owning CPU has not changed.
+ *
+ * For the case where preemption or interrupts are enabled, an
+ * external synchronization method *must* be used. In particular,
+ * the driver-specific locking mechanism used in device_lock()
+ * (including disabling migration) should be used. It prevents
+ * scenarios such as:
+ *
+ * 1. [Task A] owns a context with NBCON_PRIO_NORMAL on [CPU X] and
+ * is scheduled out.
+ * 2. Another context takes over the lock with NBCON_PRIO_EMERGENCY
+ * and releases it.
+ * 3. [Task B] acquires a context with NBCON_PRIO_NORMAL on [CPU X]
+ * and is scheduled out.
+ * 4. [Task A] gets running on [CPU X] and sees that the console is
+ * still owned by a task on [CPU X] with NBON_PRIO_NORMAL. Thus
+ * [Task A] thinks it is the owner when it is not.
*/

if (cur->prio != expected_prio)