Re: [PATCH printk v1 04/18] printk: Add per-console suspended state

From: Petr Mladek
Date: Wed Mar 08 2023 - 09:42:24 EST


On Thu 2023-03-02 21:02:04, John Ogness wrote:
> Currently the global @console_suspended is used to determine if
> consoles are in a suspended state. Its primary purpose is to allow
> usage of the console_lock when suspended without causing console
> printing. It is synchronized by the console_lock.
>
> Rather than relying on the console_lock to determine suspended
> state, make it an official per-console state that is set within
> console->flags. This allows the state to be queried via SRCU.
>
> @console_suspended will continue to exist, but now only to implement
> the console_lock/console_unlock trickery and _not_ to represent
> the suspend state of a particular console.
>
> --- a/include/linux/console.h
> +++ b/include/linux/console.h
> @@ -153,6 +153,8 @@ static inline int con_debug_leave(void)
> * receiving the printk spam for obvious reasons.
> * @CON_EXTENDED: The console supports the extended output format of
> * /dev/kmesg which requires a larger output buffer.
> + * @CON_SUSPENDED: Indicates if a console is suspended. If true, the
> + * printing callbacks must not be called.
> */
> enum cons_flags {
> CON_PRINTBUFFER = BIT(0),
> @@ -162,6 +164,7 @@ enum cons_flags {
> CON_ANYTIME = BIT(4),
> CON_BRL = BIT(5),
> CON_EXTENDED = BIT(6),
> + CON_SUSPENDED = BIT(7),

We have to show it in /proc/consoles, see fs/proc/consoles.c.

> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2563,10 +2563,26 @@ MODULE_PARM_DESC(console_no_auto_verbose, "Disable console loglevel raise to hig
> */
> void suspend_console(void)
> {
> + struct console *con;
> +
> if (!console_suspend_enabled)
> return;
> pr_info("Suspending console(s) (use no_console_suspend to debug)\n");
> pr_flush(1000, true);
> +
> + console_list_lock();
> + for_each_console(con)
> + console_srcu_write_flags(con, con->flags | CON_SUSPENDED);
> + console_list_unlock();
> +
> + /*
> + * Ensure that all SRCU list walks have completed. All printing
> + * contexts must be able to see that they are suspended so that it
> + * is guaranteed that all printing has stopped when this function
> + * completes.
> + */
> + synchronize_srcu(&console_srcu);
> +
> console_lock();
> console_suspended = 1;
> up_console_sem();
> @@ -2574,11 +2590,26 @@ void suspend_console(void)
>
> void resume_console(void)
> {
> + struct console *con;
> +
> if (!console_suspend_enabled)
> return;
> down_console_sem();
> console_suspended = 0;
> console_unlock();
> +
> + console_list_lock();
> + for_each_console(con)
> + console_srcu_write_flags(con, con->flags & ~CON_SUSPENDED);
> + console_list_unlock();
> +
> + /*
> + * Ensure that all SRCU list walks have completed. All printing
> + * contexts must be able to see they are no longer suspended so
> + * that they are guaranteed to wake up and resume printing.
> + */
> + synchronize_srcu(&console_srcu);
> +

The setting of the global "console_suspended" and per-console
CON_SUSPENDED flag is not synchronized. As a result, they might
become inconsistent:

CPU0 CPU1

suspend_console()
console_list_lock();
# set CON_SUSPENDED
console_list_unlock();

console_resume()
down_console_sem();
console_suspended = 0;
console_unlock();

console_list_lock()
# clear CON_SUSPENDED
console_list_unlock();

console_lock();
console_suspended = 1;
up_console_sem();

I think that we could just remove the global "console_suspended" flag.

IMHO, it used to be needed to avoid moving the global "console_seq" forward
when the consoles were suspended. But it is not needed now with the
per-console con->seq. console_flush_all() skips consoles when
console_is_usable() fails and it bails out when there is no progress.

It seems that both console_flush_all() and console_unlock() would
handle this correctly.

Hmm, it would change the behavior of console_lock() and console_trylock().
They would set "console_locked" and "console_may_schedule" even when
the consoles are suspended. But it should be OK:

+ "console_may_schedule" actually should be set according
to the context where console_unlock() will be called.

+ "console_locked" seems to be used only in WARN_CONSOLE_UNLOCKED().
I could imagine a corner case where, for example, "vt" code does
not print the warning because it works as it works. But it does
not make much sense. IMHO, such a code should get fixed. And it
is just a warning after all.

> pr_flush(1000, true);
> }
>
> @@ -3712,14 +3745,7 @@ static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre
> }
> console_srcu_read_unlock(cookie);
>
> - /*
> - * If consoles are suspended, it cannot be expected that they
> - * make forward progress, so timeout immediately. @diff is
> - * still used to return a valid flush status.
> - */
> - if (console_suspended)
> - remaining = 0;

Heh, I though that this might cause regression, e.g. non-necessary
delays during suspend.

But it actually works because "diff" is counted only for usable
consoles. It will stay "0" if there is no usable console.

I wonder if it would make sense to add a comment somewhere,
e.g. above the later check:

+ /* diff is zero also when there is no usable console. */
if (diff == 0 || remaining == 0)
break;

Anyway, we should update the comment above pr_flush():

- * Return: true if all enabled printers are caught up.
+ * Return: true if all usable printers are caught up.

> - else if (diff != last_diff && reset_on_progress)
> + if (diff != last_diff && reset_on_progress)
> remaining = timeout_ms;
>
> console_unlock();

Best Regards,
Petr