Re: [BUG] 8e274732115f ("printk: extend console_lock for per-console locking")

From: Paul E. McKenney
Date: Mon Jun 13 2022 - 14:51:45 EST


On Mon, Jun 13, 2022 at 05:07:47PM +0200, Petr Mladek wrote:
> On Mon 2022-06-13 12:32:08, Petr Mladek wrote:
> > On Mon 2022-06-13 11:10:19, John Ogness wrote:
> > > On 2022-06-12, "Paul E. McKenney" <paulmck@xxxxxxxxxx> wrote:
> > > >> As I suspected, the final printk's cannot direct print because the
> > > >> kthread was printing. Using the below patch did seem to address your
> > > >> problem. But this is probably not the way forward.
> > > >
> > > > When I apply it, I still lose output, perhaps due to different timing?
> > > > Doing the pr_flush(1000, true) just before the call to kernel_power_off()
> > > > has been working quite well thus far, though.
> > >
> > > @Petr, I like the idea of the kthreads getting out of the way rather
> > > than trying to direct print themselves (for this situation). It still
> > > isn't optimal because that final pr_emerg("Power down\n") might come
> > > before the kthread has finished its current line. But in that case the
> > > kthread may not have much a chance to finish the printing anyway.
> >
> > I wonder if we could somehow combine it with pr_flush(timeout).
> >
> > The kthread might bail-out when pr_flush() is running. It will
> > know that someone would continue printing. The timeout might
> > help to avoid a deadlock. We could somehow reuse
> > console_trylock_spinning() code here.
> > >
> > > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > > index ea3dd55709e7..45c6c2b0b104 100644
> > > --- a/kernel/printk/printk.c
> > > +++ b/kernel/printk/printk.c
> > > @@ -3729,7 +3729,9 @@ static bool printer_should_wake(struct console *con, u64 seq)
> > > return true;
> > >
> > > if (con->blocked ||
> > > - console_kthreads_atomically_blocked()) {
> > > + console_kthreads_atomically_blocked() ||
> > > + system_state > SYSTEM_RUNNING ||
> > > + oops_in_progress) {
> > > return false;
> > > }
> >
> > Also this is an interesting idea. We know that panic() will try
> > to flush the messages. Well, panic() is not always called
> > after Oops.
>
> I tried to combine the various approaches and findings into
> the following patch.
>
> It introduces try_block_console_kthread() function. It tries
> to summon console_lock() so that it is later available for
> printk().
>
> The function causes that console kthreads will not longer
> handle the messages. Also it waits until all kthreads leave
> the critical section and allow to take the global console lock.
>
> It can be safely called in atomic context because it uses
> busy wait by udelay(). Infinite waiting is prevented by
> timeout currectly hardcoded to 10s.
>
> The function is used in panic() and kernel_shutdown_prepare()
> when we know that the system is going down and kthreads will
> not be usable any longer.
>
>
> Problems:
>
> + Just the best effort.
> + Does not help on single CPU system.
>
>
> Testing:
>
> I tried Paul's RCU torture test few times. It always failed without
> this patch and always succeded with this patch. If I got the resutls
> corretly.
>
>
> Possible improvements and alternative approaches:
>
> + Prevent console kthreads from sleeping in critical
> section. Replace mutex with a spinlock?
>
> + Allow to steal con->lock similar way like the global
> console_sem.
>
> + Modify console kthreads process priority to make them
> scheduled immediately on another CPU.
>
>
> What do you think, please?

And rcutorture thinks well of this one, also. ;-)

Thank you both!!!

Thanx, Paul

> diff --git a/include/linux/printk.h b/include/linux/printk.h
> index 10ec29bc0135..78f27ca15922 100644
> --- a/include/linux/printk.h
> +++ b/include/linux/printk.h
> @@ -173,6 +173,7 @@ extern void printk_prefer_direct_enter(void);
> extern void printk_prefer_direct_exit(void);
>
> extern bool pr_flush(int timeout_ms, bool reset_on_progress);
> +extern void try_block_console_kthreads(int timeout_ms);
>
> /*
> * Please don't use printk_ratelimit(), because it shares ratelimiting state
> diff --git a/kernel/panic.c b/kernel/panic.c
> index a3c758dba15a..4cf13c37bd08 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -297,6 +297,7 @@ void panic(const char *fmt, ...)
> * unfortunately means it may not be hardened to work in a
> * panic situation.
> */
> + try_block_console_kthreads(10000);
> smp_send_stop();
> } else {
> /*
> @@ -304,6 +305,7 @@ void panic(const char *fmt, ...)
> * kmsg_dump, we will need architecture dependent extra
> * works in addition to stopping other CPUs.
> */
> + try_block_console_kthreads(10000);
> crash_smp_send_stop();
> }
>
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index ea3dd55709e7..0bcd4f5cf2fc 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -250,6 +250,8 @@ static atomic_t console_kthreads_active = ATOMIC_INIT(0);
> #define console_kthread_printing_exit() \
> atomic_dec(&console_kthreads_active)
>
> +static bool block_console_kthreads;
> +
> /*
> * Helper macros to handle lockdep when locking/unlocking console_sem. We use
> * macros instead of functions so that _RET_IP_ contains useful information.
> @@ -3686,6 +3688,24 @@ bool pr_flush(int timeout_ms, bool reset_on_progress)
> }
> EXPORT_SYMBOL(pr_flush);
>
> +void try_block_console_kthreads(int timeout_ms)
> +{
> + block_console_kthreads = true;
> +
> + while (timeout_ms > 0) {
> + if (console_trylock()) {
> + console_unlock();
> + return;
> + }
> +
> + udelay(1000);
> + timeout_ms -= 1;
> + }
> +
> + /* Failed to block console kthreads. Let them do the job. */
> + block_console_kthreads = false;
> +}
> +
> static void __printk_fallback_preferred_direct(void)
> {
> printk_prefer_direct_enter();
> @@ -3729,7 +3749,8 @@ static bool printer_should_wake(struct console *con, u64 seq)
> return true;
>
> if (con->blocked ||
> - console_kthreads_atomically_blocked()) {
> + console_kthreads_atomically_blocked() ||
> + block_console_kthreads) {
> return false;
> }
>
> diff --git a/kernel/reboot.c b/kernel/reboot.c
> index a091145ee710..222bdd076cd0 100644
> --- a/kernel/reboot.c
> +++ b/kernel/reboot.c
> @@ -270,6 +270,7 @@ static void kernel_shutdown_prepare(enum system_states state)
> blocking_notifier_call_chain(&reboot_notifier_list,
> (state == SYSTEM_HALT) ? SYS_HALT : SYS_POWER_OFF, NULL);
> system_state = state;
> + try_block_console_kthreads(10000);
> usermodehelper_disable();
> device_shutdown();
> }