Re: [PATCH rcu v2] 4/5] rcu-tasks: Move RCU Tasks self-tests to core_initcall()

From: Paul E. McKenney
Date: Wed Feb 05 2025 - 15:32:36 EST


On Wed, Feb 05, 2025 at 12:10:53PM -0800, Paul E. McKenney wrote:
> On Wed, Feb 05, 2025 at 09:00:07PM +0106, John Ogness wrote:
> > On 2025-02-05, "Paul E. McKenney" <paulmck@xxxxxxxxxx> wrote:
> > >> This is caused by RCU falling behind a callback-flooding kthread that
> > >> invokes call_rcu() in a semi-tight loop. Setting rcutree.kthread_prio=40
> > >> avoids the splat, but still gets the shutdown-time hang. Retrying with
> > >> the default rcutree.kthread_prio=2 failed to reproduce the splat, but
> > >> it did reproduce the shutdown-time hang.
> > >>
> > >> OK, maybe printk buffers are not being flushed? A 100-millisecond sleep
> > >> at the end of of rcu_torture_cleanup() got all of rcutorture's output
> > >> flushed, but lost the subsequent shutdown-time console traffic. The
> > >> pr_flush(HZ/10,1) seems more sensible, but this is private to printk().
> > >>
> > >> I would like to log the shutdown-time console traffic because RCU can
> > >> sometimes break things on that path.
> >
> > pr_flush() was changed to private because there were no users. It would
> > not be a problem to make it available. Adding a pr_flush() to
> > rcu_torture_cleanup() would be an appropriate workaround for now (more
> > on this at the end).
>
> Ah, got it.
>
> > > There is a call to kmsg_dump(KMSG_DUMP_SHUTDOWN) in kernel_power_off()
> > > that appears to be intended to dump out the printk() buffers,
> >
> > It only dumps the buffers to the registered kmsg_dumpers. It is not
> > responsible for flushing console backlogs.
>
> That would explain its not doing much for me. ;-)
>
> > > but it
> > > does not seem to do so in kernels built with CONFIG_PREEMPT_RT=y.
> > > Does there need to be a pr_flush() call prior to the call to
> > > migrate_to_reboot_cpu()? Or maybe even to do_kernel_power_off_prepare()
> > > or kernel_shutdown_prepare()?
> >
> > With CONFIG_PREEMPT_RT=y, legacy consoles only print via a dedicated
> > kthread. Without a pr_flush() somewhere, there is basically no chance
> > that they will get backlogs flushed because noone is waitig for them.
> >
> > The new console API (NBCON) provides support for "atomic consoles",
> > which _do_ flush by transitioning to synchronous printing during
> > shutdown/reboot. Unfortunately we still don't have any NBCON atomic
> > console implemented in the kernel. The 8250 UART will be our first
> > driver, most likely available in 6.15. (With the current PREEMPT_RT
> > patch applied, the 8250 NBCON atomic driver is used.)
> >
> > Since only CONFIG_PREEMPT_RT=y has this issue, I am not sure if we want
> > to sprinkle pr_flush() calls on all sleepable shutdown/reboot paths,
> > although that is certainly one way to handle it. For your case, adding a
> > pr_flush() to rcu_torture_cleanup() and making pr_flush() non-private
> > would be an easy solution to avoid your problem.
>
> Would it make sense to put a call to pr_flush() at the beginning of the
> kernel_power_off() function? I suspect that would take care of much
> more than just rcutorture.
>
> Or maybe after the pr_emerg("Power down\n"), since that is normally
> the last thing that shows up on rcutorture's console logs.
>
> I will give that a shot and see what happens. ;-)

And that does the trick! I specified 500ms, but maybe -1 would be
better? You tell me! ;-)

Thanx, Paul

------------------------------------------------------------------------

commit 172b692a79ec5801eab57cb55d2f9c398f244c62
Author: Paul E. McKenney <paulmck@xxxxxxxxxx>
Date: Wed Feb 5 12:27:23 2025 -0800

printk: Flush console log from kernel_power_off()

Kernels built with CONFIG_PREEMPT_RT=y can lose significant console output
and shutdown time, which hides shutdown-time RCU issues from rcutorture.
Therefore, make pr_flush() public and invoke it after then last print
in kernel_power_off().

Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
Cc: Petr Mladek <pmladek@xxxxxxxx>
Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>
Cc: John Ogness <john.ogness@xxxxxxxxxxxxx>
Cc: Sergey Senozhatsky <senozhatsky@xxxxxxxxxxxx>

diff --git a/include/linux/printk.h b/include/linux/printk.h
index 4217a9f412b2..b3e88ff1ecc3 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -806,4 +806,6 @@ static inline void print_hex_dump_debug(const char *prefix_str, int prefix_type,
#define print_hex_dump_bytes(prefix_str, prefix_type, buf, len) \
print_hex_dump_debug(prefix_str, prefix_type, 16, 1, buf, len, true)

+
+bool pr_flush(int timeout_ms, bool reset_on_progress);
#endif
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 80910bc3470c..e0c6e5a99897 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2461,7 +2461,7 @@ asmlinkage __visible int _printk(const char *fmt, ...)
}
EXPORT_SYMBOL(_printk);

-static bool pr_flush(int timeout_ms, bool reset_on_progress);
+bool pr_flush(int timeout_ms, bool reset_on_progress);
static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progress);

#else /* CONFIG_PRINTK */
@@ -4466,7 +4466,7 @@ static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre
* Context: Process context. May sleep while acquiring console lock.
* Return: true if all usable printers are caught up.
*/
-static bool pr_flush(int timeout_ms, bool reset_on_progress)
+bool pr_flush(int timeout_ms, bool reset_on_progress)
{
return __pr_flush(NULL, timeout_ms, reset_on_progress);
}
diff --git a/kernel/reboot.c b/kernel/reboot.c
index a701000bab34..3448e6ae3556 100644
--- a/kernel/reboot.c
+++ b/kernel/reboot.c
@@ -704,6 +704,7 @@ void kernel_power_off(void)
migrate_to_reboot_cpu();
syscore_shutdown();
pr_emerg("Power down\n");
+ pr_flush(500, 1);
kmsg_dump(KMSG_DUMP_SHUTDOWN);
machine_power_off();
}