Re: [PATCH v1] panic: push panic() messages to the console even from the MCE nmi handler

From: William Roche
Date: Thu Jan 07 2021 - 19:28:37 EST


On 06/01/2021 05:35, Sergey Senozhatsky wrote:
> On (21/01/04 16:15), “William Roche wrote:
> [..]
>> diff --git a/kernel/panic.c b/kernel/panic.c
>> index 332736a..eb90cc0 100644
>> --- a/kernel/panic.c
>> +++ b/kernel/panic.c
>> @@ -166,6 +166,15 @@ static void panic_print_sys_info(void)
>> ftrace_dump(DUMP_ALL);
>> }
>>
>> +/*
>> + * Force flush messages to the console.
>> + */
>> +static void panic_flush_to_console(void)
>> +{
>> + printk_safe_flush_on_panic();
>> + console_flush_on_panic(CONSOLE_FLUSH_PENDING);
>> +}
>
> You must debug_locks_off() as the very first step here. But see below.

Thanks I've missed this point, I will call debug_locks_off() before
the first call to console_flush_on_panic(CONSOLE_FLUSH_PENDING);

>
>> /**
>> * panic - halt the system
>> * @fmt: The text string to print
>> @@ -247,7 +256,7 @@ void panic(const char *fmt, ...)
>> * Bypass the panic_cpu check and call __crash_kexec directly.
>> */
>> if (!_crash_kexec_post_notifiers) {
>> - printk_safe_flush_on_panic();
>> + panic_flush_to_console();
>> __crash_kexec(NULL);
>
> It's dangerous to call console_flush_on_panic() before we stop secondary
> CPUs. console_flush_on_panic() ignores the state console_sem, so if any
> of the secondary is currently printing something on the consoles you'll
> get corrupted messages - we use `static char buffer` for messages we
> push to consoles.

I understand that there is a risk here, and the __crash_exec() call is
not supposed to return when crashexec is ready. This is our last chance
here to push to the console the pending messages.

> Another issue is that with this panic_flush_to_console() call console_sem
> can end up being locked once (by secondary CPU) and unlocked twice (by
> second and panic CPUs) [*]

I agree, this is a risk. But what we see is that even the message
previously submitted by the pr_emerg() call of the panic thread is
currently discarded when __crash_kexec(NULL) is called.
In the case described here, no other message is printed out by a
secondary CPU, but I understand that it could happen, and we
could avoid calling panic_flush_to_console() when kexec_crash isn't
loaded so that we can wait for the SMP stop and leave the current
steps to run untouched until the last printed messages.

>
>> /*
>> @@ -271,9 +280,8 @@ void panic(const char *fmt, ...)
>> */
>> atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
>>
>> - /* Call flush even twice. It tries harder with a single online CPU */
>> - printk_safe_flush_on_panic();
>> kmsg_dump(KMSG_DUMP_PANIC);
>> + panic_flush_to_console();
>
> Why?

Here, we are supposed to push the pending messages, as I could verify
that they don't reach the console until the console_unblank(). So I
wanted to push them with panic_flush_to_console() before the possible
upcoming __crash_kexec() call.

Are you suggesting that I should leave the printk_safe_flush_on_panic()
call before the kmsg_dump(KMSG_DUMP_PANIC) call ?
But let's leave that untouched too.

>
>> /*
>> * If you doubt kdump always works fine in any situation,
>> @@ -298,7 +306,7 @@ void panic(const char *fmt, ...)
>> * buffer. Try to acquire the lock then release it regardless of the
>> * result. The release will also print the buffers out. Locks debug
>> * should be disabled to avoid reporting bad unlock balance when
>> - * panic() is not being callled from OOPS.
>> + * panic() is not being called from OOPS.
>> */
>> debug_locks_off();
>> console_flush_on_panic(CONSOLE_FLUSH_PENDING);
>> @@ -314,6 +322,7 @@ void panic(const char *fmt, ...)
>> * We can't use the "normal" timers since we just panicked.
>> */
>> pr_emerg("Rebooting in %d seconds..\n", panic_timeout);
>> + panic_flush_to_console();
>
> [*] So this
>
>> for (i = 0; i < panic_timeout * 1000; i += PANIC_TIMER_STEP) {
>> touch_nmi_watchdog();
>> @@ -347,6 +356,7 @@ void panic(const char *fmt, ...)
>> disabled_wait();
>> #endif
>> pr_emerg("---[ end Kernel panic - not syncing: %s ]---\n", buf);
>> + panic_flush_to_console();
>
> [*] and this are both very interesting points.
>
> Those extra console_flush_on_panic() calls indicate that normal printk()
> cannot succeed in locking the console_sem so it doesn't try to
> console_unlock(): either because we killed the secondary CPU while it
> was holding the lock, or because we locked it once and unlocked it twice.
>
> I think it would make sense to just re-init console_sem, so that normal
> printk()-s will have chance to grab the console_sem lock and then we don't
> need any extra panic_flush_to_console() calls. Maybe we can do something
> like this
>
> ---
>
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index ffdd0dc7ec6d..4bd2e29c8cc0 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2638,6 +2638,7 @@ void console_flush_on_panic(enum con_flush_mode mode)
> * context and we don't want to get preempted while flushing,
> * ensure may_schedule is cleared.
> */
> + sema_init(&console_sem, 1);
> console_trylock();
> console_may_schedule = 0;
>
>

I've tried your suggestion, and even with this sema_init() call, the
pr_emerg("Rebooting in %d seconds..\n", panic_timeout);
message is not printed to the console without a subsequent call to
panic_flush_to_console().

See below a code modification that would illustrate a possible change
including the above comments. Would something like that be more
appropriate ?

This is the end of this running kernel, and panic is called from the NMI
context where printk() messages have to be pushed. The mechanism I use
seems to work in this case, but there could be another way to achieve
this goal (?)

Thanks again for your help,
William.


--- kernel/panic.c 2021-01-07 18:20:01.992582942 -0500
***************
*** 166,171 ****
--- 166,180 ----
ftrace_dump(DUMP_ALL);
}

+ /*
+ * Force flush messages to the console.
+ */
+ static void panic_flush_to_console(void)
+ {
+ printk_safe_flush_on_panic();
+ console_flush_on_panic(CONSOLE_FLUSH_PENDING);
+ }
+
/**
* panic - halt the system
* @fmt: The text string to print
***************
*** 247,253 ****
* Bypass the panic_cpu check and call __crash_kexec directly.
*/
if (!_crash_kexec_post_notifiers) {
! printk_safe_flush_on_panic();
__crash_kexec(NULL);

/*
--- 256,265 ----
* Bypass the panic_cpu check and call __crash_kexec directly.
*/
if (!_crash_kexec_post_notifiers) {
! if (kexec_crash_loaded()) {
! debug_locks_off();
! panic_flush_to_console();
! }
__crash_kexec(NULL);

/*
***************
*** 284,291 ****
*
* Bypass the panic_cpu check and call __crash_kexec directly.
*/
! if (_crash_kexec_post_notifiers)
__crash_kexec(NULL);

#ifdef CONFIG_VT
unblank_screen();
--- 296,308 ----
*
* Bypass the panic_cpu check and call __crash_kexec directly.
*/
! if (_crash_kexec_post_notifiers) {
! if (kexec_crash_loaded()) {
! debug_locks_off();
! panic_flush_to_console();
! }
__crash_kexec(NULL);
+ }

#ifdef CONFIG_VT
unblank_screen();
***************
*** 298,304 ****
* buffer. Try to acquire the lock then release it regardless of the
* result. The release will also print the buffers out. Locks debug
* should be disabled to avoid reporting bad unlock balance when
! * panic() is not being callled from OOPS.
*/
debug_locks_off();
console_flush_on_panic(CONSOLE_FLUSH_PENDING);
--- 315,321 ----
* buffer. Try to acquire the lock then release it regardless of the
* result. The release will also print the buffers out. Locks debug
* should be disabled to avoid reporting bad unlock balance when
! * panic() is not being called from OOPS.
*/
debug_locks_off();
console_flush_on_panic(CONSOLE_FLUSH_PENDING);
***************
*** 314,319 ****
--- 331,337 ----
* We can't use the "normal" timers since we just panicked.
*/
pr_emerg("Rebooting in %d seconds..\n", panic_timeout);
+ panic_flush_to_console();

for (i = 0; i < panic_timeout * 1000; i += PANIC_TIMER_STEP) {
touch_nmi_watchdog();
***************
*** 347,352 ****
--- 365,371 ----
disabled_wait();
#endif
pr_emerg("---[ end Kernel panic - not syncing: %s ]---\n", buf);
+ panic_flush_to_console();

/* Do not scroll important messages printed above */
suppress_printk = 1;