Re: [PATCH] printk/panic: Add option allow non panic cpus logging to ringbuffer
From: Petr Mladek
Date: Mon Mar 17 2025 - 09:45:44 EST
Hi,
first, I am sorry for the late review. I have been snowed under many
other tasks.
Second, the patch looks fine. I just would like to do few cosmetic
improvements.
Let's start with the Subject. It has few small grammatical mistakes.
I suggest something like:
"printk/panic: Add option to allow non-panic CPUs to write
to the ring buffer."
On Wed 2025-03-05 13:40:46, Donghyeok Choe wrote:
> commit 779dbc2e78d7 ("printk: Avoid non-panic CPUs writing
> to ringbuffer") disabled non-panic CPUs to further write messages to
> ringbuffer after panicked.
>
> commit bcc954c6caba ("printk/panic: Allow cpu backtraces to
> be written into ringbuffer during panic") allows non-panicked CPUs
> to write backtrace only.
>
> Since that, there was a problem with not being able to check the
> important logs of the non-panicked CPUs.
>
> Fix the issue by adding debug option(printk_debug_non_panic_cpus) to
> output the non-paniced CPUs' message.
I would slightly rewrite it. I took inspiration
in the first version of this patch, see
https://lore.kernel.org/r/20250226031628.GB592457@tiffany
and asked "Gemini" to help:
<proposal>
Commit 779dbc2e78d7 ("printk: Avoid non-panic CPUs writing to ringbuffer")
aimed to isolate panic-related messages. However, when panic() itself
malfunctions, messages from non-panic CPUs become crucial for debugging.
While commit bcc954c6caba ("printk/panic: Allow cpu backtraces to
be written into ringbuffer during panic") enables non-panic CPU
backtraces, it may not provide sufficient diagnostic information.
Introduce the "printk_debug_non_panic_cpus" command-line option,
enabling non-panic CPU messages to be stored in the ring buffer during
a panic. This also prevents discarding non-finalized messages from
non-panic CPUs during console flushing, providing a more comprehensive
view of system state during critical failures.
</proposal>
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2375,6 +2375,19 @@ void printk_legacy_allow_panic_sync(void)
> }
> }
>
> +bool __read_mostly printk_debug_non_panic_cpus;
> +
> +#ifdef CONFIG_PRINTK_CALLER
> +static int __init printk_debug_non_panic_cpus_setup(char *str)
> +{
> + printk_debug_non_panic_cpus = true;
> + pr_info("printk: keep printk all cpu in panic.\n");
I would update the message:
pr_info("printk: allow messages from non-panic CPUs in panic()\n");
> +
> + return 0;
> +}
> +early_param("printk_debug_non_panic_cpus", printk_debug_non_panic_cpus_setup);
> +#endif
> +
> asmlinkage int vprintk_emit(int facility, int level,
> const struct dev_printk_info *dev_info,
> const char *fmt, va_list args)
> @@ -2391,7 +2404,9 @@ asmlinkage int vprintk_emit(int facility, int level,
> * non-panic CPUs are generating any messages, they will be
> * silently dropped.
> */
> - if (other_cpu_in_panic() && !panic_triggering_all_cpu_backtrace)
> + if (!printk_debug_non_panic_cpus &&
> + !panic_triggering_all_cpu_backtrace &&
> + other_cpu_in_panic())
I would switch the order:
if (other_cpu_in_panic() &&
!printk_debug_non_panic_cpus &&
!panic_triggering_all_cpu_backtrace)
IMHO, it is more logical. Also both "!printk_debug_non_panic_cpus"
and "!panic_triggering_all_cpu_backtrace" are always true by default.
> return 0;
>
> printk_get_console_flush_type(&ft);
> diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
> index 88e8f3a61922..ffab5f6c1855 100644
> --- a/kernel/printk/printk_ringbuffer.c
> +++ b/kernel/printk/printk_ringbuffer.c
> @@ -2143,7 +2143,9 @@ static bool _prb_read_valid(struct printk_ringbuffer *rb, u64 *seq,
> * But it would have the sequence number returned
> * by "prb_next_reserve_seq() - 1".
> */
> - if (this_cpu_in_panic() && ((*seq + 1) < prb_next_reserve_seq(rb)))
> + if (this_cpu_in_panic() &&
> + (!printk_debug_non_panic_cpus || legacy_allow_panic_sync) &&
> + ((*seq + 1) < prb_next_reserve_seq(rb)))
> (*seq)++;
> else
> return false;
The indendantation does not help much to understand where the if
(condition) ends. Also I would udate the comment. I suggest something
like:
--- a/kernel/printk/printk_ringbuffer.c
+++ b/kernel/printk/printk_ringbuffer.c
@@ -2133,9 +2133,9 @@ static bool _prb_read_valid(struct printk_ringbuffer *rb, u64 *seq,
* there may be other finalized records beyond that
* need to be printed for a panic situation. If this
* is the panic CPU, skip this
- * non-existent/non-finalized record unless it is
- * at or beyond the head, in which case it is not
- * possible to continue.
+ * non-existent/non-finalized record unless non-panic
+ * CPUs are still running and their debugging is
+ * explicitly enabled.
*
* Note that new messages printed on panic CPU are
* finalized when we are here. The only exception
@@ -2143,10 +2143,13 @@ static bool _prb_read_valid(struct printk_ringbuffer *rb, u64 *seq,
* But it would have the sequence number returned
* by "prb_next_reserve_seq() - 1".
*/
- if (this_cpu_in_panic() && ((*seq + 1) < prb_next_reserve_seq(rb)))
+ if (this_cpu_in_panic() &&
+ (!printk_debug_non_panic_cpus || legacy_allow_panic_sync) &&
+ ((*seq + 1) < prb_next_reserve_seq(rb))) {
(*seq)++;
- else
+ } else {
return false;
+ }
}
}
With the above changes:
Reviewed-by: Petr Mladek <pmladek@xxxxxxxx>
Best Regards,
Petr