Re: [PATCH printk v3 11/14] printk: ringbuffer: Consider committed as finalized in panic

From: Petr Mladek
Date: Thu Feb 01 2024 - 13:00:39 EST


On Thu 2023-12-14 22:47:58, John Ogness wrote:
> A descriptor in the committed state means the record does not yet
> exist for the reader. However, for the panic CPU, committed
> records should be handled as finalized records since they contain
> message data in a consistent state and may contain additional
> hints as to the cause of the panic.
>
> Add an exception for records in the commit state to not be
> considered non-existing when reading from the panic CPU.

IMHO, it is important to describe effects of this change in more
details. And I think that it actually does not work as expected,
see below.

> --- a/kernel/printk/printk_ringbuffer.c
> +++ b/kernel/printk/printk_ringbuffer.c
> @@ -1875,16 +1877,25 @@ static int desc_read_finalized_seq(struct prb_desc_ring *desc_ring,
>
> /*
> * An unexpected @id (desc_miss) or @seq mismatch means the record
> - * does not exist. A descriptor in the reserved or committed state
> - * means the record does not yet exist for the reader.
> + * does not exist. A descriptor in the reserved state means the
> + * record does not yet exist for the reader.
> */
> if (d_state == desc_miss ||
> d_state == desc_reserved ||
> - d_state == desc_committed ||
> s != seq) {
> return -EINVAL;
> }
>
> + /*
> + * A descriptor in the committed state means the record does not yet
> + * exist for the reader. However, for the panic CPU, committed
> + * records are also handled as finalized records since they contain
> + * message data in a consistent state and may contain additional
> + * hints as to the cause of the panic.
> + */
> + if (d_state == desc_committed && !this_cpu_in_panic())
> + return -EINVAL;

If I get it correctly, this causes that panic CPU would see a
non-finalized continuous line as finalized. And it would flush
the existing piece to consoles.

The problem is that pr_cont() would append the message into
the same record. But the consoles would already wait
for the next record. They would miss the appended pieces.

It might be fixed by finalizing the record before we read
the non-finalized piece. It is doable but it would add
another lock-less scenario. I am not sure if it would work with
the existing barriers or how complicated it would be to
prove the correctness.

Now, in practice, it would allow to flush pieces of continuous lines
printed on panic CPU immediately. It would not affect messages printed
by other CPUs because of a mix of reasons:

1. The current code tries hard to move the console owner to panic
CPU. It allows the panic CPU to flush the messages a safe way
even when other CPUs are stopped or somehow locked.

It means that the consoles are ideally flushed when the panic CPU
adds a message.

2. Only the last record might be in a committed state. Any older
record is automatically finalized when it reaches the committed
state.

3. The previous patch causes that messages from non-panic CPUs
are skipped when they are not committed or finalized, see
https://lore.kernel.org/all/20231214214201.499426-11-john.ogness@xxxxxxxxxxxxx


Now, this patch would really help only when the panic CPU dies in
the middle of a continuous message or when the final message does
not have a newline character.

Honestly, I think that it is not worth the effort. It would add another
complexity to the memory barriers. The real effect is not easy
to understand. And the benefit is minimal from my POV.


Alternative solutions:

1. Flush the non-finalized continuous line only before reaching
the infinite loop in panic(). printk() gets disabled
at this point anyway.

It would solve only this one scenario, though.


2. Always finalize messages in vprintk_store() when in_panic().

It would have the desired effect in all panic() situations.
And it would not add another complexity to the lock-less
ringbuffer code.


Best Regards,
Petr