Re: [PATCH printk v3 11/14] printk: ringbuffer: Consider committed as finalized in panic

From: John Ogness
Date: Mon Feb 05 2024 - 09:08:53 EST


On 2024-02-01, Petr Mladek <pmladek@xxxxxxxx> wrote:
> On Thu 2023-12-14 22:47:58, John Ogness wrote:
>> A descriptor in the committed state means the record does not yet
>> exist for the reader. However, for the panic CPU, committed
>> records should be handled as finalized records since they contain
>> message data in a consistent state and may contain additional
>> hints as to the cause of the panic.
>>
>> Add an exception for records in the commit state to not be
>> considered non-existing when reading from the panic CPU.
>
> IMHO, it is important to describe effects of this change in more
> details. And I think that it actually does not work as expected,
> see below.

I reviewed my notes from our meeting in Richmond. We had agreed that
this feature should not apply to the latest message. That would change
the commit message to be as follows:

printk: ringbuffer: Consider committed as finalized in panic

A descriptor in the committed state means the record does not yet
exist for the reader. However, for the panic CPU, committed
records should be handled as finalized records since they contain
message data in a consistent state and may contain additional
hints as to the cause of the panic.

The only exception is the last record. The panic CPU may be
usig LOG_CONT and the individual pieces should not be printed
separately.

Add a special-case check for records in the commit state to not
be considered non-existing when reading from the panic CPU and
it is not the last record.

>> --- a/kernel/printk/printk_ringbuffer.c
>> +++ b/kernel/printk/printk_ringbuffer.c
>> @@ -1875,16 +1877,25 @@ static int desc_read_finalized_seq(struct prb_desc_ring *desc_ring,
>>
>> /*
>> * An unexpected @id (desc_miss) or @seq mismatch means the record
>> - * does not exist. A descriptor in the reserved or committed state
>> - * means the record does not yet exist for the reader.
>> + * does not exist. A descriptor in the reserved state means the
>> + * record does not yet exist for the reader.
>> */
>> if (d_state == desc_miss ||
>> d_state == desc_reserved ||
>> - d_state == desc_committed ||
>> s != seq) {
>> return -EINVAL;
>> }
>>
>> + /*
>> + * A descriptor in the committed state means the record does not yet
>> + * exist for the reader. However, for the panic CPU, committed
>> + * records are also handled as finalized records since they contain
>> + * message data in a consistent state and may contain additional
>> + * hints as to the cause of the panic.
>> + */
>> + if (d_state == desc_committed && !this_cpu_in_panic())
>> + return -EINVAL;

And this code would change to:

+ /*
+ * A descriptor in the committed state means the record does not yet
+ * exist for the reader. However, for the panic CPU, committed
+ * records are also handled as finalized records since they contain
+ * message data in a consistent state and may contain additional
+ * hints as to the cause of the panic. The only exception is the
+ * last record, which may still be appended by the panic CPU and so
+ * is not available to the panic CPU for reading.
+ */
+ if (d_state == desc_committed &&
+ (!this_cpu_in_panic() || id == atomic_long_read(&desc_ring->head_id))) {
+ return -EINVAL;
+ }

> If I get it correctly, this causes that panic CPU would see a
> non-finalized continuous line as finalized. And it would flush
> the existing piece to consoles.
>
> The problem is that pr_cont() would append the message into
> the same record. But the consoles would already wait
> for the next record. They would miss the appended pieces.

Exactly. That is why we said that the last message would not be
available. Maybe this new version is acceptable.

> Honestly, I think that it is not worth the effort. It would add
> another complexity to the memory barriers. The real effect is not easy
> to understand. And the benefit is minimal from my POV.

I am OK with dropping this patch from the series. It is questionable how
valuable a LOG_CONT piece from a non-panic CPU is anyway. And if the
non-panic CPU managed to reopen the record, it would be skipped anyway.

I will drop this patch unless you want to keep the new version.

John