Re: POC: Alternative solution: Re: [PATCH 0/4] printk: reimplement LOG_CONT handling

From: John Ogness
Date: Thu Aug 13 2020 - 03:44:30 EST


On 2020-08-13, Sergey Senozhatsky <sergey.senozhatsky@xxxxxxxxx> wrote:
> This is not an unseen pattern, I'm afraid. And the problem here can
> be more general:
>
> pr_info("text");
> pr_cont("1");
> exception/IRQ/NMI
> pr_alert("text\n");
> pr_cont("2");
> pr_cont("\n");
>
> I guess the solution would be to store "last log_level" in task_struct
> and get current (new) timestamp for broken cont line?

(Warning: new ideas ahead)

The fundamental problem is that there is no real association between
the cont parts. So any interruption results in a broken record. If we
really want to do this correctly, we need real association.

With the new finalize flag for records, I thought about perhaps adding
support for chaining data blocks.

A data block currently stores an unsigned long for the ID of the
associated descriptor. But it could optionally include a second unsigned
long, which is the lpos of the next text part. All the data blocks of a
chain would point back to the same descriptor. The descriptor would only
point to the first data block of the chain and include a flag that it is
using chained data blocks.
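
A rough user-space sketch of that layout, only to make the idea concrete.
All names here (struct data_block, struct desc, NO_LPOS, DESC_CHAINED,
read_text) are hypothetical and are not the ringbuffer's real definitions.
For brevity an lpos is modeled as an index into a small array of
fixed-size blocks rather than a byte position in the data ring.

```c
#include <stddef.h>
#include <string.h>

#define NR_BLOCKS	8
#define NO_LPOS		((unsigned long)-1)	/* hypothetical end-of-chain marker */
#define DESC_CHAINED	0x1UL			/* hypothetical descriptor flag */

struct data_block {
	unsigned long id;		/* ID of the associated descriptor */
	unsigned long next_lpos;	/* lpos of the next text part, or NO_LPOS */
	char text[32];			/* text of this part */
};

struct desc {
	unsigned long id;
	unsigned long first_lpos;	/* only the first block of the chain */
	unsigned long flags;
};

static struct data_block blocks[NR_BLOCKS];

/* Follow the chain from the descriptor and concatenate the text parts. */
static size_t read_text(const struct desc *d, char *buf, size_t size)
{
	unsigned long lpos = d->first_lpos;
	size_t len = 0;

	if (!size)
		return 0;
	buf[0] = '\0';

	while (lpos < NR_BLOCKS) {
		const struct data_block *db = &blocks[lpos];
		size_t n = strlen(db->text);

		/* Every block of a chain points back to the same descriptor. */
		if (db->id != d->id)
			break;
		if (n > size - 1 - len)
			n = size - 1 - len;
		memcpy(buf + len, db->text, n);
		len += n;
		buf[len] = '\0';

		/* Without the flag there is exactly one data block. */
		if (!(d->flags & DESC_CHAINED))
			break;
		lpos = db->next_lpos;
	}
	return len;
}
```

In this sketch a block belonging to another record can sit between two
parts of a chain without affecting the assembled text, because the
walk follows next_lpos rather than adjacency.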

Then we would only need to track the sequence number of the open record
and new data blocks could be added to the data block chain of the
correct record. Readers cannot see the record until it is finalized.

Also, since only finalized records can be invalidated, there are no
races of chains becoming invalidated while being appended.
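
To illustrate the intended behavior, here is a minimal user-space model,
assuming a tiny array of records and a reader that consumes records
strictly in sequence order. The names (struct record, rec_open,
rec_append, reader_next) are hypothetical, and a flat text buffer stands
in for the chained data blocks. There is no locking or memory ordering
here; it only shows the visibility rules.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NR_RECORDS	4
#define TEXT_MAX	64

struct record {
	unsigned long seq;
	bool used;
	bool finalized;
	char text[TEXT_MAX];	/* stands in for the chained data blocks */
};

static struct record ring[NR_RECORDS];
static unsigned long next_seq;

/* Reserve a record. With finalize=false it stays open for appending. */
static long rec_open(const char *text, bool finalize)
{
	struct record *r = &ring[next_seq % NR_RECORDS];

	/* Only finalized records may be invalidated and reused. */
	if (r->used && !r->finalized)
		return -1;

	r->seq = next_seq++;
	r->used = true;
	r->finalized = finalize;
	r->text[0] = '\0';
	strncat(r->text, text, TEXT_MAX - 1);
	return (long)r->seq;
}

/* Append a cont part to the open record identified by its sequence number. */
static bool rec_append(unsigned long seq, const char *text, bool finalize)
{
	struct record *r = &ring[seq % NR_RECORDS];

	if (!r->used || r->seq != seq || r->finalized)
		return false;

	strncat(r->text, text, TEXT_MAX - 1 - strlen(r->text));
	r->finalized = finalize;
	return true;
}

/* Return the next record in sequence order, or NULL if it is not finalized. */
static const char *reader_next(unsigned long *rseq)
{
	const struct record *r = &ring[*rseq % NR_RECORDS];

	if (!r->used || r->seq != *rseq || !r->finalized)
		return NULL;

	(*rseq)++;
	return r->text;
}
```

Replaying Sergey's sequence against this model: rec_open("text", false)
returns 0, rec_append(0, "1", false) extends it, the interrupting
rec_open("text\n", true) gets its own record 1, and the later
rec_append(0, "2", false) and rec_append(0, "\n", true) still land in
record 0. Note that in this model the reader sees nothing, not even
record 1, until record 0 is finalized.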

My concerns about this idea:

- What if the printk user does not correctly terminate the cont message?
  There is no mechanism to allow that open record to be force-finalized
  so that readers can read newer records.

- For tasks, the sequence number of the open record could be stored on
  the task_struct. For non-tasks, we could use a global per-cpu variable
  where each CPU stores 2 sequence numbers: the sequence number of the
  open record for the non-task and the sequence number of the open
  record for an interrupting NMI. Is that sufficient?
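
The tracking in the second point could look roughly like the following
user-space sketch. Everything in it is hypothetical: struct task stands
in for task_struct, a plain array stands in for a per-cpu variable, and
NR_CPUS, NO_SEQ, enum ctx and open_seq_slot() are made up for
illustration.

```c
#include <stddef.h>

#define NR_CPUS	4			/* stand-in for the real CPU count */
#define NO_SEQ	((unsigned long)-1)	/* hypothetical "no open record" */

enum ctx { CTX_TASK, CTX_NONTASK, CTX_NMI };

struct task {
	unsigned long open_seq;		/* open record of this task */
};

struct cpu_open_seq {
	unsigned long nontask;		/* open record of the non-task context */
	unsigned long nmi;		/* open record of an interrupting NMI */
};

static struct cpu_open_seq cpu_open[NR_CPUS];

/* Mark every per-CPU slot as having no open record. */
static void open_seq_init(void)
{
	unsigned int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		cpu_open[cpu].nontask = NO_SEQ;
		cpu_open[cpu].nmi = NO_SEQ;
	}
}

/* Pick where the open record's sequence number lives for this context. */
static unsigned long *open_seq_slot(enum ctx ctx, struct task *t,
				    unsigned int cpu)
{
	switch (ctx) {
	case CTX_TASK:
		return t ? &t->open_seq : NULL;
	case CTX_NONTASK:
		return cpu < NR_CPUS ? &cpu_open[cpu].nontask : NULL;
	case CTX_NMI:
		return cpu < NR_CPUS ? &cpu_open[cpu].nmi : NULL;
	}
	return NULL;
}
```

With this layout, a task, a non-task context and an NMI on the same CPU
each get their own slot. Nested non-task contexts on one CPU would share
the single nontask slot in this sketch, which is part of what the
question above is asking.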

John Ogness