Re: [PATCH v2] printk: Improve memory usage logging during boot

From: Petr Mladek
Date: Tue Dec 03 2024 - 04:45:57 EST


On Sun 2024-12-01 12:40:13, Geert Uytterhoeven wrote:
> Hi Petr et al,
>
> CC linux-embedded
>
> On Tue, Oct 1, 2024 at 5:55 PM Petr Mladek <pmladek@xxxxxxxx> wrote:
> > On Mon 2024-09-30 11:48:24, Isaac J. Manjarres wrote:
> > > When the initial printk ring buffer size is updated, setup_log_buf()
> > > allocates a new ring buffer, as well as a set of meta-data structures
> > > for the new ring buffer. The function also emits the new size of the
> > > ring buffer, but not the size of the meta-data structures.
> > >
> > > This makes it difficult to assess how changing the log buffer size
> > > impacts memory usage during boot.
> > >
> > > For instance, increasing the ring buffer size from 512 KB to 1 MB
> > > through the command line yields an increase of 2304 KB in reserved
> > > memory at boot, while the only obvious change is the 512 KB
> > > difference in the ring buffer sizes:
> > >
> > > log_buf_len=512K:
> > >
> > > printk: log_buf_len: 524288 bytes
> > > Memory: ... (... 733252K reserved ...)
> > >
> > > log_buf_len=1M:
> > >
> > > printk: log_buf_len: 1048576 bytes
> > > Memory: ... (... 735556K reserved ...)
> > >
> > > This is because of how the size of the meta-data structures scale with
> > > the size of the ring buffer.
> > >
> > > Even when there aren't changes to the printk ring buffer size (i.e. the
> > > initial size == 1 << CONFIG_LOG_BUF_SHIFT), it is impossible to tell
> > > how much memory is consumed by the printk ring buffer during boot.
> > >
> > > Therefore, unconditionally log the sizes of the printk ring buffer
> > > and its meta-data structures, so that it's easier to understand
> > > how changing the log buffer size (either through the command line or
> > > by changing CONFIG_LOG_BUF_SHIFT) affects boot time memory usage.
> > >
> > > With the new logs, it is much easier to see exactly why the memory
> > > increased by 2304 KB:
> > >
> > > log_buf_len=512K:
> >
> > printk: log buffer data + meta data: 524288 + 1835008 = 2359296 bytes
> >
> > > log_buf_len=1M:
> >
> > printk: log buffer data + meta data: 1048576 + 3670016 = 4718592 bytes
>
> Thanks, this is very useful information!
>
> Isn't this kernel log message bookkeeping becoming a bit too excessive?
> E.g. on a small system:
>
> printk: log buffer data + meta data: 65536 + 204800 = 270336 bytes
>
> Why is the meta data that big (> 3*log buffer)?
>
> #define PRB_AVGBITS 5 /* 32 character average length */
>
> unsigned int descs_count = log_buf_len >> PRB_AVGBITS;
> meta_data_size = descs_count * (sizeof(struct prb_desc) +
> sizeof(struct printk_info));
>
> struct prb_data_blk_lpos {
> unsigned long begin;
> unsigned long next;
> };
>
> struct prb_desc {
> atomic_long_t state_var;
> struct prb_data_blk_lpos text_blk_lpos;
> };

I am afraid that we could not do much about the size of this part.
All the variables are important parts of the lockless machinery.

> i.e. 12 bytes on 32-bit, 24 bytes on 64-bit.
>
> #define PRINTK_INFO_SUBSYSTEM_LEN 16
> #define PRINTK_INFO_DEVICE_LEN 48
>
> struct dev_printk_info {
> char subsystem[PRINTK_INFO_SUBSYSTEM_LEN];
> char device[PRINTK_INFO_DEVICE_LEN];
> };

This is probably the lowest hanging fruit. It should be possible
to write these dev_printk-specific metadata into the data buffer
a more efficient way and only for records created by dev_printk().

It would require adding "dict_len" into "struct printk_info"
and reserving space for both "text_len" and "dict_len".

We bundled it into the metadata because these are metadata.
We wanted to keep the design as clean as possible. We focused
mainly on the stability and maintainability of the code.
And it was really challenging to get it all working.

> struct printk_info {
> u64 seq; /* sequence number */

I do not recal the details. But I think that we need to
explicitely store the 64-bit "seq" number in the metadata
because of the lockless algoritm. It helps to solve
problems with wrapping of the counter in
"atomic_long_t state_var".

It was not stored before. The following global values were
enough when the log buffer was synchronized by "logbuf_lock"
spin lock:

static u64 log_first_seq;
static u64 log_next_seq;

> u64 ts_nsec; /* timestamp in nanoseconds */
> u16 text_len; /* length of text message */
> u8 facility; /* syslog facility */
> u8 flags:5; /* internal record flags */
> u8 level:3; /* syslog level */
> u32 caller_id; /* thread id or processor id */

These metadata used to be stored in the "data" buffer next to the
message. Here is the declaration from v6.9:

struct printk_log {
u64 ts_nsec; /* timestamp in nanoseconds */
u16 len; /* length of entire record */
u16 text_len; /* length of text buffer */
u16 dict_len; /* length of dictionary buffer */
u8 facility; /* syslog facility */
u8 flags:5; /* internal record flags */
u8 level:3; /* syslog level */
#ifdef CONFIG_PRINTK_CALLER
u32 caller_id; /* thread id or processor id */
#endif
}

> struct dev_printk_info dev_info;

As I wrote above. It should be possible to store these metadata more
effectively in the data buffer.

But note that it is only about the internal kernel code. These change
would require also updating external tools, for example "crash" tool.

> };
>
> for a whopping 88 bytes. So that totals to 100 bytes per entry
> on 32-bit, and 112 on 64-bit, for an average of 32 characters per
> printed message...

It would be interesting to know how much if these are wasted because
either struct dev_printk_info is empty or the entries are shorter.

Best Regards,
Petr