Re: [PATCH v2] printk: Improve memory usage logging during boot

From: John Ogness
Date: Wed Dec 04 2024 - 12:55:27 EST


On 2024-12-03, Petr Mladek <pmladek@xxxxxxxx> wrote:
> On Sun 2024-12-01 12:40:13, Geert Uytterhoeven wrote:
>> Isn't this kernel log message bookkeeping becoming a bit too excessive?
>> E.g. on a small system:
>>
>> printk: log buffer data + meta data: 65536 + 204800 = 270336 bytes
>>
>> Why is the meta data that big (> 3*log buffer)?
>>
>> #define PRB_AVGBITS 5 /* 32 character average length */
>>
>> unsigned int descs_count = log_buf_len >> PRB_AVGBITS;
>> meta_data_size = descs_count * (sizeof(struct prb_desc) +
>> sizeof(struct printk_info));
>>
>> struct prb_data_blk_lpos {
>> unsigned long begin;
>> unsigned long next;
>> };
>>
>> struct prb_desc {
>> atomic_long_t state_var;
>> struct prb_data_blk_lpos text_blk_lpos;
>> };
>>
>> i.e. 12 bytes on 32-bit, 24 bytes on 64-bit.
>
> I am afraid that we could not do much about the size of this part.
> All the variables are important parts of the lockless machinery.

The descriptor array is one source of wasted space. It ensures there are
enough descriptors so that the text_ring can be fully maximized for an
average record text length of 32. However, looking at
/sys/kernel/debug/printk/index/vmlinux and running some basic tests, it
seems the average text length is >=45 (usually around 55). That means at
least 30% of the descriptor space is not in use, which is roughtly 5% of
the total memory used by all ringbuffer components.

For example, for CONFIG_LOG_BUF_SHIFT=13, the amount of wasted desc_ring
space is about 1.8KiB. (The total memory usage of the ringbuffer is
36KiB.)

However, if we bump the expected average size to 64, there will not be
enough descriptors, leading to wasted text_ring buffer space. (The
expected size must be a power of 2 due to the ID wrapping algorithm.)

>> #define PRINTK_INFO_SUBSYSTEM_LEN 16
>> #define PRINTK_INFO_DEVICE_LEN 48
>>
>> struct dev_printk_info {
>> char subsystem[PRINTK_INFO_SUBSYSTEM_LEN];
>> char device[PRINTK_INFO_DEVICE_LEN];
>> };
>
> This is probably the lowest hanging fruit.

Yes, the info array is also a source of wasted space. This was known at
the time when we discussed [0] introducing this array. Many records do
not even use the dev_printk_info fields. (A running system seems to use
them more often than a booting system.) However, the 64 bytes is
currently quite large. During some testing I typically saw 70% of the
dev_printk_info text space for valid descriptors unused.

I typically saw dev_printk_info averages of:

- 5 bytes for SUBSYSTEM
- 12 bytes for DEVICE

However, this problem is compounded by the descriptor array issue I
mentioned at the beginning. The reason is that there is a 1:1
relationship between descriptors and dev_printk_info structs. So if 30%
of the descriptors are invalid, then that adds an additional waste of
30% totally unused dev_printk_info structs.

For example, for CONFIG_LOG_BUF_SHIFT=13, that represents a total of
13KiB wasted space in the info array (36% of the total memory usage).

> It should be possible to write these dev_printk-specific metadata into
> the data buffer a more efficient way and only for records created by
> dev_printk().
>
> It would require adding "dict_len" into "struct printk_info"
> and reserving space for both "text_len" and "dict_len".
>
> We bundled it into the metadata because these are metadata.
> We wanted to keep the design as clean as possible. We focused
> mainly on the stability and maintainability of the code.
> And it was really challenging to get it all working.

I think keeping it bundled in the meta data is correct. But if those
text arrays could be allocated from a text ring, then the space could be
used efficiently.

I am not recommending that we add the dict_ring back. It was
considerably more complex. But perhaps we could use the text_ring for
these allocations as well. It may even have the benefit that the
"average text size" becomes >=64, which would also help with the first
wasted item I mentioned.

>> struct printk_info {
>> u64 seq; /* sequence number */
>
> I do not recal the details. But I think that we need to
> explicitely store the 64-bit "seq" number in the metadata
> because of the lockless algoritm. It helps to solve
> problems with wrapping of the counter in
> "atomic_long_t state_var".

Yes. I could not figure out a way to safely update a @log_first_seq to
represent the first sequence available in the ringbuffer. (The
complexity involves both writers and readers seeing appropriate sequence
values.) If that could be done somehow, that would save another 2KiB
(for CONFIG_LOG_BUF_SHIFT=13).

In summary...

I think the only quick fix we could do immediately is reduce
PRINTK_INFO_DEVICE_LEN. On my various test machines, I never encountered
anything above 25. Perhaps reducing it from 48 to 32? That would
immediately reclaim 11% (4KiB for CONFIG_LOG_BUF_SHIFT=13) and it would
not require any changes to the various crash/dump tools.

In the longer term we may want to consider using the text ring for
additional device/subsystem string allocation. This would only require
changes to crash/dump tools that provide the device/subsystem
information.

In addition, if low-memory single CPU systems are am important target,
we might be better off implementing a
CONFIG_PRINTK_RINGBUFFER_TINY. Implementing a lockless ringbuffer for a
single CPU is trivial (in comparison) and would not require all the
management abstractions. If it used the same printk_ringbuffer API it
could be a simple drop-in replacement.

John Ogness

[0] https://lore.kernel.org/lkml/20200904124530.GB20558@alley