Re: pstore/ram: printk: NULL characters in pstore ramoops area

From: Kees Cook
Date: Fri Aug 04 2023 - 04:00:01 EST


On Thu, Aug 03, 2023 at 04:34:09PM -0700, Vijay Balakrishna wrote:
> Hello,
>
> We are noticing NULL characters in ramoops/pstore memory after a warm or a
> kexec reboot [1] in our 5.10 ARM64 product kernel after moving from 5.4
> kernel.  I ruled out fs/pstore/* as the source from where NULL characters
> originate by adding debug code [2] and confirming from collected output
> [3].  Then isolated further to printk log/ring buffer area, the NULL
> characters were already present in buffer in kmsg_dump_get_buffer() when
> kmsg log lines are read.  After looking at printk merges in mainline kernel,
> I cherry-picked following which looked related to our 5.10 kernel and still
> see NULL characters.
>
> 4260e0e5510158d704898603331e5365ebe957de printk: consolidate
> kmsg_dump_get_buffer/syslog_print_all code
> 726b5097701a8d46f5354be780e1a11fc4ca1187 printk: refactor
> kmsg_dump_get_buffer()
> bb07b16c44b2c6ddbafa44bb06454719002e828e printk: limit second loop
> of syslog_print_all

Do you mean that you took a working v5.4 kernel and backported the above
3 commits and it starting showing the %NUL characters?

> [...]
> diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c
> index ade66dbe5f39..1825972151b2 100644
> --- a/fs/pstore/ram.c
> +++ b/fs/pstore/ram.c
> @@ -383,6 +383,10 @@ static int notrace ramoops_pstore_write(struct
> pstore_record *record)
>      size = record->size;
>      if (size + hlen > prz->buffer_size)
>          size = prz->buffer_size - hlen;
> +    if (null_char(record->buf, size))
> +        pr_crit("%s: A NULL char in record buf, size %zu\n", __func__,
> size);
> +    else
> +        pr_crit("%s: No NULL char in record buf, size %zu\n", __func__,
> size);
>      persistent_ram_write(prz, record->buf, size);
> [...]
> root@localhost:~# reboot
> [ 2188.073362] systemd-shutdown[1]: Could not detach loopback /dev/loop1: Device or resource busy
> [ 2188.082272] systemd-shutdown[1]: Could not detach loopback /dev/loop0: Device or resource busy
> [ 2188.091873] watchdog: watchdog0: watchdog did not stop!
> [ 2188.099227] systemd-shutdown[1]: Failed to finalize loop devices, DM devices, ignoring.
> [ 2188.306671] reboot: Restarting system
> [ 2188.316932] ramoops: ramoops_pstore_write: A NULL char in record buf, size 88190

Well that does seem pretty definitive that it's a problem with the
printk/kmsg infrastructure: the %NULs are present in the buffer being
handed to pstore. :(

I have had a growing suspicion that there is a hard-to-find memory
corruption issue with recent printk work (seen during early-boot UBSAN
reporting), but v5.10 is pretty old, so it's probably not related.

Is the issue present in modern kernels?

-Kees

--
Kees Cook