Re: [PATCH 1/1] pstore/ram: Check member of buffers during the initialization phase of the pstore

From: yunlong xing
Date: Thu Aug 03 2023 - 06:43:28 EST


On Wed, Aug 2, 2023 at 11:47 PM Guilherme G. Piccoli
<gpiccoli@xxxxxxxxxx> wrote:
>
> On 01/08/2023 03:04, Yunlong Xing wrote:
> > [...]
>
> > diff --git a/fs/pstore/ram_core.c b/fs/pstore/ram_core.c
> > index 85aaf0fc6d7d..eb6df190d752 100644
> > --- a/fs/pstore/ram_core.c
> > +++ b/fs/pstore/ram_core.c
> > @@ -519,7 +519,7 @@ static int persistent_ram_post_init(struct persistent_ram_zone *prz, u32 sig,
> > sig ^= PERSISTENT_RAM_SIG;
> >
> > if (prz->buffer->sig == sig) {
> > - if (buffer_size(prz) == 0) {
> > + if (buffer_size(prz) == 0 && buffer_start(prz) == 0) {
> > pr_debug("found existing empty buffer\n");
>
> Thanks for the patch! I'd also adjust the above print statement to
> reflect the different paths (empty buffers vs illegal one) and maybe
> bump it to pr_info, or even pr_warn(_once?).

I don't think it's necessary.
Through testing, the probability of this situation occurring is still very low.
Using 20 devices, reboot every 10 seconds, and only once after 96 hours.
The panic stack information is as follows:

case A:
sysdump_panic_event+0x3b4/0x5b8
atomic_notifier_call_chain+0x54/0x90
panic+0x1c8/0x42c
die+0x29c/0x2a8
die_kernel_fault+0x68/0x78
__do_kernel_fault+0x1c4/0x1e0
do_bad_area+0x40/0x100
do_translation_fault+0x68/0x80
do_mem_abort+0x68/0xf8
el1_da+0x1c/0xc0
__raw_writeb+0x38/0x174
__memcpy_toio+0x40/0xac
persistent_ram_update+0x44/0x12c
persistent_ram_write+0x1a8/0x1b8
ramoops_pstore_write+0x198/0x1e8
pstore_console_write+0x94/0xe0
console_unlock+0x3a4/0x5d8
register_console+0x3ac/0x488
pstore_register+0x1f8/0x204
ramoops_probe+0x460/0x508
platform_drv_probe+0xb0/0xf4
really_probe+0x1c8/0x7d0
driver_probe_device+0xc0/0x140
__device_attach_driver+0x10c/0x200
bus_for_each_drv+0xa8/0x11c
__device_attach.llvm.1835537715725466631+0xf0/0x19c
bus_probe_device+0x40/0xe0
device_add+0x964/0xb8c
of_device_add+0x40/0x54
of_platform_device_create_pdata+0xb8/0x140
of_platform_default_populate_init+0x5c/0xd4

case B
sysdump_panic_event+0x720/0xd38
atomic_notifier_call_chain+0x58/0xc0
panic+0x1c4/0x6e4
die+0x3c0/0x428
bug_handler+0x4c/0x9c
brk_handler+0x98/0x14c
do_debug_exception+0x114/0x2ec
el1_dbg+0x18/0xbc
usercopy_abort+0x90/0x94
__check_object_size+0x17c/0x2c4
persistent_ram_update_user+0x50/0x220
persistent_ram_write_user+0x354/0x428
ramoops_pstore_write_user+0x34/0x50
write_pmsg+0x14c/0x26c
do_iter_write+0x1cc/0x2cc
vfs_writev+0xf4/0x168
do_writev+0xa4/0x200
__arm64_sys_writev+0x20/0x2c
el0_svc_common+0xc8/0x22c
el0_svc_handler+0x1c/0x28
el0_svc+0x8/0x100
>
> What do you all think, makes sense or could we pollute dmesg too much?
> Cheers!