Re: [PATCH printk v3 05/19] printk: Fail pr_flush() if before SYSTEM_SCHEDULING

From: John Ogness
Date: Fri Jul 26 2024 - 10:45:58 EST


On 2024-07-26, Petr Mladek <pmladek@xxxxxxxx> wrote:
> On Mon 2024-07-22 19:25:25, John Ogness wrote:
>> A follow-up change adds pr_flush() to console unregistration.
>> However, with boot consoles unregistration can happen very
>> early if there are also regular consoles registering as well.
>> In this case the pr_flush() is not important because the
>> regular console manually flushes the boot consoles before
>> unregistering them.
>
> It is not much clear what the "manual flush" means. Is it
> the console_flush_all() in get_init_console_seq()?
>
> I would write something like:
>
> <proposal>
> In this case the pr_flush() is not important because all consoles
> are flushed when checking the initial console sequence number.
> </proposal>

Yes, clearer. Thanks.

>> Allow pr_flush() to fail if @system_state has not yet reached
>> SYSTEM_SCHEDULING. This avoids might_sleep() and msleep()
>> explosions that would otherwise occur.
>
> What do you mean with the explosion, please?
> Does it add some warnings into the log?
> Does it cause extra delays?

I am certain that I could trigger a splat with might_sleep() using some
configured preemption mode. But now I cannot reproduce it. However, with
msleep() it is easy:

[ 0.436739][ T0] printk: legacy console [ttyS0] enabled
[ 0.439820][ T0] printk: legacy bootconsole [earlyser0] disabled
[ 0.446822][ T0] BUG: scheduling while atomic: swapper/0/0/0x00000002
[ 0.450491][ T0] 1 lock held by swapper/0/0:
[ 0.457897][ T0] #0: ffffffff82ae5f88 (console_mutex){+.+.}-{4:4}, at: console_list_lock+0x20/0x70
[ 0.463141][ T0] Modules linked in:
[ 0.465307][ T0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.10.0-rc1+ #372
[ 0.469394][ T0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[ 0.474402][ T0] Call Trace:
[ 0.476246][ T0] <TASK>
[ 0.481473][ T0] dump_stack_lvl+0x93/0xb0
[ 0.483949][ T0] dump_stack+0x10/0x20
[ 0.486256][ T0] __schedule_bug+0x68/0x90
[ 0.488753][ T0] __schedule+0xb9b/0xd80
[ 0.491179][ T0] ? lock_release+0xb5/0x270
[ 0.493732][ T0] schedule+0x43/0x170
[ 0.495998][ T0] schedule_timeout+0xc5/0x1e0
[ 0.498634][ T0] ? __pfx_process_timeout+0x10/0x10
[ 0.501522][ T0] ? msleep+0x13/0x50
[ 0.503728][ T0] msleep+0x3c/0x50
[ 0.505847][ T0] __pr_flush.constprop.0.isra.0+0x56/0x500
[ 0.509050][ T0] ? _printk+0x58/0x80
[ 0.511332][ T0] ? lock_is_held_type+0x9c/0x110
[ 0.514106][ T0] unregister_console_locked+0xe1/0x450
[ 0.517144][ T0] register_console+0x509/0x620
[ 0.519827][ T0] ? __pfx_univ8250_console_init+0x10/0x10
[ 0.523042][ T0] univ8250_console_init+0x24/0x40
[ 0.525845][ T0] console_init+0x43/0x210
[ 0.528280][ T0] start_kernel+0x493/0x980
[ 0.530773][ T0] x86_64_start_reservations+0x18/0x30
[ 0.533755][ T0] x86_64_start_kernel+0xae/0xc0
[ 0.536473][ T0] common_startup_64+0x12c/0x138
[ 0.539210][ T0] </TASK>

And then the kernel goes into an infinite loop complaining about:

1. releasing a pinned lock
2. unpinning an unpinned lock
3. bad: scheduling from the idle thread!
4. goto 1

John