Re: [PATCH printk v1 06/10] printk: use seqcount_latch for console_seq

From: John Ogness
Date: Thu Aug 05 2021 - 11:26:48 EST


On 2021-08-05, Petr Mladek <pmladek@xxxxxxxx> wrote:
> On Tue 2021-08-03 15:18:57, John Ogness wrote:
>> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
>> index d07d98c1e846..f8f46d9fba9b 100644
>> --- a/kernel/printk/printk.c
>> +++ b/kernel/printk/printk.c
>> @@ -2912,18 +2920,19 @@ void console_unblank(void)
>> */
>> void console_flush_on_panic(enum con_flush_mode mode)
>> {
>> - /*
>> - * If someone else is holding the console lock, trylock will fail
>> - * and may_schedule may be set. Ignore and proceed to unlock so
>> - * that messages are flushed out. As this can be called from any
>> - * context and we don't want to get preempted while flushing,
>> - * ensure may_schedule is cleared.
>> - */
>> - console_trylock();
>> - console_may_schedule = 0;
>> -
>> - if (mode == CONSOLE_REPLAY_ALL)
>> - console_seq = prb_first_valid_seq(prb);
>> + if (console_trylock()) {
>> + if (mode == CONSOLE_REPLAY_ALL)
>> + latched_seq_write(&console_seq, prb_first_valid_seq(prb));
>
> I am scratching my head about this. Of course, latched_seq_write() does
> not guarantee the result when the console lock is taken by another process.
> But console_unlock(), called below, will call latched_seq_write()
> anyway.
>
> Also CONSOLE_REPLAY_ALL is used by panic_print_sys_info().
> It is called the following way:
>
> void panic(const char *fmt, ...)
> {
> 	[...]
> 	debug_locks_off();
> 	console_flush_on_panic(CONSOLE_FLUSH_PENDING);
>
> 	panic_print_sys_info();
> 	[...]
> }
>
> On one hand, console_flush_on_panic(CONSOLE_FLUSH_PENDING) will
> most likely take over the console lock even when it was taken
> by another CPU before. And the 2nd console_flush_on_panic()
> called from panic_print_sys_info() will not even notice.
>
> On the other hand, CONSOLE_REPLAY_ALL would not even try to
> replay the log when the console lock was not available.
>
> The risk of a broken console_seq is negligible. console_unlock()
> should be safe even with an invalid console_seq.
>
> My opinion:
>
> I suggest keeping the original logic and perhaps adding a comment:
>
> void console_flush_on_panic(enum con_flush_mode mode)
> {
> 	/*
> 	 * If someone else is holding the console lock, trylock will fail
> 	 * and may_schedule may be set. Ignore and proceed to unlock so
> 	 * that messages are flushed out. As this can be called from any
> 	 * context and we don't want to get preempted while flushing,
> 	 * ensure may_schedule is cleared.
> 	 */
> 	console_trylock();
> 	console_may_schedule = 0;
>
> 	/*
> 	 * latched_seq_write() does not guarantee consistent values
> 	 * when console_trylock() failed, but this is a best effort.
> 	 * console_unlock() will update console_seq anyway, and
> 	 * prb_read_valid() handles even invalid sequence numbers.
> 	 */
> 	if (mode == CONSOLE_REPLAY_ALL)
> 		latched_seq_write(&console_seq, prb_first_valid_seq(prb));
>
> 	console_unlock();
> }
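
To make the serialization requirement behind latched_seq_write() concrete, here is a minimal userspace sketch of the latch technique. It is an illustrative analogue, not the kernel code: struct latched_seq's layout, latched_seq_init(), latched_seq_read() and interleave_two_writers() are hypothetical names, and sequentially consistent C11 atomics stand in for raw_write_seqcount_latch() and its barriers. The writer bumps the latch before updating each of the two copies so that lockless readers are always directed at the copy not being modified. That only holds while one writer runs at a time, which in printk is what the console lock provides.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical userspace analogue of a latched console_seq. Names and
 * layout are illustrative; seq_cst C11 atomics replace the kernel's
 * raw_write_seqcount_latch()/smp_wmb() pairing.
 */
struct latched_seq {
	atomic_uint latch;        /* low bit selects the copy readers use */
	_Atomic uint64_t val[2];  /* two copies of the sequence number */
};

static void latched_seq_init(struct latched_seq *ls, uint64_t v)
{
	atomic_init(&ls->latch, 0);
	atomic_init(&ls->val[0], v);
	atomic_init(&ls->val[1], v);
}

/* Writer side: only correct if writers are serialized (e.g. by a lock). */
static void latched_seq_write(struct latched_seq *ls, uint64_t v)
{
	atomic_fetch_add(&ls->latch, 1);  /* latch odd: readers use val[1] */
	atomic_store(&ls->val[0], v);
	atomic_fetch_add(&ls->latch, 1);  /* latch even: readers use val[0] */
	atomic_store(&ls->val[1], v);
}

/* Reader side: lockless, retries if the latch moved during the read. */
static uint64_t latched_seq_read(struct latched_seq *ls)
{
	unsigned int start;
	uint64_t v;

	do {
		start = atomic_load(&ls->latch);
		v = atomic_load(&ls->val[start & 1]);
	} while (atomic_load(&ls->latch) != start);

	return v;
}

/*
 * Hypothetical helper: replays, step by step on one thread, one possible
 * interleaving of two unserialized writers A and B, as could happen if
 * two CPUs wrote without holding the lock.
 */
static void interleave_two_writers(struct latched_seq *ls,
				   uint64_t a, uint64_t b)
{
	atomic_fetch_add(&ls->latch, 1);  /* A: latch odd */
	atomic_fetch_add(&ls->latch, 1);  /* B: latch even, readers on val[0] */
	atomic_store(&ls->val[0], a);     /* A modifies the copy readers use */
	atomic_store(&ls->val[0], b);     /* ...and so does B */
	atomic_fetch_add(&ls->latch, 1);  /* B: latch odd, readers on val[1] */
	atomic_store(&ls->val[1], b);     /* B finished */
	atomic_fetch_add(&ls->latch, 1);  /* A: latch even, readers on val[0] */
	atomic_store(&ls->val[1], a);     /* A finished: copies now disagree */
}
```

With serialized writers both copies converge to the last value written. The replayed interleaving shows what a failed trylock permits: both writers modify val[0] while readers are directed at it, and afterwards val[0] and val[1] hold different values. During the next serialized write, readers are briefly pointed at the stale copy and can see an older value after already having seen a newer one.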

I see now that CONSOLE_REPLAY_ALL is not handled correctly. And in the
follow-up patch "printk: introduce kernel sync mode" the situation gets
worse. I am trying to find ways to handle things without blindly
ignoring locks and hoping for the best.

I need to re-evaluate how to correctly support this feature.

John Ogness