Re: [PATCH printk v1 11/13] printk: reimplement console_lock for proper kthread support

From: Petr Mladek
Date: Fri Mar 11 2022 - 13:42:03 EST


On Fri 2022-03-11 14:34:40, John Ogness wrote:
> On 2022-03-11, Petr Mladek <pmladek@xxxxxxxx> wrote:
> >> console_unlock()
> >> {
> >> [...]
> >> if (may_schedule)
> >> retry = console_lock_reacquire();
> >> else
> >> retry = console_trylock();
> >> }
> >>
>
> [...]
>
> > OK, it means that the main problem here _is not_ the scheduling context,
> > console_lock() vs. console_trylock(). The main problem _is_ the direct
> > printing vs. the offload to kthreads.
> >
> > Of course, the context is important. It affects how we could re-take
> > the lock. But the main problem is the printing mode. We must make sure
> > that:
> >
> > 1. someone is printing pending messages when the direct mode is needed
>
> console_trylock() causes difficulties here because it will fail if any
> kthread is active. It is an example of direct mode failure. But is that
> really any different than current mainline console_trylock() failing
> because a console_lock() context is active (and possibly not scheduled
> on a CPU)?

I see. Well, console_lock() does not mean that we are in direct mode.
Yes, it stops kthreads. But it might be called by register_console()
or tty driver. console_unlock() might do nothing when
allow_direct_printing() returns false in console_flush_all().


> > 2. kthreads are woken and can enter the printing mode when the direct
> > mode is disabled.
>
> This happens at the end of vprintk_emit() and within __console_unlock(),
> regardless if the printk() was running in direct mode or not.
>
> > Will console_lock_reacquire() really help here?
> >
> > The API theoretically helps in direct mode when the lock was taken
> > via console_lock().
>
> console_lock_reacquire() only exists for the console_lock() case.
>
> > But it does not help when the lock was taken
> > via console_trylock() from printk(). It might mean that
> > the forward progress might not be guaranteed in the direct mode
> > (early boot, panic, ...).
>
> How is the console_trylock() case different from current mainline now?
> As I mentioned above, the kthreads can block console_trylock(), but so
> can a console_lock() currently in mainline.

It is very different. Failing console_trylock() in the current mainline
means that someone else is responsible for printing all the pending
messages. All should be on the safe side.

Failing console_trylock() in this patch means that some printk
kthread is active but nothing else. The kthread might handle one
message and then go into sleep because someone requested the
direct mode using atomic_inc(). Other kthreads might be sleeping
and ignore everything. The failing console_trylock() will mean
that nobody will handle the rest of the messages. The race
is described in the other reply.

Honestly, handling console_trylock() case looks more important to me
because it is used by printk(). It is the main usecase. While
console_lock() code path should be used rarely, during
console registration. Well, it is used also by tty code but
I am not sure how often it is really called there.

> > Hmm, the forward progress seems to be guaranteed in the direct
> > mode most of the time. console_trylock() can take over
> > the atomic counter because console kthreads are not allowed
> > to enter the printing mode in this case.
> >
> > I used "most of the time" because there might be races when
> > the mode is switched. "printk_direct" is an atomic variable.
> > CON_DIRECT is set under con->mutex but console_trylock()
> > does not take the mutex...
>
> You are mixing issues here. If CON_DIRECT is set, there is already a
> console_lock() in progress, so console_trylock() fails on @console_sem.

Yes, I am sorry for the confusion. I forgot the meaning of the flag.
I thought that it was set when the direct mode was requested.
But it is not true. It only means that console_lock is taken
and it could do the direct printing.

It would make sense to rename it. It might be part of all the
confusion here. The name creates false feeling about that
the direct mode is requested. I think that I have actually
suggested renaming the flag in an earlier mail.

Best Regards,
Petr