Re: [External] Re: [PATCH 2/2] sched: mark PRINTK_DEFERRED_CONTEXT_MASK in __schedule()

From: Petr Mladek
Date: Tue Sep 29 2020 - 10:27:55 EST


On Mon 2020-09-28 12:25:59, Peter Zijlstra wrote:
> On Mon, Sep 28, 2020 at 06:04:23PM +0800, Chengming Zhou wrote:
>
> > Well, you are lucky. So it's a problem in our printk implementation.
>
> Not lucky; I just kicked it in the groin really hard:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git debug/experimental
>
> > The deadlock path is:
> >
> > printk
> >   vprintk_emit
> >     console_unlock
> >       vt_console_print
> >         hide_cursor
> >           bit_cursor
> >             soft_cursor
> >               queue_work_on
> >                 __queue_work
> >                   try_to_wake_up
> >                     _raw_spin_lock
> >                       native_queued_spin_lock_slowpath
> >
> > Looks like it's introduced by this commit:
> >
> > eaa434defaca1781fb2932c685289b610aeb8b4b
> >
> > "drm/fb-helper: Add fb_deferred_io support"
>
> Oh gawd, yeah, all the !serial consoles are utter batshit.
>
> Please look at John's last printk rewrite, IIRC it farms all that off to
> a kernel thread instead of doing it from the printk() caller's context.
>
> I'm not sure where he hides his latests patches, but I'm sure he'll be
> more than happy to tell you.

AFAIK, John is just working on updating the patchset so that it will
be based on the lockless ringbuffer that is finally in the queue
for-5.10.

Upstreaming the console handling will be the next big step. I am sure
that there will be a long discussion about it. But there might be
a few things that would help with removing printk_deferred().

1. Messages will be printed on consoles by dedicated kthreads. That
will be a safe context. No deadlocks.

2. The registration and unregistration of consoles should no longer
be handled by console_lock (semaphore). It should be possible to
call most consoles without a sleeping lock. That should remove all
these deadlocks between printk() and the scheduler.

There might be problems with some consoles. For example, tty would
most likely still need a sleeping lock because it is using the console
semaphore also internally.


3. We will try harder to get the messages out immediately during
panic().
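
To make point 1 more concrete, here is a rough userspace sketch of
the split between storing a message and printing it. It is only a toy
model, not the code from John's patchset: every toy_* name, the slot
count and the message length are made up for illustration, and the
real ringbuffer is lockless and far more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_RING_SLOTS 8   /* hypothetical capacity */
#define TOY_MSG_LEN    64  /* hypothetical maximum message length */

struct toy_ring {
    char msgs[TOY_RING_SLOTS][TOY_MSG_LEN];
    size_t head;   /* next slot to write */
    size_t tail;   /* next slot to print */
};

/* Counts console calls so the split between the two sides is visible. */
static size_t toy_console_calls;

static void toy_count_console(const char *msg)
{
    (void)msg;
    toy_console_calls++;
}

/* Caller side: only copies the message. No console driver runs here. */
static int toy_store(struct toy_ring *r, const char *msg)
{
    char *slot;

    if (r->head - r->tail == TOY_RING_SLOTS)
        return -1;  /* full: this toy simply drops the message */

    slot = r->msgs[r->head % TOY_RING_SLOTS];
    strncpy(slot, msg, TOY_MSG_LEN - 1);
    slot[TOY_MSG_LEN - 1] = '\0';
    r->head++;
    return 0;
}

/* Printer side: roughly what the loop body of a dedicated thread does. */
static size_t toy_drain(struct toy_ring *r, void (*console_write)(const char *))
{
    size_t printed = 0;

    while (r->tail != r->head) {
        console_write(r->msgs[r->tail % TOY_RING_SLOTS]);
        r->tail++;
        printed++;
    }
    return printed;
}
```

The point of the split is that toy_store() never reaches a console
driver, so nothing called from the scheduler can end up in the
queue_work_on() path quoted above. Only toy_drain() touches the
console, and it would run from the printing thread.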


It would take some time until the above reaches upstream. But it seems
to be the right way to go.


About printk_deferred():

It is a whack-a-mole game. It is easy to miss a printk() that might
eventually cause the deadlock.

The printk deferred context is safer. But it is still a whack-a-mole
game. The kthreads will do the same job for sure.

Finally, the deadlock happens "only" when someone is waiting on
console_lock() in parallel. Otherwise, the waitqueue for the semaphore
is empty and the scheduler is not called.
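
A toy model of that condition, as I understand the semaphore release
path. The struct and function names are invented for this sketch and
it leaves out all locking; it only shows that the wakeup branch is
taken when there is at least one waiter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the console semaphore. */
struct toy_sem {
    unsigned int count;    /* free slots */
    unsigned int waiters;  /* tasks sleeping while waiting for the lock */
};

/*
 * Returns true when the release had to wake a sleeper. In the real
 * kernel that is the branch that ends in a wakeup and therefore takes
 * scheduler locks.
 */
static bool toy_up(struct toy_sem *sem)
{
    if (sem->waiters == 0) {
        sem->count++;      /* nobody waiting: no wakeup, no scheduler */
        return false;
    }

    sem->waiters--;        /* hand the semaphore straight to one waiter */
    return true;
}
```

With waiters == 0 the release never gets near the scheduler, which is
why the problem stays hidden most of the time.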

It means that there is quite a big chance to see the WARN(). It might
be even bigger than with printk_deferred() because WARN() in the scheduler
means that the scheduler is in big trouble. Nobody guarantees that
the deferred messages will get handled later.

Best Regards,
Petr