Re: [Intel-gfx] [PATCH] RFC: console: hack up console_lock more v2

From: Petr Mladek
Date: Thu May 09 2019 - 11:10:05 EST

Next message: Ferdinand Blomqvist: "Re: [RFC PATCH 0/7] rslib: RS decoder is severely broken"
Previous message: Liang, Kan: "Re: [PATCH 22/22] perf/x86/intel/rapl: rename internal variables in response to multi-die/pkg support"
In reply to: Daniel Vetter: "Re: [Intel-gfx] [PATCH] RFC: console: hack up console_lock more v2"
Next in thread: Chris Wilson: "Re: [Intel-gfx] [PATCH] RFC: console: hack up console_lock more v2"

On Wed 2019-05-08 10:17:12, Daniel Vetter wrote:
> On Mon, May 06, 2019 at 01:24:48PM +0200, Petr Mladek wrote:
> > On Mon 2019-05-06 11:38:13, Daniel Vetter wrote:
> > > On Mon, May 06, 2019 at 10:26:28AM +0200, Petr Mladek wrote:
> > > > On Mon 2019-05-06 10:16:14, Petr Mladek wrote:
> > > > > On Mon 2019-05-06 09:45:53, Daniel Vetter wrote:
> > > > > > console_trylock, called from within printk, can be called from pretty
> > > > > > much anywhere. Including try_to_wake_up. Note that this isn't common,
> > > > > > usually the box is in pretty bad shape at that point already. But it
> > > > > > really doesn't help when then lockdep jumps in and spams the logs,
> > > > > > potentially obscuring the real backtrace we're really interested in.
> > > > > > One case I've seen (slightly simplified backtrace):
> > > > > >
> > > > > > Call Trace:
> > > > > > <IRQ>
> > > > > > console_trylock+0xe/0x60
> > > > > > vprintk_emit+0xf1/0x320
> > > > > > printk+0x4d/0x69
> > > > > > __warn_printk+0x46/0x90
> > > > > > native_smp_send_reschedule+0x2f/0x40
> > > > > > check_preempt_curr+0x81/0xa0
> > > > > > ttwu_do_wakeup+0x14/0x220
> > > > > > try_to_wake_up+0x218/0x5f0
> > > > >
> > > > > try_to_wake_up() takes p->pi_lock. It could deadlock because it
> > > > > can get called recursively from printk_safe_up().
> > > > >
> > > > > And there are more locks taken from try_to_wake_up(), for example,
> > > > > __task_rq_lock() taken from ttwu_remote().
> > > > >
> > > > > IMHO, the most reliable solution would be to call the entire
> > > > > up_console_sem() from printk deferred context. We could assign
> > > > > a few bytes for this context in the per-CPU printk_deferred
> > > > > variable.
> > > >
> > > > Ah, I was too fast and made the same mistake. This won't help because
> > > > it would still call try_to_wake_up() recursively.
> > >
> > > Uh :-/
> > >
> > > > We need to call all printk's that can be called under locks
> > > > taken in try_to_wake_up() path in printk deferred context.
> > > > Unfortunately, it is a whack-a-mole approach.
> > >
> > > Hm, since it's whack-a-mole anyway, what about converting the WARN_ON into
> > > a printk_deferred, like all the other scheduler-related code? It feels a
> > > notch more consistent to me than leaking the printk_context into areas it
> > > wasn't really built for. Scheduler code has already fully subscribed to
> > > the whack-a-mole approach, after all.
> >
> > I am not sure what exactly you mean by the conversion.
> >
> > Anyway, we do not want to use printk_deferred() treewide. It reduces
> > the chance that the messages reach the consoles. The scheduler is an
> > exception because of the possible deadlocks.
> >
> > A solution would be to define WARN_ON_DEFERRED(), which would
> > call the normal WARN_ON() in printk deferred context, and to
> > use it in the scheduler.
>
> Sent it out, and then Sergey pointed out printk_safe_enter/exit (which I
> guess is what you meant, and which I missed)

No, I meant introducing a deferred printk context that would behave
like printk_deferred().

printk safe context is more limiting. It prevents a deadlock on
logbuf_lock by temporarily saving the messages into per-CPU
buffers.
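A rough userspace model of the printk safe idea, to make the mechanism concrete. It assumes a single simulated CPU, models logbuf_lock as a plain flag, and uses a static array where the kernel would use a per-CPU buffer; every name here (model_printk, safe_buf, flush_safe_buf) is hypothetical, not the kernel's code:

```c
#include <string.h>

/* Userspace model of "printk safe" context. One simulated CPU; the
 * "lock" is a flag and safe_buf stands in for a per-CPU buffer.
 * All names are illustrative, not the kernel implementation. */

#define BUF_SZ 256

static int logbuf_lock_held;   /* simulated non-recursive logbuf_lock */
static char main_log[BUF_SZ];  /* main log buffer */
static char safe_buf[BUF_SZ];  /* per-CPU side buffer in the real design */

static void append(char *buf, const char *msg)
{
    size_t used = strlen(buf);
    size_t n = strlen(msg);
    if (used + n < BUF_SZ)
        memcpy(buf + used, msg, n + 1); /* copies the terminating NUL too */
}

/* 'nested' simulates a second printk triggered while logbuf_lock is held,
 * e.g. a WARN fired from inside the print path itself. */
static void model_printk(const char *msg, const char *nested)
{
    if (logbuf_lock_held) {
        /* Taking logbuf_lock again would self-deadlock: park the message. */
        append(safe_buf, msg);
        return;
    }
    logbuf_lock_held = 1;
    append(main_log, msg);
    if (nested)
        model_printk(nested, NULL);
    logbuf_lock_held = 0;
}

/* Later, from a context where logbuf_lock can be taken safely,
 * move the parked messages into the main log. */
static void flush_safe_buf(void)
{
    if (safe_buf[0] == '\0')
        return;
    logbuf_lock_held = 1;
    append(main_log, safe_buf);
    safe_buf[0] = '\0';
    logbuf_lock_held = 0;
}
```

Calling model_printk("a\n", "b\n") leaves "a\n" in main_log and "b\n" parked in safe_buf; flush_safe_buf() then appends it, so nothing is lost, only delayed. Note that the guard only protects the lock taken by printk itself.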

In the scheduler, we could store the messages directly into
the main log buffer. We just need to defer the console
handling to avoid a deadlock on the scheduler locks.
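A minimal sketch of that split, assuming a single simulated CPU: the message is stored in the main log immediately, and only the console work is postponed until a point where no scheduler locks are held. The kernel would use a per-CPU counter and irq_work for this; here a plain int and an explicit deferred_console_work() call stand in for them. WARN_ON_DEFERRED() follows the proposal quoted above, but its body and all other names are hypothetical:

```c
#include <stdbool.h>
#include <string.h>

/* Userspace model of a "printk deferred context". Names are hypothetical. */

#define LOG_SZ 256

static char main_log[LOG_SZ];    /* main log buffer: always written */
static char console_out[LOG_SZ]; /* what the simulated console has shown */
static size_t console_seq;       /* bytes of main_log already printed */
static bool flush_pending;
static int deferred_ctx;         /* nesting counter; per-CPU in a kernel */

static void log_store(const char *msg)
{
    size_t used = strlen(main_log), n = strlen(msg);
    if (used + n < LOG_SZ)
        memcpy(main_log + used, msg, n + 1);
}

/* Drives the consoles. In the kernel, this path ends in up(&console_sem),
 * which may call try_to_wake_up() and take scheduler locks. */
static void console_flush(void)
{
    size_t end = strlen(main_log);
    size_t shown = strlen(console_out);
    size_t n = end - console_seq;
    if (shown + n < LOG_SZ) {
        memcpy(console_out + shown, main_log + console_seq, n);
        console_out[shown + n] = '\0';
    }
    console_seq = end;
    flush_pending = false;
}

static void model_printk(const char *msg)
{
    log_store(msg);            /* the message itself is never deferred */
    if (deferred_ctx)
        flush_pending = true;  /* like printk_deferred(): no console work now */
    else
        console_flush();
}

/* Hypothetical WARN_ON_DEFERRED(): a normal warning emitted inside the
 * deferred context, intended for code running under scheduler locks. */
#define WARN_ON_DEFERRED(cond)                        \
    do {                                              \
        if (cond) {                                   \
            deferred_ctx++;                           \
            model_printk("WARNING: " #cond "\n");     \
            deferred_ctx--;                           \
        }                                             \
    } while (0)

/* Stand-in for the irq_work handler that runs after the locks are dropped. */
static void deferred_console_work(void)
{
    if (flush_pending)
        console_flush();
}
```

Unlike the printk safe model, the warning lands in the main log right away, so it survives even if the deferred console work never gets to run; only the path that could recurse into the scheduler is postponed.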

> , but we're doing this already around the up() call
> in __up_console_sem.
>
> So I think these further recursions you pointed out are already handled
> correctly, and all we need to do is to break the loop involving
> semaphore.lock of the console_lock semaphore only, which I think this
> patch achieves.

printk safe context would help only when try_to_wake_up()
and all other functions using the same locks _all over
the system_ are called in printk safe (or deferred) context.

In other words, printk safe context solves only printk()
recursion. It does not solve recursion of the scheduler
operations.
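A small userspace model of why the guard is not enough, assuming one simulated CPU and a single "scheduler lock" standing in for the locks try_to_wake_up() takes (p->pi_lock, the rq lock). A real spinlock taken twice on the same CPU would hang; the model records the self-deadlock instead. All function names are hypothetical:

```c
#include <stdbool.h>

/* Userspace model: a printk recursion guard vs. scheduler lock recursion.
 * Names are illustrative, not kernel functions. */

struct model_lock { bool held; };

static struct model_lock sched_lock; /* stands in for pi_lock / rq lock */
static int printk_safe;              /* printk recursion guard */
static bool deadlock_seen;

static bool take_lock(struct model_lock *l)
{
    if (l->held) {
        deadlock_seen = true;  /* a real spinlock would spin forever here */
        return false;
    }
    l->held = true;
    return true;
}

static void release_lock(struct model_lock *l) { l->held = false; }

static void try_to_wake_up_model(void);

/* up(&console_sem): wakes a console_lock waiter, guarded as printk safe. */
static void up_console_sem_model(void)
{
    printk_safe++;           /* protects against printk recursion only */
    try_to_wake_up_model();  /* still takes the scheduler lock */
    printk_safe--;
}

static void printk_model(void)
{
    if (printk_safe)
        return;              /* nested printk is caught: the guard works */
    up_console_sem_model();  /* console_unlock() -> up() */
}

static void try_to_wake_up_model(void)
{
    if (!take_lock(&sched_lock))
        return;
    printk_model();          /* a WARN under the lock, as in the backtrace */
    release_lock(&sched_lock);
}
```

A plain printk_model() call is handled fine: the nested printk under the lock hits the guard. Entering through try_to_wake_up_model(), however, reaches take_lock() a second time while the lock is already held, and the guard never gets a say, because the recursion goes through the scheduler rather than through printk.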

Best Regards,
Petr
