Re: [RFC][PATCHv2 1/4] panic: avoid deadlocks in re-entrant console drivers

From: Sergey Senozhatsky
Date: Thu Oct 25 2018 - 05:32:03 EST

On (10/25/18 11:06), Petr Mladek wrote:
> IMHO, the custom s390 implementation can get removed.
> The generic code should do the same job these days.


> > And console_unblank() is not guaranteed to print anything (unlike
> > console_flush_on_panic(), but oops is not panic() yet, so we can't
> > replace it with flush_on_panic()) - console_sem can be locked, so
> > console_unblank() would do nothing.
>
> I see. I missed that console_unblank() returns early when
> down_trylock_console_sem() fails.
>
> I still would like to refactor the code somehow to avoid
> the bust_spinlocks(0)/bust_spinlocks(1) ping-pong.
>
> It might make sense to call console_unblank() from
> console_flush_on_panic().
>
> I wonder if it would make sense to call unblank_screen() in
> console_unblank()...

These are interesting thoughts.

I can add one more thing to the list:
bust_spinlocks() is probably not doing enough. Its comment says that it

"clears any spinlocks which would prevent oops, die(), BUG()
and panic() information from reaching the user."

But this is, technically, not true, because bust_spinlocks() does not take
console_sem out of the picture, and we do have several spinlocks behind it,
e.g. the semaphore's ->lock. Both down() and down_trylock() do
raw_spin_lock_irqsave(&sem->lock, flags), so if we get an NMI panic while
one of the CPUs is holding sem->lock on its way to lock console_sem, then
any later attempt to take console_sem on the panic path spins on sem->lock
forever, and we are done. And IIRC we had exactly this type of bug report
from LG 1 or 2 years ago: a deadlock on sem->lock.
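
To illustrate the scenario, here is a minimal user-space model. It is not
kernel code: every model_* name is made up for the illustration, and the
bounded spin (MODEL_SPIN_LIMIT) is an assumption that lets the model report
the hang instead of actually hanging. It only mirrors the shape of
down_trylock(): take sem->lock first, then look at the count.

```c
#include <assert.h>
#include <stdbool.h>

/* User-space model only; all model_* names are hypothetical. */
struct model_sem {
	bool lock_held;	/* stands in for sem->lock (a raw spinlock) */
	int count;	/* stands in for sem->count */
};

/* A real spinlock spins forever; the model gives up so it can report it. */
#define MODEL_SPIN_LIMIT 1000

enum model_result { MODEL_ACQUIRED, MODEL_BUSY, MODEL_DEADLOCK };

/* Returns false where raw_spin_lock_irqsave() would never return. */
static bool model_spin_lock(struct model_sem *sem)
{
	for (int i = 0; i < MODEL_SPIN_LIMIT; i++) {
		if (!sem->lock_held) {
			sem->lock_held = true;
			return true;
		}
	}
	return false;
}

static void model_spin_unlock(struct model_sem *sem)
{
	assert(sem->lock_held);
	sem->lock_held = false;
}

/* Same shape as down_trylock(): sem->lock is taken before the count check. */
static enum model_result model_down_trylock(struct model_sem *sem)
{
	enum model_result res;

	if (!model_spin_lock(sem))
		return MODEL_DEADLOCK;	/* the "try" never gets a chance to fail */

	if (sem->count > 0) {
		sem->count--;
		res = MODEL_ACQUIRED;
	} else {
		res = MODEL_BUSY;
	}
	model_spin_unlock(sem);
	return res;
}

/* CPU A is stopped by an NMI panic while holding sem->lock. */
enum model_result model_nmi_panic_scenario(void)
{
	struct model_sem console_sem = { .lock_held = false, .count = 1 };

	/* CPU A: took sem->lock inside down(), then got stopped for good. */
	(void)model_spin_lock(&console_sem);

	/* Panic CPU: even a trylock on console_sem has to take sem->lock. */
	return model_down_trylock(&console_sem);
}
```

In this model model_nmi_panic_scenario() returns MODEL_DEADLOCK even though
the semaphore count is still 1, which is the point: the trylock cannot fail
gracefully, because it blocks on the inner spinlock before it ever looks at
the count.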