Re: [RFC][PATCHv4 6/6] printk: remove zap_locks() function

From: Petr Mladek
Date: Thu Dec 01 2016 - 08:36:26 EST


On Thu 2016-12-01 06:42:29, Peter Zijlstra wrote:
> On Thu, Dec 01, 2016 at 11:34:42AM +0900, Sergey Senozhatsky wrote:
> > On (11/25/16 16:17), Peter Zijlstra wrote:
> > > On Fri, Nov 25, 2016 at 04:01:13PM +0100, Petr Mladek wrote:
> > > > On Fri 2016-10-28 00:49:33, Sergey Senozhatsky wrote:
> > > > > 2) Since commit cf9b1106c81c ("printk/nmi: flush NMI messages on the
> > > > > system panic") panic attempts to zap the `logbuf_lock' spin_lock to
> > > > > successfully flush nmi messages to `logbuf'.
> > > >
> > > > Note that the same code is newly used to flush also the printk_safe
> > > > per-CPU buffers. It means that logbuf_lock is zapped also when
> > > > flushing these new buffers.
> > > >
> > >
> > > Note that (raw_)spin_lock_init() as done here and in
> > > printk_nmi_flush_on_panic() can wreck the lock state and doesn't ensure
> > > a subsequent spin_lock() of said lock will actually work.
> > >
> > > The very best solution is to simply ignore the lock in panic situations
> > > rather than trying to wreck it.
> >
> > do you mean that we can entirely drop the spin_lock_init()? or is there
> > something else?
>
> You should not touch the lock in any way shape or form in the panic
> path. Just ignore all locking and do the console writes (which gets you
> into whole different pile of crap).

And this is my fear. I am not sure if the other crap is better than
the current one.

One crazy idea. A compromise might be to switch to time-limited locking
in panic mode while other CPUs are still active. If a spin lock is not
available within X thousand cycles, there is probably a deadlock and we
should just enter the critical section anyway. This would preserve some
reasonable synchronization while still allowing us to move forward.

Another solution would be to store messages in the temporary buffers
when the lock is not available and push them into the main buffer and
consoles later, when only one CPU is running. At that stage, we do not
need to synchronize and could just skip locking as you suggest.


> Put another way, don't do silly things like spin_lock() when you're in a
> hurry to get your panics out.
>
> > spin_lock_init() either does not improve anything or lets
> > us, at least, move the messages from per-CPU buffers to the logbuf.
>
> So spin_lock_init() will completely wreck the lock. And this being the
> recursion path, not a panic path, we could have continued running the
> kernel no problem.

printk_nmi_flush_on_panic() is called from panic(). This means that we
zap the lock only when the system is really going down, which is a nice
improvement: the current code zaps the locks during any Oops.

Best Regards,
Petr